Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2019 – 2020

Basic Information

Type

Core course, 9 ECTS

Lecturers

Dr. Andrew Yates and Dr. Rishiraj Saha Roy

Coordinators and Contact

Sreyasi Nag Chowdhury and Azin Ghazimatin

Lectures

Wednesdays, 16-18, E1 3 - Hörsaal II (0.02) and Fridays, 14-16, E1 3 - Hörsaal II (0.02)

(first lecture will be on Wednesday, Oct 16)

Tutorials

Please select your tutorial slot here: https://forms.gle/qjhu7jwkaRHGNWFM8

 

Monday 14-16 (first tutorial is on Nov 4th):

  • Group A - E1.3 SR014
  • Group B - E1.3 SR015
  • Group C - E1.3 SR016
  • Group D - E1.3 SR107
 

Tuesday 10-12 (first tutorial is on Nov 5th):

  • Group E - E1.3 SR015
  • Group F - E1.3 SR016
  • Group G - E1.3 SR107
  • Group H - E1.1 SR106
 
 
Exams

To be announced

Teaching Assistants

 
  • Anna Tigunova
  • Vĩnh Thịnh Hồ
  • Sebastian Dalleiger
  • Anna Guimarães
  • Ghazaleh Haratinezhad Torbati
  • David Kaltenpoth
  • Shrestha Ghosh
  • Magdalena Kaiser
  • Alexander Marx
  • Hiba Arnaout
  • Janis Kalofolias
 

Lecture Schedule

Lecture      Date       Topic                Lecturer   Reading
Lecture 01   16.10.19   Foundations I        RSR        Aggarwal Ch. 2
Lecture 02   18.10.19   Foundations II       RSR        Aggarwal Ch. 12
Lecture 03   23.10.19   Statistics I         AY
Lecture 04   25.10.19   Statistics II        AY
Lecture 05   30.10.19   Pattern Mining I     RSR
Holiday      01.11.19
Lecture 06   06.11.19   Pattern Mining II    RSR
Lecture 07   08.11.19   Classification       AY
Lecture 08   13.11.19   Clustering I         JV
Lecture 09   15.11.19   Clustering II        JV
Lecture 10   20.11.19   Sequences I          RSR
Lecture 11   22.11.19   Sequences II         RSR
Lecture 12   27.11.19   Graphs I             RSR
Lecture 13   29.11.19   Graphs II            RSR
Lecture 14   04.12.19   Anomaly Detection    RSR
Lecture 15   06.12.19   IR Basics            AY
Lecture 16   11.12.19   Indexing             AY
Lecture 17   13.12.19   Evaluation           AY
Lecture 18   18.12.19   Ranking I            AY
Lecture 19   20.12.19   Ranking II           AY
Christmas break
Lecture 20   08.01.20   Click Analysis I     RSR
Lecture 21   10.01.20   Click Analysis II    RSR
Lecture 22   15.01.20   Neural IR I          AY
Lecture 23   17.01.20   Neural IR II         AY
Lecture 24   22.01.20   Query Expansion      AY
Lecture 25   24.01.20   Entities in IR       AY
Lecture 26   29.01.20   Question Answering   RSR
Lecture 27   31.01.20   Recap                RSR, AY

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital content from the web, social media and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

After you receive the assignment sheet, solve the problems individually at home and submit your solutions to the TAs after the lecture on the appointed dates. During the tutorial sessions, the TAs will ask some of you to present your solutions. Every student must present their solutions at least twice during the semester. The TAs will also help clarify your answers. Your submitted sheets will be graded and handed back to you at the end of the session.

To do the exercises, you have to study the required reading material and go through the slides.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be de-registered from the course. 

Grading and Requirements for Passing the Course

The overall grade will be the better of your results in the end-term exam and the re-exam (there will be no further attempts). There will be no mid-term exams. The final exam is closed-book and no discussion is allowed.

To participate in the final written exam, the following prerequisites are required:

  • Submit ALL 14 assignments
  • Obtain 50% or more on average over all assignments (an average of 80% or more earns you a bonus point, which improves your final exam result by one grade point where possible)
  • Present solutions at least 2 times in the tutorials
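
As a rough illustration, the three prerequisites above can be checked as in the following sketch. It is not an official tool of the course: the function name `exam_status`, the representation of scores as percentages (with `None` for a sheet that was not submitted), the equal weighting of all assignments, and the rule that the bonus only applies once you are admitted are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

TOTAL_ASSIGNMENTS = 14    # "Submit ALL 14 assignments"
PASS_AVERAGE = 50.0       # minimum average (percent) for exam admission
BONUS_AVERAGE = 80.0      # average (percent) that earns the bonus point
MIN_PRESENTATIONS = 2     # minimum number of presentations in tutorials


@dataclass
class ExamStatus:
    admitted: bool
    bonus: bool


def exam_status(scores: List[Optional[float]], presentations: int) -> ExamStatus:
    """Hypothetical helper: scores are percentages, None = not submitted."""
    # Every one of the 14 sheets must have been submitted.
    submitted_all = (
        len(scores) == TOTAL_ASSIGNMENTS
        and all(score is not None for score in scores)
    )
    if not submitted_all:
        return ExamStatus(admitted=False, bonus=False)

    # Assumption: all assignments count equally towards the average.
    average = sum(scores) / TOTAL_ASSIGNMENTS
    admitted = average >= PASS_AVERAGE and presentations >= MIN_PRESENTATIONS
    # Assumption: the bonus point is only meaningful if you are admitted.
    bonus = admitted and average >= BONUS_AVERAGE
    return ExamStatus(admitted=admitted, bonus=bonus)
```

For example, a student with 85% on every sheet and three presentations would be admitted with the bonus under these assumptions, while a student with one missing sheet would not be admitted regardless of the average.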

Literature

We will use the following primary textbooks.

For Probability and Statistics,

  • Larry Wasserman: All of Statistics, Springer, 2004.

For Data Mining,

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015.
  • Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.

For Information Retrieval,

  • Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
  • ChengXiang Zhai and Sean Massung: Text Data Management and Analytics, Morgan & Claypool, 2016.

These and additional references are available in the library: