Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2019 – 2020

Basic Information

Type: Core course, 9 ECTS

Lecturers: Dr. Andrew Yates and Dr. Rishiraj Saha Roy

Coordinators and Contact

Sreyasi Nag Chowdhury and Azin Ghazimatin


Lectures: Wednesdays, 16-18, E1 3 - Hörsaal II (0.02) and Fridays, 14-16, E1 3 - Hörsaal II (0.02)

(first lecture will be on Wednesday, Oct 16)


Please select your tutorial slot here:


Monday 14-16 (first tutorial is on Nov 4th):

  • Group A - E1.3 SR014
  • Group B - E1.3 SR015
  • Group C - E1.3 SR016
  • Group D - E1.3 SR107

Tuesday 10-12 (first tutorial is on Nov 5th):

  • Group E - E1.3 SR015
  • Group F - E1.3 SR016
  • Group G - E1.3 SR107
  • Group H - E1.1 SR106
Exams: To be announced

Teaching Assistants

  • Anna Tigunova
  • Vĩnh Thịnh Hồ
  • Sebastian Dalleiger
  • Anna Guimarães
  • Ghazaleh Haratinezhad Torbati
  • David Kaltenpoth
  • Shrestha Ghosh
  • Magdalena Kaiser
  • Alexander Marx
  • Hiba Arnaout
  • Janis Kalofolias

Lecture Schedule

Lecture     Date      Topic               Lecturer  Reading
Lecture 01  16.10.19  Foundations I       RSR       Aggarwal Ch. 2
Lecture 02  18.10.19  Foundations II      RSR       Aggarwal Ch. 12
Lecture 03  23.10.19  Statistics I        AY
Lecture 04  25.10.19  Statistics II       AY
Lecture 05  30.10.19  Pattern Mining I    RSR
Lecture 06  06.11.19  Pattern Mining II   RSR
Lecture 07  08.11.19  Classification      AY
Lecture 08  13.11.19  Clustering I        JV
Lecture 09  15.11.19  Clustering II       JV
Lecture 10  20.11.19  Sequences I         RSR
Lecture 11  22.11.19  Sequences II        RSR
Lecture 12  27.11.19  Graphs I            RSR
Lecture 13  29.11.19  Graphs II           RSR
Lecture 14  04.12.19  Anomaly Detection   RSR
Lecture 15  06.12.19  IR Basics           AY
Lecture 16  11.12.19  Indexing            AY
Lecture 17  13.12.19  Evaluation          AY
Lecture 18  18.12.19  Ranking I           AY
Lecture 19  20.12.19  Ranking II          AY
Lecture 20  08.01.20  Click Analysis I    RSR
Lecture 21  10.01.20  Click Analysis II   RSR
Lecture 22  15.01.20  Neural IR I         AY
Lecture 23  17.01.20  Neural IR II        AY
Lecture 24  22.01.20  Query Expansion     AY
Lecture 25  24.01.20  Entities in IR      AY
Lecture 26  29.01.20  Question Answering  RSR
Lecture 27  31.01.20  Recap               RSR, AY

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital content from the web, social media and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.


Prerequisites: good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

After you receive the assignment sheet, solve the problems individually at home and submit your solutions to the TAs on the appointed dates, after the lecture. During the tutorial sessions, the TAs will ask some of you to present your solutions; every student must present at least twice during the semester. The TAs will also help clarify your answers. Your submitted sheets will be graded and handed back to you at the end of the session.

To do the exercises, you have to study the required reading material and go through the slides.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be de-registered from the course. 

Grading and Requirements for Passing the Course

The overall grade will be the better of your results in the end-term exam and the re-exam (there will be no further attempts). There will be no mid-term exams. The final exam is closed-book and no discussion is allowed.

To participate in the final written exam, the following prerequisites are required:

  • Submit ALL 14 assignments
  • Obtain 50% or more on average over all assignments (an average of 80% or more earns you a bonus point, which improves your final exam grade by one grade point, if possible)
  • Present solutions at least twice in the tutorials
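
As a rough illustration of how these prerequisites combine, here is a minimal Python sketch. It is not an official tool of the course. The function name `exam_status`, the `ExamStatus` type, and the representation of each assignment result as a percentage between 0 and 100 are assumptions made for this example, as is the reading that the average is taken over all 14 sheets with equal weight and that the bonus only counts once you are admitted.

```python
from dataclasses import dataclass
from typing import List

NUM_ASSIGNMENTS = 14        # from the rules: all 14 assignments must be submitted
MIN_AVERAGE = 50.0          # minimum average (in percent) for exam admission
BONUS_AVERAGE = 80.0        # average (in percent) that earns the bonus point
MIN_PRESENTATIONS = 2       # minimum number of tutorial presentations


@dataclass
class ExamStatus:
    """Hypothetical result type: admission to the exam and bonus eligibility."""
    admitted: bool
    bonus: bool


def exam_status(scores: List[float], presentations: int) -> ExamStatus:
    """Check the three prerequisites for one student.

    scores: one percentage (0-100) per submitted assignment (assumed format).
    presentations: number of times the student presented in the tutorials.
    """
    submitted_all = len(scores) == NUM_ASSIGNMENTS
    # Assumption: every sheet has equal weight in the average.
    average = sum(scores) / NUM_ASSIGNMENTS
    admitted = (
        submitted_all
        and average >= MIN_AVERAGE
        and presentations >= MIN_PRESENTATIONS
    )
    # Assumption: the bonus is only meaningful for admitted students.
    bonus = admitted and average >= BONUS_AVERAGE
    return ExamStatus(admitted=admitted, bonus=bonus)


if __name__ == "__main__":
    print(exam_status([60] * 14, presentations=2))
    print(exam_status([85] * 14, presentations=3))
    print(exam_status([90] * 13, presentations=2))
```

In this sketch, a student with 13 excellent sheets is still not admitted, because the first rule requires all 14 submissions regardless of the average.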


We will use the following primary textbooks.

For Probability and Statistics,

  • Larry Wasserman: All of Statistics, Springer, 2004.

For Data Mining,

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015.
  • Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.

For Information Retrieval,

  • Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
  • ChengXiang Zhai and Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016.

These and additional references are available in the library.