Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2019 – 2020

News

  • 23.10.2019: First assignment is out!
  • 30.10.2019: Slides from Lecture 02 updated.
  • 30.10.2019: First assignment is due today at the lecture by 16:15!
  • 05.11.2019: Hints added to Assignment 2!
  • 15.11.2019: Clarifications added to Assignment 4 Problem 3.
  • 26.11.2019: Tutorial presentation bonuses introduced - see Google Groups!

Basic Information

Type: Core course, 9 ECTS
Lecturers

Dr. Andrew Yates and Dr. Rishiraj Saha Roy

Coordinators and Contact

Sreyasi Nag Chowdhury and Azin Ghazimatin

Lectures

Wednesdays, 16-18, E1 3 - Hörsaal II (0.02) and Fridays, 14-16, E1 3 - Hörsaal II (0.02)

Tutorials

 

Monday 14-16 (first tutorial is on Nov 4th):

  • Group 1 - E1.3 SR014
  • Group 2 - E1.3 SR015
  • Group 3 - E1.3 SR016
  • Group 4 - E1.3 SR107
 

Tuesday 10-12 (first tutorial is on Nov 5th):

  • Group 5 - E1.3 SR015
  • Group 6 - E1.3 SR016
  • Group 7 - E1.3 SR107
  • Group 8 - E1.1 SR106
 
 
Exams

Final Exam: Wednesday 26.02.2020, 14:00 - 17:00, Lecture Hall 001, E2.5.

Re-exam: Friday 27.03.2020, 09:00 - 12:00, Günter-Hotz-Lecture Hall, E2.2.

Teaching Assistants

 
  • Anna Tigunova
  • Vĩnh Thịnh Hồ
  • Sebastian Dalleiger
  • Anna Guimarães
  • Ghazaleh Haratinezhad Torbati
  • David Kaltenpoth (Room 3.05 in E1.7, dkaltenp@mpi-inf.mpg.de)
  • Shrestha Ghosh
  • Magdalena Kaiser
  • Alexander Marx
  • Hiba Arnaout
  • Janis Kalofolias
 
Google Group: IRDM19

Lecture Schedule

Lecture | Date | Topic | Lecturer | Reading
Lecture 01 | Oct 16 | Foundations I | RSR | Aggarwal Ch. 2
Lecture 02 | Oct 18 | Foundations II | RSR | Aggarwal Ch. 12
Lecture 03 | Oct 23 | Statistics I | AY | Wasserman Ch. 1-5
Lecture 04 | Oct 25 | Statistics II | AY | Wasserman Ch. 6, 7, 9, 10
Lecture 05 | Oct 30 | Pattern Mining I | RSR | Aggarwal Ch. 4, Zaki&Meira Ch. 8
Holiday | Nov 01 | | |
Lecture 06 | Nov 06 | Pattern Mining II | RSR | Aggarwal Ch. 5, Zaki&Meira Ch. 9, 12
Lecture 07 | Nov 08 | Classification | AY | Aggarwal Ch. 10, Zaki&Meira Ch. 18, 19, 22
Lecture 08 | Nov 13 | Clustering I | JV | Aggarwal Ch. 6
Lecture 09 | Nov 15 | Clustering II | JV | Aggarwal Ch. 7
Lecture 10 | Nov 20 | Sequences I | RSR | Aggarwal Ch. 14
Lecture 11 | Nov 22 | Sequences II | RSR | Aggarwal Ch. 15
Lecture 12 | Nov 27 | Graphs I | RSR | Aggarwal Ch. 17
Lecture 13 | Nov 29 | Graphs II | RSR | Aggarwal Ch. 19
Lecture 14 | Dec 04 | Anomaly Detection | RSR | Aggarwal Ch. 8
Lecture 15 | Dec 06 | IR Basics | AY | Manning et al. Ch. 1, 5.1, 6, Zhai & Massung Ch. 8
Lecture 16 | Dec 11 | Ranking I | AY | Manning et al. Ch. 6, 12, Zhai & Massung Ch. 6
Lecture 17 | Dec 13 | Preprocessing & Evaluation | AY | Manning et al. Ch. 2.1-2.2, 3.3, 8, Zhai & Massung Ch. 9
Lecture 18 | Dec 18 | Ranking II | AY | Manning et al. Ch. 11, 18, Zhai & Massung Ch. 17
Lecture 19 | Dec 20 | Indexing | AY | Manning et al. Ch. (3,) 4, 5
Christmas break | | | |
Lecture 20 | Jan 08 | Click Analysis I | RSR |
Lecture 21 | Jan 10 | Click Analysis II | RSR |
Lecture 22 | Jan 15 | Neural IR I | AY |
Lecture 23 | Jan 17 | Neural IR II | AY |
Lecture 24 | Jan 22 | Query Expansion | AY |
Lecture 25 | Jan 24 | Entities in IR | AY |
Lecture 26 | Jan 29 | QA Systems | RSR |
Lecture 27 | Jan 31 | Recap | RSR, AY |

Tutorial Schedule

Release Date | Submission Date | Tutorial Date | Topic | Exercise Sheet | Solution
Oct 23 | Oct 30 | Nov 4/5 | Foundations | Assignment 1 | Solution 1
Oct 30 | Nov 6 | Nov 11/12 | Statistics | Assignment 2 | Solution 2
Nov 6 | Nov 13 | Nov 18/19 | Pattern Mining | Assignment 3 | Solution 3
Nov 13 | Nov 20 | Nov 25/26 | Classification | Assignment 4 | Solution 4
Nov 20 | Nov 27 | Dec 2/3 | Clustering | Assignment 5 | Solution 5
Nov 27 | Dec 4 | Dec 9/10 | Sequences | Assignment 6 | Solution 6
Dec 4 | Dec 11 | Dec 16/17 | Graphs | Assignment 7 |
Dec 11 | Dec 18 | Jan 6/7 | IR Basics | Assignment 8 |
Dec 18 | Jan 8 | Jan 13/14 | Ranking and Evaluation | |
Jan 8 | Jan 15 | Jan 20/21 | Ranking and Indexing | |
Jan 15 | Jan 22 | Jan 27/28 | Click Analysis | |
Jan 22 | Jan 29 | Feb 3/4 | Neural IR | |
Jan 29 | Feb 5 | Feb 10/11 | Query Expansion | |
Feb 5 | Feb 12 | Feb 17/18 | Entities and QA | |

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

Once an assignment sheet is released, solve the problems individually at home and submit your solutions to the TAs on the appointed date, before the lecture (by 16:15). During the tutorial sessions, the TAs will ask some of you to present your solutions, and every student must present at least twice during the semester. The TAs will also help clarify your answers. Your submitted sheets will be graded and handed back to you at the end of the session.

To do the exercises, you have to study the required reading material and go through the slides.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be de-registered from the course. 

Grading and Requirements for Passing the Course

The overall grade will be the better of your results in the final exam and the re-exam (there will be no further attempts). There will be no mid-term exams. The final exam is closed-book and no discussion is allowed.

To participate in the final written exam, the following prerequisites are required:

  • Submit ALL 14 assignments
  • Obtain 50% or more on average over all assignments (an average of 80% or more earns a bonus point, which improves your final exam grade by one grade step, if possible)
  • Present solutions at least 2 times in the tutorials
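
The three prerequisites above can be checked mechanically. The following is only a minimal sketch, not an official tool of the course: the names `TutorialRecord` and `exam_status` are hypothetical, and it assumes that every assignment is scored as a percentage and that all 14 assignments carry equal weight in the average.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

TOTAL_ASSIGNMENTS = 14   # "Submit ALL 14 assignments"
PASS_AVERAGE = 50.0      # minimum average (percent) for exam admission
BONUS_AVERAGE = 80.0     # minimum average (percent) for the bonus point
MIN_PRESENTATIONS = 2    # minimum number of tutorial presentations


@dataclass
class TutorialRecord:
    """Hypothetical per-student record; not part of any course system."""
    scores: Dict[int, float]  # assignment number (1..14) -> score in percent
    presentations: int        # number of solutions presented in tutorials


def exam_status(record: TutorialRecord) -> Tuple[bool, bool]:
    """Return (admitted_to_exam, earns_bonus) under the stated assumptions."""
    submitted_all = all(
        n in record.scores for n in range(1, TOTAL_ASSIGNMENTS + 1)
    )
    # Assumption: equal weighting, averaged over all 14 assignments.
    average = sum(record.scores.values()) / TOTAL_ASSIGNMENTS
    admitted = (
        submitted_all
        and average >= PASS_AVERAGE
        and record.presentations >= MIN_PRESENTATIONS
    )
    bonus = admitted and average >= BONUS_AVERAGE
    return admitted, bonus


if __name__ == "__main__":
    record = TutorialRecord(
        scores={n: 85.0 for n in range(1, TOTAL_ASSIGNMENTS + 1)},
        presentations=2,
    )
    print(exam_status(record))  # (True, True)
```

The sketch only reports whether the bonus is earned. How the bonus changes the final exam grade is left out, since the exact grade scale is not given here.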

Literature

We will use the following primary textbooks.

For Probability and Statistics,

  • Larry Wasserman: All of Statistics, Springer, 2004.

For Data Mining,

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015.
  • Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.

For Information Retrieval,

  • Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
  • ChengXiang Zhai and Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016.

These and additional references are available in the library: