Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2019 – 2020

News

  • 23.10.2019: First assignment is out!
  • 30.10.2019: Slides from Lecture 02 updated.
  • 30.10.2019: First assignment is due today at the lecture by 16:15!
  • 05.11.2019: Hints added to Assignment 2!
  • 15.11.2019: Clarifications added to Assignment 4 Problem 3.
  • 26.11.2019: Tutorial presentation bonuses introduced - see Google Groups!
  • 09.03.2020: Final Exam grades out - check here. Congrats! :)
  • 09.03.2020: Final Exam inspection will be on the 12th of March. Check the email in the Google Group for details.
  • 09.03.2020: The re-exam topics will largely be based on those in the assignments and the first exam.
  • 06.05.2020: The re-exam date has been decided (see below).
  • 14.05.2021: Re-exam grades out - check here. Congrats! :)

Basic Information

Type

Core course, 9 ECTS

Lecturers

Dr. Andrew Yates and Dr. Rishiraj Saha Roy

Coordinators and Contact

Sreyasi Nag Chowdhury and Azin Ghazimatin

Lectures

Wednesdays, 16-18, E1 3 - Hörsaal II (0.02) and Fridays, 14-16, E1 3 - Hörsaal II (0.02)

Tutorials

Monday 14-16 (first tutorial is on Nov 4th):

  • Group 1 - E1.3 SR014
  • Group 2 - E1.3 SR015
  • Group 3 - E1.3 SR016
  • Group 4 - E1.3 SR107

Tuesday 10-12 (first tutorial is on Nov 5th):

  • Group 5 - E1.3 SR015
  • Group 6 - E1.3 SR016
  • Group 7 - E1.3 SR107
  • Group 8 - E1.1 SR106

Exams

Final Exam: Wednesday 26.02.2020, 14:00 - 17:00, Lecture Hall 001, E2.5.

Re-exam: Monday 19.10.2020, 14:00 - 17:00, Lecture Hall GHH, 001, 002 and 003, E1 3

Teaching Assistants

Google Group

IRDM19

Lecture Schedule

Lecture | Date | Topic | Lecturer | Reading
Lecture 01 | Oct 16 | Foundations I | RSR | Aggarwal Ch. 2
Lecture 02 | Oct 18 | Foundations II | RSR | Aggarwal Ch. 12
Lecture 03 | Oct 23 | Statistics I | AY | Wasserman Ch. 1-5
Lecture 04 | Oct 25 | Statistics II | AY | Wasserman Ch. 6, 7, 9, 10
Lecture 05 | Oct 30 | Pattern Mining I | RSR | Aggarwal Ch. 4; Zaki & Meira Ch. 8
Holiday | Nov 01 | | |
Lecture 06 | Nov 06 | Pattern Mining II | RSR | Aggarwal Ch. 5; Zaki & Meira Ch. 9, 12
Lecture 07 | Nov 08 | Classification | AY | Aggarwal Ch. 10; Zaki & Meira Ch. 18, 19, 22
Lecture 08 | Nov 13 | Clustering I | JV | Aggarwal Ch. 6
Lecture 09 | Nov 15 | Clustering II | JV | Aggarwal Ch. 7
Lecture 10 | Nov 20 | Sequences I | RSR | Aggarwal Ch. 3, 14, 15
Lecture 11 | Nov 22 | Sequences II | RSR | Aggarwal Ch. 14, 15
Lecture 12 | Nov 27 | Graphs I | RSR | Aggarwal Ch. 17, 19; Zaki & Meira Ch. 4, 11, 16
Lecture 13 | Nov 29 | Graphs II | RSR | Zaki & Meira Ch. 16
Lecture 14 | Dec 04 | Anomaly Detection | RSR | Aggarwal Ch. 8, 9
Lecture 15 | Dec 06 | IR Basics | AY | Manning et al. Ch. 1, 5.1, 6; Zhai & Massung Ch. 8
Lecture 16 | Dec 11 | Ranking I | AY | Manning et al. Ch. 6, 12; Zhai & Massung Ch. 6
Lecture 17 | Dec 13 | Preprocessing & Evaluation | AY | Manning et al. Ch. 2.1-2.2, 3.3, 8; Zhai & Massung Ch. 9
Lecture 18 | Dec 18 | Ranking II | AY | Manning et al. Ch. 11, 18; Zhai & Massung Ch. 17
Lecture 19 | Dec 20 | Indexing | AY | Manning et al. Ch. (3,) 4, 5
Christmas break | | | |
Lecture 20 | Jan 08 | Link Analysis | RSR | Manning et al. Ch. 21; Aggarwal Ch. 18
Lecture 21 | Jan 10 | Click Analysis | RSR | Joachims 2002; Craswell & Szummer 2007
Lecture 22 | Jan 15 | Neural IR I | AY | Deep Learning Book Ch. 6; Guo et al. 2016
Lecture 23 | Jan 17 | Neural IR II | AY | Deep Learning Book Ch. 9; MacAvaney et al. 2019; Dai & Callan 2019
Lecture 24 | Jan 22 | Query Expansion | AY | Manning et al. Ch. 9 and 19.6
Lecture 25 | Jan 24 | Entities in IR | AY |
Lecture 26 | Jan 29 | Question Answering Systems | RSR | Lu et al. 2019; Abujabal et al. 2018; Chen et al. 2017; Clark and Gardner 2018
Lecture 27 | Jan 31 | Recap | RSR, AY |

Tutorial Schedule

Release Date | Submission Date | Tutorial Date | Topic | Exercise Sheet | Solution
Oct 23 | Oct 30 | Nov 4/5 | Foundations | Assignment 1 | Solution 1
Oct 30 | Nov 6 | Nov 11/12 | Statistics | Assignment 2 | Solution 2
Nov 6 | Nov 13 | Nov 18/19 | Pattern Mining | Assignment 3 | Solution 3
Nov 13 | Nov 20 | Nov 25/26 | Classification | Assignment 4 | Solution 4
Nov 20 | Nov 27 | Dec 2/3 | Clustering | Assignment 5 | Solution 5
Nov 27 | Dec 4 | Dec 9/10 | Sequences | Assignment 6 | Solution 6
Dec 4 | Dec 11 | Dec 16/17 | Graphs | Assignment 7 | Solution 7
Dec 11 | Dec 18 | Jan 6/7 | IR Basics | Assignment 8 | Solution 8
Dec 18 | Jan 8 | Jan 13/14 | Ranking and Evaluation | Assignment 9 | Solution 9
Jan 8 | Jan 15 | Jan 20/21 | Ranking and Indexing | Assignment 10 | Solution 10
Jan 15 | Jan 22 | Jan 27/28 | Link and Click Analysis | Assignment 11 | Solution 11
Jan 22 | Jan 29 | Feb 3/4 | Neural IR | Assignment 12 | Solution 12
Jan 29 | Feb 5 | Feb 10/11 | Query Expansion | Assignment 13 | Solution 13
Feb 5 | Feb 12 | Feb 17/18 | Entities and QA | Assignment 14 | Solution 14

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching, and analyzing digital content from the web, social media, and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification, and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

After you receive an assignment sheet, solve the problems individually at home and submit your solutions to the TAs on the appointed date before the lecture (by 16:15). During the tutorial sessions, the TAs will ask some of you to present your solutions. Every student must present their solutions at least twice during the semester. The TAs will also help clarify your answers. Your submitted sheets will be graded and handed back to you at the end of the session.

To do the exercises, you have to study the required reading material and go through the slides.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be de-registered from the course. 

Grading and Requirements for Passing the Course

The overall grade will be the better of your results in the final exam and the re-exam (there will be no further attempts). There will be no mid-term exams. The final exam is closed-book and no discussion is allowed.

To participate in the final written exam, the following prerequisites are required:

  • Submit ALL 14 assignments
  • Obtain 50% or more on average over all assignments (an average of 80% or more earns a bonus point, which raises your final exam grade by one grade point where possible)
  • Present solutions at least 2 times in the tutorials

Literature

We will use the following primary textbooks.

For Probability and Statistics:

  • Larry Wasserman: All of Statistics, Springer, 2004.

For Data Mining:

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015.
  • Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.

For Information Retrieval:

  • Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
  • ChengXiang Zhai and Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016.

These and additional references are available in the library: