Information Retrieval and Data Mining
Core course, 9 ECTS credits, winter semester 2019 – 2020
News
 23.10.2019: First assignment is out!
 30.10.2019: Slides from Lecture 02 updated.
 30.10.2019: First assignment is due today at the lecture by 16:15!
 05.11.2019: Hints added to Assignment 2!
 15.11.2019: Clarifications added to Assignment 4 Problem 3.
 26.11.2019: Tutorial presentation bonuses introduced; see the Google Group!
Basic Information
Type  Core course, 9 ECTS  
Lecturers  
Coordinators and Contact  
Lectures  Wednesdays, 16–18, E1 3, Hörsaal II (0.02) and Fridays, 14–16, E1 3, Hörsaal II (0.02)  
Tutorials 
 
Exams  Final Exam: Wednesday 26.02.2020, 14:00–17:00, Lecture Hall 001, E2.5. Re-exam: Friday 27.03.2020, 09:00–12:00, Günter-Hotz Lecture Hall, E2.2.  
Teaching Assistants 
 
Google Group  IRDM19 
Lecture Schedule
Lecture  Date  Topic  Lecturer  Reading 
Lecture 01  Oct 16  Foundations I  RSR  Aggarwal Ch. 2 
Lecture 02  Oct 18  Foundations II  RSR  Aggarwal Ch. 12 
Lecture 03  Oct 23  Statistics I  AY  Wasserman Ch. 15 
Lecture 04  Oct 25  Statistics II  AY  Wasserman Ch. 6, 7, 9, 10 
Lecture 05  Oct 30  Pattern Mining I  RSR  Aggarwal Ch. 4, Zaki & Meira Ch. 8 
holiday  
Lecture 06  Nov 06  Pattern Mining II  RSR  Aggarwal Ch. 5, Zaki & Meira Ch. 9, 12 
Lecture 07  Nov 08  Classification  AY  Aggarwal Ch. 10, Zaki & Meira Ch. 18, 19, 22 
Lecture 08  Nov 13  Clustering I  JV  Aggarwal Ch. 6 
Lecture 09  Nov 15  Clustering II  JV  Aggarwal Ch. 7 
Lecture 10  Nov 20  Sequences I  RSR  Aggarwal Ch. 3, 14, 15 
Lecture 11  Nov 22  Sequences II  RSR  Aggarwal Ch. 14, 15 
Lecture 12  Nov 27  Graphs I  RSR  Aggarwal Ch. 17, 19, Zaki & Meira Ch. 4, 11, 16 
Lecture 13  Nov 29  Graphs II  RSR  Zaki & Meira Ch. 16 
Lecture 14  Dec 04  Anomaly Detection  RSR  Aggarwal Ch. 8, 9 
Lecture 15  Dec 06  IR Basics  AY  Manning et al. Ch. 1, 5.1, 6, Zhai & Massung Ch. 8 
Lecture 16  Dec 11  Ranking I  AY  Manning et al. Ch. 6, 12, Zhai & Massung Ch. 6 
Lecture 17  Dec 13  Preprocessing & Evaluation  AY  Manning et al. Ch. 2.1–2.2, 3.3, 8, Zhai & Massung Ch. 9 
Lecture 18  Dec 18  Ranking II  AY  Manning et al. Ch. 11, 18, Zhai & Massung Ch. 17 
Lecture 19  Dec 20  Indexing  AY  Manning et al. Ch. (3,) 4, 5 
CHRISTMAS  BREAK  
Lecture 20  Jan 08  Link Analysis  RSR  Manning et al. Ch. 21, Aggarwal Ch. 18 
Lecture 21  Jan 10  Click Analysis  RSR  
Lecture 22  Jan 15  Neural IR I  AY  
Lecture 23  Jan 17  Neural IR II  AY  Deep Learning Book Ch. 9, MacAvaney et al. 2019, Dai & Callan 2019 
Lecture 24  Jan 22  Query Expansion  AY  Manning et al. Ch. 9, 19.6 
Lecture 25  Jan 24  Entities in IR  AY  
Lecture 26  Jan 29  QA Systems  RSR  
Lecture 27  Jan 31  Recap  RSR, AY 
Tutorial Schedule
Release Date  Submission Date  Tutorial Date  Topic  Exercise Sheet  Solution 
Oct 23  Oct 30  Nov 4/5  Foundations  
Oct 30  Nov 6  Nov 11/12  Statistics  
Nov 6  Nov 13  Nov 18/19  Pattern Mining  
Nov 13  Nov 20  Nov 25/26  Classification  
Nov 20  Nov 27  Dec 2/3  Clustering  
Nov 27  Dec 4  Dec 9/10  Sequences  
Dec 4  Dec 11  Dec 16/17  Graphs  
Dec 11  Dec 18  Jan 6/7  IR Basics  
Dec 18  Jan 8  Jan 13/14  Ranking and Evaluation  
Jan 8  Jan 15  Jan 20/21  Ranking and Indexing 
 
Jan 15  Jan 22  Jan 27/28  Link and Click Analysis  Assignment 11  
Jan 22  Jan 29  Feb 3/4  Neural IR  
Jan 29  Feb 5  Feb 10/11  Query Expansion 


Feb 5  Feb 12  Feb 17/18  Entities and QA 


Course Contents
Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching, and analyzing digital content from the web, social media, and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification, and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.
Prerequisites
Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.
Tutorials and Exercises
After you receive the assignment sheet, solve the problems individually at home and submit your solutions to the TAs on the appointed date, before the lecture (by 16:15). During the tutorial sessions, the TAs will ask some of you to present your solutions. Every student must present their solutions at least twice during the semester. The TAs will also help clarify your answers. Your submitted sheets will be graded and handed back to you at the end of the session.
To do the exercises, you have to study the required reading material and go through the slides.
We do not allow plagiarism. The first time you are caught, you will receive 0 points for the specific assignment. The second time, you will be deregistered from the course.
Grading and Requirements for Passing the Course
The overall grade will be the better of your results in the final exam and the re-exam (there will be no further attempts). There will be no midterm exams. The final exam is closed-book, and no discussion is allowed.
To participate in the final written exam, the following prerequisites are required:
 Submit ALL 14 assignments
 Obtain 50% or more on average over all assignments (80% or more on average earns you a bonus point, which raises your final exam grade by one grade step, if possible)
 Present solutions at least 2 times in the tutorials
Literature
We will use the following primary textbooks.
For Probability and Statistics,
 Larry Wasserman: All of Statistics, Springer, 2004.
For Data Mining,
 Charu Aggarwal: Data Mining: The Textbook, Springer, 2015.
 Mohammed J. Zaki and Wagner Meira Jr: Data Mining and Analysis, Cambridge University Press, 2014.
For Information Retrieval,
 Chris Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
 ChengXiang Zhai and Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016.
These and additional references are available in the library: