Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2017 – 2018

News

  • (updated) 2018-04-04: Find the results of the re-exam here
  • 2018-03-02: Find the results of the exam here.
  • 2018-02-03: The final results (pass/fail) of the Information Retrieval part of the course are here.
  • 2018-01-24: Today's lecture is cancelled due to the lecturer's sick leave. All the material needed for next week's tutorials will be covered on Friday (Jan 26).
  • 2017-12-15: find here the results of Data Mining
  • 2017-11-14: starting from 21st of November, group D will be in room E1.7 0.01
  • 2017-11-13: group C has been replaced by groups A and B
  • 2017-11-10: find here the final results of Foundations and Statistics
  • 2017-10-25: find here the quiz grades
  • 2017-10-19: tutorial registration is closed. Find your group assignment below
  • 2017-10-13: you can now register for the tutorials (see below)
  • 2017-10-09: lecture and tutorial schedule (tentative) posted
  • 2017-09-01: more information will follow soon

Basic Information

Type: Core course, 9 ECTS
Lecturers

Dr. Jilles Vreeken and Dr. Jannik Strötgen

Coordinators and Contact

Asia Biega and Panagiotis Mandros 

Lectures

Wednesdays, 14-16, E1 3 - Hörsaal II (0.02) and Fridays, 12-14, E1 3 - Hörsaal II (0.02)

(first lecture will be on Wednesday, Oct 18)

Tutorials

Monday, 14-16 and Tuesday, 10-12 (first tutorials on Oct 23 and 24.)

Monday:

  • Group A - E1.3 SR014
  • Group B - E1.3 SR015

Tuesday:

  • Group D - E1.7 0.01
  • Group E - E1.3 SR015

Exams

  • February 21st, 2018, 14:00 - 17:00, E2.2, Günter-Hotz hall
  • March 14th, 2018, 14:00 - 17:00, E2.2, Günter-Hotz hall

Teaching Assistants

Kailash Budhathoki

Cuong Xuan Chu

Sebastian Dalleiger

Jonas Fischer

Azin Ghazimatin

Anna Christina Guimaraes

David Kaltenpoth

Janis Kalofolias

Preethi Lahoti

Alexander Marx

Kashyap Popat

 

Lecture Schedule

Week | Date | Lecture | Lecturer | Reading
42 | Oct 18 | Introduction, Foundations I | JV & JS | Aggarwal Ch. 12
42 | Oct 20 | Foundations II | JV | Aggarwal Ch. 2
43 | Oct 25 | Statistics I | JV | Wasserman Ch. 1-5
43 | Oct 27 | Statistics II | JV | Wasserman Ch. 6, 7, 9, 10
44 | Nov 1 | yay, holiday, no lecture | - |
44 | Nov 3 | Classification | JV | Aggarwal Ch. 10
45 | Nov 8 | Pattern Mining I | JV | Aggarwal Ch. 4, 5.2
45 | Nov 10 | Pattern Mining II | JV |
46 | Nov 15 | Clustering I | JV | Aggarwal Ch. 6, 7
46 | Nov 17 | Clustering II | JV |
47 | Nov 22 | yay, no lecture | - |
47 | Nov 24 | Outlier Analysis | JV | Aggarwal Ch. 8, 9
48 | Nov 29 | Sequences I | JV | Aggarwal Ch. 3.4, 14, 15
48 | Dec 1 | Sequences II | JV |
49 | Dec 6 | Graphs I | JV | Aggarwal Ch. 17, 19
49 | Dec 8 | Graphs II | JV |
50 | Dec 13 | IR Basics | JS | Manning et al. Ch. 1, 5.1, 6
50 | Dec 15 | Preprocessing and Evaluation | JS | Manning et al. Ch. 2.1, 2.2, 3.3, 8
50 | | (DM Wrap Up, Causal Inference) | JV |
51 | Dec 20 | NLP for IR | JS | (slides)
51 | Dec 22 | yay, almost holiday, no lecture | |
52 | Dec 27 | yay, holiday, no lecture | |
52 | Dec 29 | yay, holiday, no lecture | |
1 | Jan 3 | Ranking I (updated, 2018-01-08) | JS | Manning et al. Ch. 6, 12
1 | Jan 5 | Ranking II (updated, 2018-01-08) | JS | Manning et al. Ch. 9, 18
2 | Jan 10 | Indexing | JS | Manning et al. Ch. (3,) 4, 5
2 | Jan 12 | Query Processing | JS | Manning et al. Ch. 7
3 | Jan 17 | Web Search I | JS | Manning et al. Ch. 19, 20, 21
3 | Jan 19 | Web Search II | JS | Manning et al. Ch. 19, 20, 21
4 | Jan 24 | (Cancelled) | JS |
4 | Jan 26 | Text Mining | JS | Manning et al. Ch. 13, 19
5 | Jan 31 | Semantic Search | JS |
5 | Feb 2 | Advanced IR | JS |

Tutorial Schedule

Week | Date | Topic | Sample Exercises | Required Reading
42 | Oct 16/17 | no tutorial session | |
43 | Oct 23/24 | Sampling, Pre-Processing, PCA | Solution | Aggarwal Ch. 2, 12
44 | Oct 30/31 | yay, holiday, no tutorial | |
45 | Nov 6/7 | Probabilities and Statistics | Solution | Wasserman Ch. 1-7, 9, 10
46 | Nov 13/14 | Pattern Mining | Solution | Aggarwal Ch. 4, 5.2
47 | Nov 20/21 | Clustering | Solution | Aggarwal Ch. 6, 7
48 | Nov 27/28 | Classification & Outliers | Solution | Aggarwal Ch. 8, 9, 10
49 | Dec 4/5 | Sequences | Solution | Aggarwal Ch. 3.4, 14, 15
50 | Dec 11/12 | Graphs | Solution | Aggarwal Ch. 17, 19
51 | Dec 18/19 | IR Basics & Evaluation | Solution, script | Manning et al. Ch. 1, 2.1, 2.2, 3.3, 5.1, 6, 8
52 | Dec 25/26 | yay, holiday, no tutorial | |
1 | Jan 1/2 | yay, holiday, no tutorial | |
2 | Jan 8/9 | IR Ranking | Solution | Manning et al. Ch. 6, 9, 12, 18
3 | Jan 15/16 | IR Indexing | Solution | Manning et al. Ch. 3, 4, 5, 7
4 | Jan 22/23 | IR Web Search | Solution | Manning et al. Ch. 19, 20, 21
5 | Jan 29/30 | IR Text Mining | Solution | Manning et al. Ch. 13, 19

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching, and analyzing digital content from the web, social media, and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification, and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

During the tutorial sessions, you will work on exercises that cover the topics of the lectures. At the start of the tutorial session, you will receive the exercise sheet, which you solve during the session. During the tutorial, the tutors are there to help and clarify. At the end of the tutorial session, you hand in your solutions. These will be graded and handed back to you in the next tutorial session.

To do the exercises within the allotted time, you will need to have studied the required reading material and the slides, and to have practiced the sample exercises, before the tutorial.

To be eligible to participate in the exam, you will need to obtain at least 50% of the exercise points for each of the three parts of the course.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the full sheet. The second time, you will be excluded from the course.

Grading and Requirements for Passing the Course

The overall grade will be the better of the results of the end-term exam and the re-exam.

To participate in the final written exam, the following prerequisites are required:

  • Obtain 50% or more of the points of the exercise sheets on Foundations and Statistics (exercise sheets 1 and 2)
  • Obtain 50% or more of the points of the exercise sheets on Data Mining (exercise sheets 3, 4, 5, 6, and 7)
  • Obtain 50% or more of the points of the exercise sheets on Information Retrieval (exercise sheets 8, 9, 10, 11, and 12)
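The three prerequisites above can be checked mechanically. The following is only a minimal sketch: it assumes that points are pooled over all sheets within a part (rather than requiring 50% on every single sheet), and the function name `parts_below_threshold` as well as all point values are hypothetical, chosen purely for illustration.

```python
def parts_below_threshold(points_by_part, threshold=0.5):
    """Return the names of the course parts in which the pooled exercise
    points fall below the threshold (assumption: points are summed over
    all sheets of a part before comparing against 50%)."""
    failing = []
    for part, sheets in points_by_part.items():
        earned = sum(e for e, _ in sheets)      # points obtained in this part
        possible = sum(m for _, m in sheets)    # maximum points in this part
        if earned < threshold * possible:
            failing.append(part)
    return failing


# Hypothetical (earned, maximum) points per exercise sheet.
example_points = {
    "Foundations and Statistics": [(6, 10), (5, 10)],              # sheets 1-2
    "Data Mining": [(7, 10), (4, 10), (6, 10), (5, 10), (8, 10)],  # sheets 3-7
    "Information Retrieval": [(3, 10), (5, 10), (4, 10), (6, 10), (2, 10)],  # sheets 8-12
}

failing = parts_below_threshold(example_points)
eligible = not failing  # admitted to the exam only if no part is below 50%
print(eligible, failing)  # prints: False ['Information Retrieval']
```

In this made-up example the student reaches 55% and 60% in the first two parts but only 40% in Information Retrieval, so they would not be admitted to the exam.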

In this edition of IRDM, there will be no mid-term exams.

Literature

We will use the following primary textbooks.

For Probability and Statistics,

  • Larry Wasserman: All of Statistics, Springer, 2004

For Data Mining,

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015

For Information Retrieval,

  • Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval, Cambridge, 2008
  • ChengXiang Zhai, Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016

These and additional references are available in the library: