Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2017 – 2018

News

  • (new) 2017-11-14: starting on November 21, group D will be in room E1.7 0.01
  • 2017-11-13: group C has been replaced by groups A and B
  • 2017-11-10: find here the final results of Foundations and Statistics
  • 2017-10-25: find here the quiz grades
  • 2017-10-19: tutorial registration is closed. Find your group assignment below
  • 2017-10-13: you can now register for the tutorials (see below)
  • 2017-10-09: lecture and tutorial schedule (tentative) posted
  • 2017-09-01: more information will follow soon

Basic Information

Type

Core course, 9 ECTS

Lecturers

Dr. Jilles Vreeken and Dr. Jannik Strötgen

Coordinators and Contact

Panagiotis Mandros and Asia Biega

Lectures

Wednesdays, 14-16, E1 3 - Hörsaal II (0.02) and Fridays, 12-14, E1 3 - Hörsaal II (0.02)

(first lecture will be on Wednesday, Oct 18)

Tutorials

Monday, 14-16 and Tuesday, 10-12 (first tutorials on Oct 23 and 24.)

Monday:

  • Group A - E1.3 SR014
  • Group B - E1.3 SR015
 

Tuesday:

  • Group D - E1.7 0.01
  • Group E - E1.3 SR015
 
 
Exams

Feb. 21st, 2018 and March 14th, 2018, from 14:00 to 17:00

Teaching Assistants

 

Kailash Budhathoki

Cuong Xuan Chu

Sebastian Dalleiger

(Jonas Fischer)

Azin Ghazimatin

Anna Christina Guimaraes

David Kaltenpoth

Janis Kalofolias

Preethi Lahoti

Alexander Marx

(Kashyap Popat)

 

 

Lecture Schedule

| Week | Date | Lecture | Lecturer | Reading |
|------|--------|-----------------------------|----------|-------------------------|
| 42 | Oct 18 | Introduction, Foundations I | JV & JS | Aggarwal Ch. 12 |
| | Oct 20 | Foundations II | JV | Aggarwal Ch. 2 |
| 43 | Oct 25 | Statistics I | JV | Wasserman Ch. 1-5 |
| | Oct 27 | Statistics II | JV | Wasserman Ch. 6, 7, 9, 10 |
| 44 | Nov 1 | yay, holiday, no lecture | - | |
| | Nov 3 | Classification | JV | Aggarwal Ch. 10 |
| 45 | Nov 8 | Pattern Mining I | JV | Aggarwal Ch. 4, 5.2 |
| | Nov 10 | Pattern Mining II | JV | |
| 46 | Nov 15 | Clustering I | JV | Aggarwal Ch. 6, 7 |
| | Nov 17 | Clustering II | JV | |
| 47 | Nov 22 | yay, no lecture | - | |
| | Nov 24 | Outlier Analysis | JV | Aggarwal Ch. 8, 9 |
| 48 | Nov 29 | Sequences I | JV | Aggarwal Ch. 3.4, 14, 15 |
| | Dec 1 | Sequences II | JV | |
| 49 | Dec 6 | Graphs I | JV | Aggarwal Ch. 17, 19 |
| | Dec 8 | Graphs II | JV | |
| 50 | Dec 13 | Capita Selecta DM | JV | tba. |
| | Dec 15 | Information Retrieval | JS | tba. |
| 51 | Dec 20 | Information Retrieval | JS | |
| | Dec 22 | Information Retrieval | JV | |
| 52 | Dec 27 | yay, holiday, no lecture | | |
| | Dec 29 | yay, holiday, no lecture | | |
| 1 | Jan 3 | Information Retrieval | JS | |
| | Jan 5 | Information Retrieval | JS | |
| ... | | | | |
| 5 | Jan 31 | Information Retrieval | JS | |
| | Feb 2 | Information Retrieval | JS | |

Tutorial Schedule

| Week | Date | Topic | Sample Exercises | Required Reading |
|------|-----------|-------------------------------|------------------|--------------------------|
| 42 | Oct 16/17 | no tutorial session | | |
| 43 | Oct 23/24 | Sampling, Pre-Processing, PCA | Solution | Aggarwal Ch. 2, 12 |
| 44 | Oct 30/31 | yay, holiday, no tutorial | | |
| 45 | Nov 6/7 | Probabilities and Statistics | Solution | Wasserman Ch. 1-7, 9, 10 |
| 46 | Nov 13/14 | Pattern Mining | Solution | Aggarwal Ch. 4, 5.2 |
| 47 | Nov 20/21 | Clustering | Click me! | Aggarwal Ch. 6, 7 |
| 48 | Nov 27/28 | Classification & Outliers | | Aggarwal Ch. 8, 9, 10 |
| 49 | Dec 4/5 | Sequences | | Aggarwal Ch. 3.4, 14, 15 |
| 50 | Dec 11/12 | Graphs | | Aggarwal Ch. 17, 19 |
| 51 | Dec 18/19 | Information Retrieval | | tba. |
| 52 | Dec 25/26 | yay, holiday, no tutorial | | |
| 1 | Jan 1/2 | yay, holiday, no tutorial | | |
| ... | | | | |

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Tutorials and Exercises

During the tutorial sessions, you will work on exercises that cover the topics of the lectures. At the start of each session, you will receive the exercise sheet, which you solve during the session. The tutors are there to help and clarify. At the end of the session, you hand in your solutions. These will be graded and handed back to you at the next tutorial session.

To complete the exercises within the allotted time, you will need to have studied the required reading material and the slides, and to have practiced the sample exercises, before the tutorial.

To be eligible to participate in the exam, you will need to obtain at least 50% of the exercise points in each of the three parts of the course.

We do not allow plagiarism. The first time you are caught, you will receive 0 points for the full sheet. The second time, you will be excluded from the course.

Grading and Requirements for Passing the Course

The overall grade will be the better of your end-term exam and re-exam results.

To participate in the final written exam, you must meet the following prerequisites:

  • Obtain 50% or more of the points of the exercise sheets on Foundations and Statistics (exercise sheets 1 and 2)
  • Obtain 50% or more of the points of the exercise sheets on Data Mining (exercise sheets 3, 4, 5, 6, and 7)
  • Obtain 50% or more of the points of the exercise sheets on Information Retrieval (exercise sheets 8, 9, 10, 11, and 12)
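
The three prerequisites above can be checked mechanically. The following is a minimal sketch, not an official tool: the function names `part_ratios` and `is_exam_eligible` and the input format (sheet number mapped to obtained and maximum points) are hypothetical, and it assumes the 50% threshold applies to the pooled points of each part rather than to each sheet individually.

```python
from typing import Dict, Tuple

# Sheet numbers per course part, as listed in the prerequisites above.
PARTS: Dict[str, range] = {
    "Foundations and Statistics": range(1, 3),  # sheets 1-2
    "Data Mining": range(3, 8),                 # sheets 3-7
    "Information Retrieval": range(8, 13),      # sheets 8-12
}


def part_ratios(scores: Dict[int, Tuple[float, float]]) -> Dict[str, float]:
    """Return the fraction of points obtained in each part.

    `scores` maps a sheet number to (obtained points, maximum points).
    Assumption: points are pooled over all sheets of a part. Sheets
    missing from `scores` are ignored, so enter unsubmitted sheets
    explicitly with 0 obtained points.
    """
    ratios: Dict[str, float] = {}
    for part, sheets in PARTS.items():
        obtained = sum(scores[s][0] for s in sheets if s in scores)
        maximum = sum(scores[s][1] for s in sheets if s in scores)
        ratios[part] = obtained / maximum if maximum > 0 else 0.0
    return ratios


def is_exam_eligible(scores: Dict[int, Tuple[float, float]],
                     threshold: float = 0.5) -> bool:
    """True if every part reaches at least `threshold` of its points."""
    return all(r >= threshold for r in part_ratios(scores).values())


if __name__ == "__main__":
    # Hypothetical student: 12 of 20 points on every sheet,
    # except sheet 8, which was not handed in.
    example = {sheet: (12.0, 20.0) for sheet in range(1, 13)}
    example[8] = (0.0, 20.0)
    print(part_ratios(example))
    print(is_exam_eligible(example))
```

In this made-up example the Information Retrieval part ends at 48 of 100 points, so the student would not be eligible even though the other two parts are well above the threshold.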

In this edition of IRDM, there will be no mid-term exams.

Literature

We will use the following primary textbooks.

For Probability and Statistics,

  • Larry Wasserman: All of Statistics, Springer, 2004

For Data Mining,

  • Charu Aggarwal: Data Mining - The Textbook, Springer, 2015

For Information Retrieval,

  • Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008
  • ChengXiang Zhai, Sean Massung: Text Data Management and Analysis, Morgan & Claypool, 2016

These and additional references are available in the library: