# Information Retrieval and Data Mining

## Basic Information

Type

• Core course, 9 ECTS credits

Lecturers

Coordinators

Time & Location

• Tuesday 14-16 in lecture hall 002 in building E1.3
• Thursday 14-16 in lecture hall 002 in building E1.3

The first lecture is on Tuesday, October 20.

Tutorial Groups

| Group Name | Time | Place |
|---|---|---|
| Group A | Tuesday 16-18 | Room 024 |
| Group B | Tuesday 16-18 | Room 021 |
| Group C | Thursday 16-18 | Room 021 |
| Group D | Thursday 16-18 | Room 021 |
| Group E | Friday 14-16 | Room 021 |

All tutorial rooms are located in building E1.4.

Contact

## News

• Feb 8: Midterm 3 results and sample solutions are online. Check here and here, respectively.
• Feb 8: There will be NO lecture on Thursday 11 February.
• Please register for the Oral Exam.
• Jan 4: Tutorial groups C and D are merged. Meeting room is MPI 021.
• Dec 21: Final grades of midterm 2 are online. Check here.
• Dec 17: Midterm 2 solutions are online. Check here.
• Dec 15: Midterm 2 results are online. Check here.
• Dec 10: Ch5-2 updated. The definition of the epsilon-neighborhood has been clarified.
• Dec 7: Test 1 solutions are online. Check here.
• Dec 5: Sample solutions for Homework 5 are online.
• Dec 3: There will be tutorial sessions in the week of Dec 15 - Dec 17. There will be no tutorial session in the week of Jan 5 - Jan 7.
• Nov 23: Sample solutions for Assignment 3 are online.
• Nov 18: Solutions for Hw2 are updated.
• Nov 13: Sample solutions for Homework 2 are online.
• Nov 11: Homework 3, Question 3 is updated.
• Nov 11: Chapter 5.1 slides are updated.
• Nov 10: The tutorial rooms of Group A, Group B, and Group D have changed!
• Nov 10: Chapter 4 slides are updated.
• Nov 09: Sample solutions for Assignment 1 are online.
• You can discuss these problems with other students, but everybody must hand in their own answers. You can use computers etc. to perform the algebraic operations, but you must show the intermediate steps ("computer said so" is never a valid answer). Hand in legibly handwritten or computer-typeset solutions in person at the lecture. Note that the deadline is strict. Remember to write your name, tutorial group ID, and matriculation number on every answer sheet! If you want to discuss the solutions with the tutor, the tutorial meeting is the time to do that.
• The first assignments must be submitted in the class on 29 October.
• The lectures on Tuesday 14-16 will also be held in lecture hall 002 in building E1.3.
• Please register for the tutorial group that you prefer from the link below.
• Registration is closed.
• Keep in mind that this is only for tutorial group registration. To register for the course, use the HISPOS system.

## Tentative Schedule and Lecture Slides

| Week/Date: Slides | Lecturer | Notes |
|---|---|---|
| Oct 20: Motivation and Overview<br>Oct 22: Data Quality and Data Reduction | JV & GW | First assignment handed out |
| Oct 27: Math 1 - Probability Theory<br>Oct 29: Math 2 - Statistics | GW | First assignment will be submitted |
| Nov 3: Patterns 1: Itemset Mining<br>Nov 5: Patterns 2: Rule Mining | JV | Tutorials on first assignment |
| Nov 10: Clusters: Representative-based and Probabilistic<br>Nov 12: Clusters: Hierarchical, Density-based, Subspaces | JV | Tutorials on second assignment |
| Nov 17: Labels: Classification<br>Nov 19: 1st written test | JV | Tutorials on Patterns |
| Nov 24: Sequences: Time Series<br>Nov 26: Sequences: Discrete Sequences | JV | Tutorials on Clusters |
| Dec 1: Graphs: Graph properties and Subgraph Patterns<br>Dec 3: Graphs: Community Detection and Graph Clustering | JV | Tutorials on Classification |
| Dec 8: Outliers: Anomaly Detection<br>Dec 10: 2nd written test | JV | Tutorials on Sequences |
| Dec 15: Capita Selecta Data Mining (JV)<br>Dec 17: Text Indexing and Compression (GW) | JV & GW | Tutorials on Graphs |
| Holiday break: Dec 21 - Jan 1 | | |
| Jan 5: Text Matching: Similarity Search<br>Jan 7: Query Processing | GW | No tutorials |
| Jan 12: Ranking 1: Probabilistic IR, Statistical Language Models<br>Jan 14: Ranking 2: Latent Topic Models, Learning-to-Rank | GW | Tutorials on text indexing and matching |
| Jan 19: Graph Models for Link and Query-Click Analysis<br>Jan 21: Graph Models for Link and Query-Click Analysis (continued); Information Extraction | GW | Tutorials on query processing |
| Jan 26: Information Extraction (continued)<br>Jan 28: --- (no lecture) | GW | Tutorials on language models |
| Feb 2: Knowledge Harvesting<br>Feb 4: 3rd test | GW | Tutorials on web mining |
| Feb 9: Entity Search, Question Answering, and Outlook; Wrap up<br>Feb 11: --- (no lecture) | GW | |
| Feb 15-16: Final Exam | JV & GW | |

The dates are preliminary. The exam is currently planned to be oral.

• March 14 (tentative): Repetitions of oral exams (JV & GW). Repetitions are only for students who fail the oral exam on Feb 15/16.

## Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching, and analyzing digital content from the web, social media, and enterprises, as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification, and recommendation. Both fields build on mathematical foundations from linear algebra, graph theory, and probability and statistics.

## Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

## Grading and Requirements for Passing the Course

To pass the course and earn 9 credit points, the following is required:

• Regular attendance of classes and tutorial groups
• Presentation of solutions in tutorial groups
• Passing 2 of 3 written tests (one after each third of the semester)
• Passing the final exam (at the end of the semester)

The overall grade will be determined by the performance in the final exam combined with your bonus points.

The following textbooks will be used:

on data mining:

• primary: Charu Aggarwal: Data Mining - The Textbook
• secondary: Mohammed Zaki and Wagner Meira: Data Mining and Analysis

on information retrieval:

• primary: Stefan Büttcher, Charles Clarke, Gordon Cormack: Information Retrieval
• secondary: Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval

on probability and statistics:

• primary: Larry Wasserman: All of Statistics
• secondary: Arnold Allen: Probability, Statistics, and Queueing Theory

These and additional references are available in the library: