Information Retrieval and Data Mining

Core course, 9 ECTS credits, winter semester 2015 – 2016

Basic Information

Type

  • Core course, 9 ECTS credits

Lecturers 

Coordinators

 Time & Location

  • Tuesday 14-16 in lecture hall 002 in building E1.3
  • Thursday 14-16 in lecture hall 002 in building E1.3

The first lecture is on Tuesday, October 20.

Tutorial Groups

Group Name, Time, Place:

  • Group A: Tuesday 16-18, Room 024
  • Group B: Tuesday 16-18, Room 021
  • Group C: Thursday 16-18, Room 021
  • Group D: Thursday 16-18, Room 021
  • Group E: Friday 14-16, Room 021

All tutorial rooms are located in building E1.4.

 

Contact

Tentative Schedule and Lecture Slides

Week/Date and Slides, followed by Lecturer and Notes for each week:

  • Oct 20: Motivation and Overview
  • Oct 22: Data Quality and Data Reduction
Lecturer: JV & GW. Notes: First assignment handed out.
  • Oct 27: Math 1 - Probability Theory
  • Oct 29: Math 2 - Statistics
Lecturer: GW. Notes: First assignment to be submitted.
  • Nov 3: Patterns 1: Itemset Mining
  • Nov 5: Patterns 2: Rule Mining
Lecturer: JV. Notes: Tutorials on first assignment.
  • Nov 10: Clusters: Representative-based and Probabilistic
  • Nov 12: Clusters: Hierarchical, Density-based, Subspaces
Lecturer: JV. Notes: Tutorials on second assignment.
  • Nov 17: Labels: Classification
  • Nov 19: 1st written test
Lecturer: JV. Notes: Tutorials on patterns.
  • Nov 24: Sequences: Time Series
  • Nov 26: Sequences: Discrete Sequences
Lecturer: JV. Notes: Tutorials on clusters.
  • Dec 1: Graphs: Graph Properties and Subgraph Patterns
  • Dec 3: Graphs: Community Detection and Graph Clustering
Lecturer: JV. Notes: Tutorials on classification.
  • Dec 8: Outliers: Anomaly Detection
  • Dec 10: 2nd written test
Lecturer: JV. Notes: Tutorials on sequences.
  • Dec 15: Capita Selecta Data Mining (JV)
  • Dec 17: Text Indexing and Compression (GW)
Lecturer: JV & GW. Notes: Tutorials on graphs.

Holiday break: Dec 21 - Jan 1

  • Jan 5: Text Matching: Similarity Search
  • Jan 7: Query Processing
Lecturer: GW. Notes: No tutorials.
  • Jan 12: Ranking 1: Probabilistic IR, Statistical Language Models
  • Jan 14: Ranking 2: Latent Topic Models, Learning-to-Rank
Lecturer: GW. Notes: Tutorials on text indexing and matching.
  • Jan 19: Graph Models for Link and Query-Click Analysis
  • Jan 21: Graph Models for Link and Query-Click Analysis (continued); Information Extraction
Lecturer: GW. Notes: Tutorials on query processing.
  • Jan 26: Information Extraction (continued)
  • Jan 28: --- (no lecture)
Lecturer: GW. Notes: Tutorials on language models.
  • Feb 2: Knowledge Harvesting
  • Feb 4: 3rd written test
Lecturer: GW. Notes: Tutorials on web mining.
  • Feb 9: Entity Search, Question Answering, and Outlook
  • Wrap up
  • Feb 11: --- (no lecture)
Lecturer: GW.
  • Feb 15-16: Final Exam
Lecturer: JV & GW.

The dates are preliminary. The final exam is currently planned to be oral.

  • March 14 (tentative): Repetitions of oral exams
Lecturer: JV & GW. Notes: Repetitions of oral exams are only for students who fail the oral exam on Feb 15/16.

Information on Exams

Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.

Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

Grading and Requirements for Passing the Course

To pass the course and earn 9 credit points, the following is required:

  • Regular attendance of classes and tutorial groups
  • Presentation of solutions in tutorial groups
  • Passing 2 of 3 written tests (after each third of the semester)
  • Passing the final exam (at the end of the semester)

The overall grade will be determined by the performance in the final exam combined with your bonus points.

Suggested Reading

The following textbooks will be used:

on data mining:

  • primary: Charu Aggarwal: Data Mining - The Textbook 
  • secondary: Mohammed Zaki and Wagner Meira: Data Mining and Analysis

on information retrieval:

  • primary: Stefan Büttcher, Charles Clarke, Gordon Cormack: Information Retrieval
  • secondary: Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval 

on probability and statistics:

  • primary: Larry Wasserman: All of Statistics 
  • secondary: Arnold Allen: Probability, Statistics, and Queueing Theory 

These and additional references are available in the library: