# Information Retrieval and Data Mining

## Core course, 9 ECTS credits, winter semester 2015 – 2016

## Basic Information

Teaching Assistants

- Tuesday 14-16 in lecture hall 002 in building E1.3
- Thursday 14-16 in lecture hall 002 in building E1.3

The first lecture is on Tuesday, October 20.

Tutorial Groups

Group Name | Time | Place | Group Name | Time | Place |
---|---|---|---|---|---|
Group A | Tuesday 16-18 | Room 024 | Group B | Tuesday 16-18 | Room 021 |
Group C | Thursday 16-18 | Room 021 | Group D | Thursday 16-18 | Room 021 |
Group E | Friday 14-16 | Room 021 | | | |

All tutorial rooms are located in building E1.4.

Contact

## News

- Feb 18: Please check your final exam results from here.
- Feb 11: Please see your final exam slot from here.
- Feb 8: Midterm 3 results and sample solutions are online. Check here and here, respectively.
- Feb 8: There will be NO lecture on Thursday 11 February.
- Please register for the Oral Exam.
- Jan 4: Tutorial groups C and D are merged. Meeting room is MPI 021.
- Dec 21: Final grades of midterm 2 are online. Check here.
- Dec 17: Midterm 2 solutions are online. Check here.
- Dec 15: Midterm 2 results are online. Check here.
- Dec 10: Ch5-2 updated. The definition of epsilon-neighborhood is clarified.
- Dec 7: Test1 solutions are online. Check here.
- Dec 5: Sample solutions for Homework 5 are online.
- Dec 3: There will be tutorial sessions in the week of Dec 15 - Dec 17. There will be no tutorial session in the week of Jan 5 - Jan 7.
- Nov 23: Sample solutions for Assignment 3 are online.
- Nov 18: Solutions for Hw2 are updated.
- Nov 13: Sample solutions for Homework 2 are online.
- Nov 11: Homework 3, Question 3 is updated.
- Nov 11: Chapter 5.1 slides are updated.
- Nov 10: The tutorial rooms of Group A, Group B, and Group D have changed!
- Nov 10: Chapter 4 slides are updated.
- Nov 09: Sample solutions for Assignment 1 are online.
- You may discuss these problems with other students, but everybody must hand in their own answers. You may use computers etc. to perform the algebraic operations, but you must show the intermediate steps ("computer said so" is never a valid answer). Hand in your solutions, either legibly handwritten or computer-typeset, in person at the lecture. Note that the deadline is strict. Remember to write your name, tutorial group ID, and matriculation number on every answer sheet! If you want to discuss the solutions with the tutor, the tutorial meeting is the time to do that.
- The first assignments must be submitted in the class on 29 October.
- The lectures on Tuesday 14-16 will also be held in lecture hall 002 in building E1.3.
- Please register for the tutorial group that you prefer from the link below.
- Registration is closed.
- Keep in mind that this is only for tutorial group registration. To register for the course, use the HISPOS system.

## Tentative Schedule and Lecture Slides

Week/Date | Slides | Lecturer | Notes |
---|---|---|---|
Oct 20<br>Oct 22 | Motivation and Overview<br>Data Quality and Data Reduction | JV & GW | First assignment handed out |
Oct 27<br>Oct 29 | Math 1 - Probability Theory<br>Math 2 - Statistics | GW | First assignment due |
Nov 3<br>Nov 5 | Patterns 1: Itemset Mining<br>Patterns 2: Rule Mining | JV | Tutorials on first assignment |
Nov 10<br>Nov 12 | Clusters: Representative-based and Probabilistic<br>Clusters: Hierarchical, Density-based, Subspaces | JV | Tutorials on second assignment |
Nov 17<br>Nov 19 | Labels: Classification<br>1st written test | JV | Tutorials on Patterns |
Nov 24<br>Nov 26 | Sequences: Time Series<br>Sequences: Discrete Sequences | JV | Tutorials on Clusters |
Dec 1<br>Dec 3 | Graphs: Graph properties and Subgraph Patterns<br>Graphs: Community Detection and Graph Clustering | JV | Tutorials on Classification |
Dec 8<br>Dec 10 | Outliers: Anomaly Detection<br>2nd written test | JV | Tutorials on Sequences |
Dec 15<br>Dec 17 | Capita Selecta Data Mininga (JV)<br>Text Indexing and Compression (GW) | JV & GW | Tutorials on Graphs |
Dec 21 - Jan 1 | Holiday break | | |
Jan 5<br>Jan 7 | Text Matching: Similarity Search<br>Query Processing | GW | No tutorials |
Jan 12<br>Jan 14 | Ranking 1: Probabilistic IR, Statistical Language Models<br>Ranking 2: Latent Topic Models, Learning-to-Rank | GW | Tutorials on text indexing and matching |
Jan 19<br>Jan 21 | Graph Models for Link and Query-Click Analysis<br>Graph Models for Link and Query-Click Analysis (continued); Information Extraction | GW | Tutorials on query processing |
Jan 26<br>Jan 28 | Information Extraction (continued)<br>(no lecture) | GW | Tutorials on language models |
Feb 2<br>Feb 4 | Knowledge Harvesting<br>3rd test | GW | Tutorials on web mining |
Feb 9<br>Feb 11 | Entity Search, Question Answering, and Outlook; Wrap up<br>(no lecture) | GW | |
Feb 15-16 | Final Exam | JV & GW | The dates are preliminary. The exam is currently planned to be oral. |
March 14 (tentative) | Repetitions of oral exams | JV & GW | Repetitions are only for students who fail the oral exam on Feb 15/16. |

## Homework Assignments

- Homework 1. Sample Solutions.
- Homework 2. Sample Solutions.
- Homework 3. Sample Solutions.
- Homework 4. Sample Solutions.
- Homework 5. Sample Solutions.
- Homework 6. Sample Solutions.
- Homework 7. Sample Solutions.
- Homework 8. Sample Solutions.
- Homework 9. Sample Solutions.
- Homework 10. Sample Solutions.
- Homework 11. Sample Solutions.

## Information on Exams

- Test1 solutions are online. Check here.
- Test2 solutions are online. Check here.
- Test3 solutions are online. Check here.

## Course Contents

Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.

## Prerequisites

Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.

## Grading and Requirements for Passing the Course

To pass the course and earn 9 credit points, the following is required:

- Regular attendance of classes and tutor groups
- Presentation of solutions in tutor groups
- Passing 2 of 3 written tests (after each third of the semester)
- Passing the final exam (at the end of the semester)

The overall grade will be determined by the performance in the final exam combined with your bonus points.

## Suggested Reading

The following textbooks will be used:

on data mining:

- primary: Charu Aggarwal: Data Mining - The Textbook
- secondary: Mohammed Zaki and Wagner Meira: Data Mining and Analysis

on information retrieval:

- primary: Stefan Büttcher, Charles Clarke, Gordon Cormack: Information Retrieval
- secondary: Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval

on probability and statistics:

- primary: Larry Wasserman: All of Statistics
- secondary: Arnold Allen: Probability, Statistics, and Queueing Theory

These and additional references are available in the library: