Information Retrieval and Data Mining
Core course, 9 ECTS credits, winter semester 2015 – 2016
Basic Information
Teaching Assistants
- Tuesday 14-16 in lecture hall 002 in building E1.3
- Thursday 14-16 in lecture hall 002 in building E1.3
The first lecture is on Tuesday, October 20.
Tutorial Groups
Group Name | Time | Place | Group Name | Time | Place |
---|---|---|---|---|---|
Group A | Tuesday 16-18 | Room 024 | Group B | Tuesday 16-18 | Room 021 |
Group C | Thursday 16-18 | Room 021 | Group D | Thursday 16-18 | Room 021 |
Group E | Friday 14-16 | Room 021 | | | |
All tutorial rooms are located in building E1.4.
Contact
News
- Feb 18: Please check your final exam results here.
- Feb 11: Please see your final exam slot here.
- Feb 8: Midterm 3 results and sample solutions are online. Check here and here, respectively.
- Feb 8: There will be NO lecture on Thursday 11 February.
- Please register for the Oral Exam.
- Jan 4: Tutorial groups C and D are merged. The meeting room is MPI 021.
- Dec 21: Final grades of midterm 2 are online. Check here.
- Dec 17: Midterm 2 solutions are online. Check here.
- Dec 15: Midterm 2 results are online. Check here.
- Dec 10: Ch5-2 updated. The definition of epsilon-neighborhood is clarified.
- Dec 7: Test 1 solutions are online. Check here.
- Dec 5: Sample solutions for Homework 5 are online.
- Dec 3: There will be tutorial sessions in the week of Dec 15 - Dec 17. There will be no tutorial session in the week of Jan 5 - Jan 7.
- Nov 23: Sample solutions for Assignment 3 are online.
- Nov 18: Solutions for Hw2 are updated.
- Nov 13: Sample solutions for Homework 2 are online.
- Nov 11: Homework 3, Question 3 is updated.
- Nov 11: Chapter 5.1 slides are updated.
- Nov 10: The tutorial rooms of Group A, Group B, and Group D have changed!
- Nov 10: Chapter 4 slides are updated.
- Nov 09: Sample solutions for Assignment 1 are online.
- You can discuss these problems with other students, but everybody must hand in their own answers. You can use computers etc. to perform the algebraic operations, but you must show the intermediate steps ("computer said so" is never a valid answer). You can hand in either legibly handwritten or computer-typeset solutions in person at the lecture. Note that the deadline is strict. Remember to write your name, tutorial group ID, and matriculation number on every answer sheet! If you want to discuss the solutions with the tutor, the tutorial meeting is the time to do that.
- The first assignment must be submitted in class on 29 October.
- The lectures on Tuesday 14-16 will also be held in lecture hall 002 in building E1.3.
- Please register for the tutorial group that you prefer from the link below.
- Registration is closed.
- Keep in mind that this is only for tutorial group registration. To register for the course, use the HISPOS system.
Tentative Schedule and Lecture Slides
Week/Date | Slides | Lecturer | Notes |
---|---|---|---|
| JV & GW | First assignment handed out | |
| GW | First assignment will be submitted. | |
| JV | Tutorials on first assignment | |
| JV | Tutorials on second assignment | |
| JV | Tutorials on Patterns | |
| JV | Tutorials on Clusters | |
| JV | Tutorials on Classification | |
| JV | Tutorials on Sequences | |
| JV & GW | Tutorials on Graphs | |
Holiday break: Dec 21 - Jan 1 | |||
| GW | No tutorials | |
| GW | Tutorials on text indexing and matching | |
| GW | Tutorials on query processing. | |
| GW | Tutorials on language models. | |
| GW | Tutorials on web mining. | |
| GW | ||
| JV & GW | The dates are preliminary. The exam is currently planned to be oral. | |
| JV & GW | Repetitions of oral exams are only for students who fail the oral exam on Feb 15/16. |
Homework Assignments
- Homework 1. Sample Solutions.
- Homework 2. Sample Solutions.
- Homework 3. Sample Solutions.
- Homework 4. Sample Solutions.
- Homework 5. Sample Solutions.
- Homework 6. Sample Solutions.
- Homework 7. Sample Solutions.
- Homework 8. Sample Solutions.
- Homework 9. Sample Solutions.
- Homework 10. Sample Solutions.
- Homework 11. Sample Solutions.
Information on Exams
- Test 1 solutions are online. Check here.
- Test 2 solutions are online. Check here.
- Test 3 solutions are online. Check here.
Course Contents
Information Retrieval (IR) and Data Mining (DM) are methodologies for organizing, searching and analyzing digital contents from the web, social media and enterprises as well as multivariate datasets in these contexts. IR models and algorithms include text indexing, query processing, search result ranking, and information extraction for semantic search. DM models and algorithms include pattern mining, rule mining, classification and recommendation. Both fields build on mathematical foundations from the areas of linear algebra, graph theory, and probability and statistics.
Prerequisites
Good knowledge of undergraduate mathematics (linear algebra, probability theory) and basic algorithms.
Grading and Requirements for Passing the Course
To pass the course and earn 9 credit points, the following is required:
- Regular attendance of classes and tutorial groups
- Presentation of solutions in tutorial groups
- Passing 2 of 3 written tests (after each third of the semester)
- Passing the final exam (at the end of the semester)
The overall grade will be determined by the performance in the final exam combined with your bonus points.
Suggested Reading
The following textbooks will be used:
on data mining:
- primary: Charu Aggarwal: Data Mining - The Textbook
- secondary: Mohammed Zaki and Wagner Meira: Data Mining and Analysis
on information retrieval:
- primary: Stefan Büttcher, Charles Clarke, Gordon Cormack: Information Retrieval
- secondary: Chris Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval
on probability and statistics:
- primary: Larry Wasserman: All of Statistics
- secondary: Arnold Allen: Probability, Statistics, and Queueing Theory
These and additional references are available in the library: