Automated Knowledge Base Construction

This advanced lecture focuses on how to construct knowledge bases using information extraction techniques. Topics include automated information extraction using patterns, supervised extractors, and open information extraction; infobox crawling; entity disambiguation and normalization; learning over knowledge bases; and the use of knowledge bases in question answering. We will also touch upon crowdsourced KB construction, evaluation measures, and state-of-the-art knowledge bases. In the tutorials, participants will implement methods for selected topics. There are 8 homework assignments, of which 6 must be passed for admission to the exam.

Overview

Type	Advanced lecture, in-person
Credits	6 ECTS
Date	Summer term 2022
Time	Wednesday 12:15-13:45 (lecture), 16:15-17:45 (tutorial)
Teacher	Simon Razniewski (lecture), Hiba Arnaout, Shrestha Ghosh, Tuan-Phong Nguyen, Sneha Singhania (tutorials)
Schedule	weekly lecture+lab
Exam admission requirement	Passing 6/8 assignments
Grading	Oral exam (covering 67% lectures, 33% assignments)
Prerequisites	Basics of data management and algorithms (e.g., from the lectures Databases I and Algorithms and Data Structures) and basic programming experience (assignments require intermediate Python coding)
Location lecture	In E1 5 002 (MPI-SWS ground floor), except 11.5., 1.6. and 15.6., which are in E1 5 029
Location lab	E1 5 029 (MPI-SWS ground floor)
Registration/communication	Announcements and questions are posted on this mailing list. If you take or plan to take the course, please sign up. Exam registration in LSF is required by 4.7.

Schedule

Date	Lecture	Tutorial (tutor)
27.4.	1. Introduction (pdf | ppt)	Data familiarization (Sneha) [pdf, data, solution]
4.5.	2. Crawling and Scraping (pdf | ppt)	Scraping (Phong) [pdf, Lab02.py, test_Lab02_public.py, Lab02_solutions.py]
11.5.	3. Entity typing (pdf | ppt)	Typing from first WP sentence (Hiba) [pdf, data, new_data]
18.5.	4. Taxonomy induction + entity disambiguation (pdf | ppt)	Taxonomy induction (Hiba) [pdf, data, your plots, solutions]
25.5.	5. Relation extraction (pdf | ppt)	Relation extraction (Shrestha) [pdf, lab05.zip, solutions]
1.6.	6. Relation extraction II (continuation of previous)	Open information extraction (Shrestha) [pdf, lab06.zip]
8.6.	7. Commonsense knowledge (pdf | ppt)	Commonsense knowledge (Phong) [pdf, sample_solutions]
15.6.	8. Language models and knowledge bases (pdf | ppt)	KBC from LMs (Sneha) [pdf]
22.6.	9. Applications (pdf | ppt)	Exam preparation (Simon)
29.6.	10. TBD / Backup slot	TBD / Backup slot
11.7.+12.7.	Oral exam (register by 4.7. in LSF)	-
12.9.	Re-exam (register by 5.9. in LSF)	-

Assignments

There are 8 weekly assignments (weeks 1-8).
Each assignment submission is graded pass/fail.
To be admitted to the final exam, at least 6 assignments have to be passed.
Weekly timeline:
- Assignments are posted on Wednesday morning
- The tutorial on the same day is intended for getting started on the assignment
- Assignments are due the following Monday at 23:59
- Assessments are available Wednesday morning
Assignment results (link)
Note: This course has no tolerance for plagiarism. Each instance leads to deregistration from the course and is reported to the dean of studies.
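
As a quick illustration of the rules above, the following Python sketch computes the submission deadline from the Wednesday on which an assignment is posted and checks the exam admission requirement. It is only a sketch under stated assumptions: the function names `due_datetime` and `admitted_to_exam` are hypothetical and not part of any course tooling, and the official deadlines are those announced on the mailing list.

```python
from datetime import date, datetime, time, timedelta

TOTAL_ASSIGNMENTS = 8  # weekly assignments, weeks 1-8
REQUIRED_PASSES = 6    # minimum number of passed assignments for exam admission


def due_datetime(posted: date) -> datetime:
    """Return the deadline for an assignment posted on a Wednesday.

    Hypothetical helper: the deadline is the following Monday at 23:59.
    """
    if posted.weekday() != 2:  # Monday is 0, so Wednesday is 2
        raise ValueError("assignments are posted on Wednesdays")
    # Wednesday + 5 days is the Monday of the following week.
    return datetime.combine(posted + timedelta(days=5), time(23, 59))


def admitted_to_exam(passed: list[bool]) -> bool:
    """Check the admission rule given one pass/fail flag per assignment."""
    if len(passed) != TOTAL_ASSIGNMENTS:
        raise ValueError("expected one pass/fail result per assignment")
    return sum(passed) >= REQUIRED_PASSES
```

For example, `due_datetime(date(2022, 4, 27))` yields Monday, 2 May 2022, at 23:59 for the first assignment, and a student with six passes and two fails satisfies `admitted_to_exam`.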

Collaboration policy

All of the content you submit, both code and text, needs to be produced independently. Your work must be in your own words and based on your understanding of the solution.
Do not share code or written materials, and do not look at others' code. You may discuss the problems and the project with others to help understand the material, and we encourage this, but do not share written solutions.
If you find, incorporate, or build on existing material, for example from the web or a textbook, you must cite the source.

Literature

This lecture is based on the survey "Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases", FnT 2021 (pdf). Further references will be given in the respective lectures.