Automated Knowledge Base Construction

This advanced lecture covers how to construct knowledge bases using information extraction techniques. Topics include automated information extraction using patterns, supervised extractors, and open information extraction; infobox crawling; entity disambiguation and normalization; learning over knowledge bases; and the use of knowledge bases in question answering. We will also touch upon crowdsourced KB construction, evaluation measures, and state-of-the-art knowledge bases. In the tutorials, participants will implement selected topics. There are 8 homework assignments, of which 6 must be passed for admission to the exam.


Overview

Type: Advanced lecture, in-person
Credits: 6 ECTS
Date: Summer term 2022
Time: Wednesday 12:15-13:45 (lecture), 16:15-17:45 (tutorial)
Teacher: Simon Razniewski (lecture), Hiba Arnaout, Shrestha Ghosh, Tuan-Phong Nguyen, Sneha Singhania (tutorials)
Schedule: weekly lecture+lab
Exam admission requirement: Passing 6/8 assignments
Grading: Oral exam (covering 67% lectures, 33% assignments)
Prerequisites: Basics of data management and algorithms (e.g., via Databases I and Algorithms and data structures lecture), basic programming experience (assignments require intermediate Python coding)
Location lecture: In E1 5 002 (MPI-SWS ground floor), except 11.5., 1.6. and 15.6., which are in E1 5 029
Location lab: E1 5 029 (MPI-SWS ground floor)
Registration/communication: Announcements and questions should be posted on this mailing list; if you (plan to) take the course, please sign up. Exam registration in LSF is required by 4.7.

Schedule

Date         Lecture                                                    Tutorial (tutor)
27.4.        1. Introduction (pdf | ppt)                                Data familiarization (Sneha) [pdf, data, solution]
4.5.         2. Crawling and Scraping (pdf | ppt)                       Scraping (Phong) [pdf, Lab02.py, test_Lab02_public.py, Lab02_solutions.py]
11.5.        3. Entity typing (pdf | ppt)                               Typing from first WP sentence (Hiba) [pdf, data, new_data]
18.5.        4. Taxonomy induction + entity disambiguation (pdf | ppt)  Taxonomy induction (Hiba) [pdf, data, your plots, solutions]
25.5.        5. Relation extraction (pdf | ppt)                         Relation extraction (Shrestha) [pdf, lab05.zip, solutions]
1.6.         6. Relation extraction II (continuation of previous)       Open information extraction (Shrestha) [pdf, lab06.zip]
8.6.         7. Commonsense knowledge (pdf | ppt)                       Commonsense knowledge (Phong) [pdf, sample_solutions]
15.6.        8. Language models and knowledge bases (pdf | ppt)         KBC from LMs (Sneha) [pdf]
22.6.        9. Applications (pdf | ppt)                                Exam preparation (Simon)
29.6.        10. TBD / Backup slot                                      TBD / Backup slot
11.7.+12.7.  Oral exam (register till 4.7. in LSF)                      -
12.9.        Re-exam (register till 5.9. in LSF)                        -

Assignments

    • There are 8 weekly assignments (weeks 1-8)
    • Each assignment submission receives a binary pass/fail score
    • To be admitted to take the final exam, at least 6 assignments have to be passed.
    • Weekly timeline:
      • Assignments are posted on Wednesday morning
      • The tutorial on the same day is intended for getting started on the assignment
      • Assignments are due Monday the week after, at 23:59
      • Assessments are available Wednesday morning
    • Assignment results (link)
    • Note: This course has no tolerance for plagiarism. Each instance leads to deregistration from the course, and is reported to the dean of studies.

Collaboration policy

    • All of the content you submit, both code and text, needs to be produced independently. Your work must be in your own words and based on your understanding of the solution.
    • Do not share code or written materials, and do not look at others' code. You may discuss problems and the project with others to help understand the material, and we encourage this, but do not share written solutions.
    • If you find, incorporate, or build off of existing material, for example on the web or from a textbook, you must cite the source.

Literature

This lecture is based on the survey "Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases", FnT 2021 (pdf). Further references will be given in the respective lectures.