D5
Databases and Information Systems

Automated Knowledge Base Construction

  • Type: Advanced lecture, in-person
  • Credits: 6 ECTS
  • Date: Summer term 2022
  • Time: Wednesday 12:15-13:45 (lecture), 16:15-17:45 (tutorial)
  • Teacher: Simon Razniewski (lecture), Hiba Arnaout, Shrestha Ghosh, Tuan-Phong Nguyen, Sneha Singhania (tutorials)
  • Schedule: weekly lecture+lab
  • Exam admission requirement: Passing 6/8 assignments
  • Grading: Oral exam (covering 67% lectures, 33% assignments)
  • Prerequisites: Basics of data management and algorithms (e.g., via the Databases I and Algorithms and Data Structures lectures), basic programming experience (assignments require intermediate Python coding)
  • Location lecture: E1 5 002 (MPI-SWS ground floor), except on 11.5., 1.6. and 15.6., when the lecture takes place in E1 5 029
  • Location lab: E1 5 029 (MPI-SWS ground floor)
  • Registration/communication: Announcements and questions should be posted on this mailing list. If you take (or plan to take) the course, please sign up. Exam registration in LSF is required by 4.7.

This advanced lecture focuses on how to construct knowledge bases using information extraction techniques. Topics include automated information extraction using patterns, supervised extractors and open information extraction, infobox crawling, entity disambiguation and normalization, learning over knowledge bases, and the use of knowledge bases in question answering. We will also touch upon crowdsourced KB construction, evaluation measures, and state-of-the-art knowledge bases. In the tutorials, participants will implement selected topics. There are 8 homework assignments, of which 6 need to be passed to be admitted to the exam.
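To give a flavour of the first topic, pattern-based extraction, here is a minimal sketch in Python. It is not taken from the course materials: the function name `extract_instance_of`, the single "such as" pattern, and the example sentence are illustrative assumptions, and real extractors use many more patterns plus proper noun-phrase detection.

```python
import re

# Assumed, simplified pattern: "<class word> such as <A>, <B> and <C>".
# The class is approximated by the single word directly before "such as".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")

# Separators inside the enumeration: commas (optionally followed by and/or),
# or a bare "and"/"or" surrounded by whitespace.
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def extract_instance_of(sentence: str) -> list[tuple[str, str]]:
    """Return (instance, class) pairs matched by the 'such as' pattern."""
    facts = []
    for match in SUCH_AS.finditer(sentence):
        cls = match.group(1)
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip()
            if item:
                facts.append((item, cls))
    return facts


if __name__ == "__main__":
    text = "He studied physicists such as Albert Einstein, Marie Curie and Niels Bohr."
    for instance, cls in extract_instance_of(text):
        print(f"{instance} -> {cls}")
```

A single hand-written pattern like this is typically precise but misses most ways a fact can be phrased, which is one reason for moving on to supervised and open information extraction.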

Schedule

Date | Lecture | Tutorial (tutor)
27.4. | 1. Introduction (pdf) | Data familiarization (Sneha) [pdf, data, solution]
4.5. | 2. Crawling and Scraping (pdf) | Scraping (Phong) [pdf, Lab02.py, test_Lab02_public.py]
11.5. | 3. Entity typing (pdf) | Typing from first WP sentence (Hiba) [pdf, data, new_data]
18.5. | 4. Taxonomy induction + entity disambiguation (pdf) | Taxonomy induction (Hiba) [pdf, data]
25.5. | 5. Relation extraction | Relation extraction (Shrestha) [pdf, lab05.zip]
1.6. | 6. Relation extraction II | Open information extraction (Shrestha)
8.6. | 7. Commonsense knowledge | Commonsense (Phong)
15.6. | 8. Language models and knowledge bases | KBC from LMs (Sneha)
22.6. | 9. Applications | Exam preparation (Simon)
29.6. | 10. TBD / Backup slot | TBD / Backup slot
11.7.+12.7. | Oral exam (register till 4.7. in LSF) | -
12.9. | Re-exam (register till 5.9. in LSF) | -

    Assignments

    • There are 8 weekly assignments (weeks 1-8)
    • Each assignment submission receives a binary pass/fail score
    • To be admitted to take the final exam, at least 6 assignments have to be passed.
    • Weekly timeline:
      • Assignments are posted on Wednesday morning
      • The tutorial on the same day is intended to get started on the assignments
      • Assignments are due Monday the week after, at 23:59
      • Assessments are available Wednesday morning
    • Assignment results (link)
    • Note: This course has no tolerance for plagiarism. Each instance leads to deregistration from the course, and is reported to the dean of studies.

    Collaboration policy

    • All of the content you submit, both code and text, needs to be produced independently. Your work must be in your own words and based on your understanding of the solution.
    • Do not share code or written materials. Do not look at others' code. You are encouraged to discuss problems and the project with others to help understand the material, but do not share written solutions.
    • If you find, incorporate, or build on existing material, for example from the web or a textbook, you must cite the source.

    Literature

    This lecture is based on the survey "Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases", FnT 2021 (pdf). Further references will be given in the respective lectures.