Information extraction

Advanced lecture, 6 ECTS credits, winter semester 2019–20

Basic Information

  • Type: Advanced lecture
  • Teacher: Simon Razniewski (lecturer), Cuong Xuan Chu (lab)
  • Time: Tuesday 10:00-12:00 (lecture), Tuesday 16:00-18:00 (lab)
  • Place: E1 4 room 024 (lecture), room 021 (lab)
  • Credits: 6 ECTS credits
  • Mailing list for discussion and announcements: https://groups.google.com/d/forum/ie1920
  • Please sign up for the mailing list if you plan to take the course (link now fixed as of October 4)

This advanced lecture focuses on how to construct knowledge bases using information extraction techniques. Topics include automated information extraction using patterns, supervised extractors and open information extraction, infobox crawling, entity disambiguation and normalization, learning over knowledge bases, and the use of knowledge bases in question answering. We will also touch upon crowdsourced KB construction, evaluation measures, and some state-of-the-art knowledge bases. In the labs, participants will implement, step by step, the core components of information extraction, using Wikipedia and wikis from the Wikia fan community site as sources.

Tentative schedule

#   Tentative date  Lecture                             Lab
1   15.10.          Introduction (pptx)                 Dataset familiarization (pdf)
2   22.10.          Knowledge representation            Domain modelling
3   29.10.          Crawling and Scraping               Infobox scraping
4   12.11.*         NER, typing and taxonomy induction  Entity typing from Wikipedia first sentence
5   19.11.          Disambiguation                      Disambiguation
6   26.11.          Fact extraction                     Pattern-fact duality exploration
7   3.12.           OpenIE and evaluation               OpenIE coding
8   10.12.          Rule Mining                         Exhaustive short rule evaluation, crowdsourcing
9   17.12.          Applications                        Exam preparation
    (7.1.2020)      (Backup slot)
    14.+15.1.2020   Oral exam

* Note: No lecture/lab on 5.11.

Rules and Grading

Assignments

  • There will be 8 weekly assignments
  • Each assignment submission receives a binary pass/fail score
  • To be admitted to take the final exam, at least 6 assignments have to be passed.
  • Weekly timeline:
    • Assignments are posted on Tuesday morning
    • The lab on Tuesday afternoon is intended for getting started on the assignments
    • Assignments are due Saturday in the same week, at 23:59
    • Assessments are available the following Tuesday morning

Exam

  • Oral exam
  • Covering the topics of lecture and assignments

Further reading

Industry relevance (lecture 1):

  • Industry-Scale Knowledge Graphs: Lessons and Challenges, Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, Jamie Taylor, CACM, 2019 (link)

Knowledge representation (lecture 2):

  • Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases, Fabian M. Suchanek, Jonathan Lajus, Armand Boschin, Gerhard Weikum, RW, 2019 (link)