Information extraction

Advanced lecture, 6 ECTS credits, winter semester 2019–20

Basic Information

This advanced lecture focuses on how to construct knowledge bases using information extraction techniques. Topics include automated information extraction using patterns, supervised extractors and open information extraction, infobox crawling, entity disambiguation and normalization, learning over knowledge bases, and the use of knowledge bases in question answering. We will also touch upon crowdsourced KB construction, evaluation measures, and some state-of-the-art knowledge bases. In the labs, participants will implement core components of information extraction step by step, using Wikipedia and wikis from the Wikia fan community site as sources.

Tentative schedule

#  Tentative date  Lecture                             Lab
1  15.10.          Introduction                        Dataset familiarization
2  22.10.          Knowledge representation            Domain modelling
3  29.10.          Crawling and Scraping               Infobox scraping
4  5.11.           NER, typing and taxonomy induction  Entity typing from Wikipedia first sentence
5  12.11.          Disambiguation                      Disambiguation
6  19.11.          Fact extraction                     Pattern-fact duality exploration
7  26.11.          OpenIE and evaluation               OpenIE coding
8  3.12.           Rule Mining                         Exhaustive short rule evaluation, crowdsourcing
9  10.12.          Applications                        Exam preparation