
Google selected Rainer Gemulla, Martin Theobald, and Gerhard Weikum as winners of a Google Focused Research Award 2011, and is generously supporting our research on knowledge harvesting.
Knowledge bases with entity-relationship-oriented facts are valuable assets for making sense of Internet content and for supporting applications like semantic search or text disambiguation. Projects on automatically building such knowledge bases from high-quality Web sources have successfully applied two different paradigms: targeted information extraction with domain-model seeds for high-precision output, and explorative information extraction in an unsupervised manner with high recall but lower precision. Neither paradigm has addressed the emerging need to maintain a knowledge base with evolving content and to support the entire life-cycle of knowledge management.
This project aims to reconcile the two information-extraction paradigms, combining their strengths and overcoming their limitations. Targeted extraction should become capable of discovering new relation types, and explorative extraction should be strengthened by expressive consistency reasoning. The combined form of "universal" extraction should be scalable and robust. The project will place particular emphasis on tapping into the long tail of entities and their relationships, and on coping with the dynamic evolution of factual knowledge.