Harvesting, Searching, and Ranking Knowledge from the Web

The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.

We have already harvested knowledge about millions of entities and the facts relating them, drawing on Wikipedia and WordNet and carefully integrating these two sources. The resulting knowledge base, named YAGO, has very high precision and is freely available. Its facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring this knowledge. Our search engine, NAGA, returns ranked answers to queries using statistical ranking models.
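To make the triple representation concrete, here is a minimal Python sketch of facts stored as (subject, predicate, object) triples and answered by pattern matching, in the spirit of SPARQL basic graph patterns. It is not YAGO's storage layer or NAGA's query language, and it leaves out ranking entirely. The class `TripleStore`, its methods, the `?`-prefix convention for variables, and the sample relation names `hasWonPrize` and `bornIn` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# A fact is a (subject, predicate, object) triple, as in RDF.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class TripleStore:
    """Hypothetical toy in-memory store; not the YAGO implementation."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.append((s, p, o))

    def match(self, pattern: Tuple[str, ...]) -> Iterator[Binding]:
        """Yield variable bindings for one pattern; terms starting with '?' are variables."""
        for triple in self.triples:
            binding: Binding = {}
            ok = True
            for term, value in zip(pattern, triple):
                if term.startswith("?"):
                    # A variable repeated within one pattern must bind to the same value.
                    if binding.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield binding

    def query(self, patterns: List[Triple]) -> List[Binding]:
        """Answer a conjunction of patterns by joining bindings on shared variables."""
        results: List[Binding] = [{}]
        for pattern in patterns:
            joined: List[Binding] = []
            for binding in results:
                # Substitute variables already bound by earlier patterns.
                bound = tuple(binding.get(term, term) for term in pattern)
                for extra in self.match(bound):
                    joined.append({**binding, **extra})
            results = joined
        return results


# Illustrative facts only; relation names are hypothetical.
store = TripleStore()
store.add("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics")
store.add("Max_Planck", "hasWonPrize", "Nobel_Prize_in_Physics")
store.add("Albert_Einstein", "bornIn", "Ulm")
store.add("Max_Planck", "bornIn", "Kiel")

# "Which physics Nobel laureates were born where?"
for answer in store.query([
    ("?x", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("?x", "bornIn", "?city"),
]):
    print(answer)
```

A real engine would index triples by subject, predicate, and object rather than scanning a list, and a system like NAGA would additionally order the answers by a ranking model instead of returning them in insertion order.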

Several interlinked sub-projects build on the YAGO-NAGA foundation. Our vision is a confluence of Semantic Web (ontologies), Social Web (Web 2.0), and Statistical Web (information extraction) assets into a comprehensive repository of human knowledge. Our methodologies combine concepts, models, and algorithms from several fields, including database systems, information retrieval, statistical learning, and logical reasoning.

Selected Publications