YAGO-NAGA

Harvesting, Searching, and Ranking Knowledge from the Web

The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.

We have already harvested knowledge about millions of entities and facts about their relationships from Wikipedia and WordNet, carefully integrating these two sources. The resulting knowledge base, named YAGO, has very high precision (a manually confirmed accuracy of 95%) and is freely available. The facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring this knowledge. Our search engine NAGA returns ranked answers to queries based on statistical models.
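
To illustrate what representing facts as RDF triples means for querying, here is a minimal Python sketch. Facts are (subject, predicate, object) tuples, and a query is a conjunction of triple patterns whose `?`-prefixed variables are bound by naive nested-loop matching, in the spirit of a SPARQL basic graph pattern. The sample facts, the relation names, and the functions `match` and `query` are illustrative assumptions; this is not how YAGO is stored, and it does not reproduce NAGA's statistical ranking.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical sample facts in (subject, predicate, object) form;
# not an excerpt of YAGO.
FACTS: List[Triple] = [
    ("Albert_Einstein", "wasBornIn", "Ulm"),
    ("Ulm", "isLocatedIn", "Germany"),
    ("Max_Planck", "wasBornIn", "Kiel"),
    ("Kiel", "isLocatedIn", "Germany"),
    ("Marie_Curie", "wasBornIn", "Warsaw"),
    ("Warsaw", "isLocatedIn", "Poland"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # A variable must bind to the same value everywhere it occurs.
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def query(patterns: List[Triple], facts: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by nested-loop joins."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for fact in facts:
                extended = match(pattern, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# People born in a city located in Germany.
for row in query([("?person", "wasBornIn", "?city"),
                  ("?city", "isLocatedIn", "Germany")], FACTS):
    print(row["?person"], row["?city"])
# prints:
# Albert_Einstein Ulm
# Max_Planck Kiel
```

Joining one pattern at a time keeps the sketch short; a real triple store would use indexes rather than scanning every fact for every pattern.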

Several interlinked sub-projects build on the YAGO-NAGA foundation. Our vision is a confluence of Semantic Web (ontologies), Social Web (Web 2.0), and Statistical Web (information extraction) assets towards a comprehensive repository of human knowledge. Our methodologies combine concepts, models, and algorithms from several fields, including database systems, information retrieval, statistical learning, and logical reasoning.


  • AIDA

    AIDA is a method for disambiguating mentions of named entities in text.

  • AMIE

    AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases. This project is developed jointly with the DBWeb team of Télécom ParisTech.

  • ANGIE

    ANGIE is an active knowledge system for interactive exploration.

  • BriQ
  • ClausIE
  • DEANNA

    DEANNA is a framework for natural language question answering over structured knowledge bases.

  • diaNED

    Time-Aware Named Entity Disambiguation for Diachronic Corpora

  • Equity

    Equity is an end-to-end system for canonicalizing mentions of entities, classes, concepts and quantities in ad-hoc tables and their surrounding contexts.

  • Espresso

    Computation of semantically meaningful substructures from knowledge graphs.

  • Fiction and Fantasy

    The long-term goal of this project is to extract interesting information, mainly about characters in fictional stories, including personal information (e.g., name, birth/death, title), interpersonal relationships (e.g., family relations, business relations, ally/enemy) and narratives (e.g., battles, who kills whom).

  • EVIN

    EVIN (EVents In News) is a system that extracts named events from a news corpus, organizes them into ontological classes, and supports interactive exploration. EVIN exploits different kinds of similarity between news items, based on textual content, entity occurrences, and temporal ordering, and captures these similarities in a multi-view attributed graph.

  • HIGGINS

    The HIGGINS project aims to combine crowdsourcing with automated information extraction techniques to enable high-quality fact extraction from complex textual inputs.

  • HYENA

    HYENA is a multi-label classifier for entity types based on hierarchical taxonomies derived from YAGO2.

  • IBEX

    In IBEX, we study the prevalence of unique entity identifiers on the Web. These include ISBNs (books), GTINs (commercial products), DOIs (documents), email addresses, and others. We show how these identifiers can be harvested systematically from Web pages. The result is a database of millions of uniquely identified entities of different types, with an accuracy of 73-96% and very high coverage.

  • Javatools

    The Javatools are a suite of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They are used in the YAGO-NAGA project and available for download as well.

  • K2

    Gathering and ranking photos of named entities with high precision, high recall, and diversity.

  • Know2Look

    Know2Look is an image retrieval framework that uses commonsense knowledge to bridge the semantic gap between query keywords, textual descriptions, and the visual content of images.

  • Le Monde

    Mining History with Le Monde. This project is developed jointly with the DBWeb team of Télécom ParisTech.

  • LEILA

    LEILA is a system that extracts facts from Web sources by linguistic analysis.

  • NAGA

    NAGA is a new semantic search engine supporting keyword search for the casual user as well as graph queries with regular expressions for the expert user.

  • PATTY

    PATTY is a large collection of relations, grouped into synonym sets and arranged in a subsumption hierarchy.

  • PRAVDA

    PRAVDA is a system based on label propagation for knowledge harvesting, especially of temporal knowledge.

  • PROSPERA

    Large-scale information extraction, a continuation of the SOFIE approach.

  • Quantity Search

    Searching for entities with quantity constraints over web content.

  • RDF-3X

    RDF-3X is an RDF storage and retrieval system that achieves excellent performance by following a RISC-style design philosophy.

  • RuLES

    Rule Learning with Embedding Support

  • SOFIE

    SOFIE extracts information from Web sources.

  • STICS

    Still searching with keywords? "STICS: Searching with Strings, Things, and Cats" is a news search engine based on AIDA technology, which enables search for entities and categories!

  • TimeSEA
  • UWN

    UWN is a multilingual version of WordNet, describing meanings of words in different languages and their relationships.

  • Watermarking

    Watermarking and Provenance for Ontologies. This project is developed jointly with the DBWeb team of Télécom ParisTech.

  • YAGO

    YAGO is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames. YAGO knows almost 10 million entities (e.g. persons, organizations, cities), and 120 million facts about these entities. Unlike other automatically assembled knowledge bases, YAGO has a manually confirmed accuracy of 95%. YAGO is freely available at yago-knowledge.org.

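
AMIE's central idea, as published, is the partial completeness assumption (PCA): if the knowledge base knows at least one object for a relation r and a subject x, it is assumed to know all of them, so a predicted fact counts as a counterexample only for such subjects. The Python sketch below computes support, standard confidence, and PCA confidence for a rule of the form r1(x, z) ∧ r2(z, y) ⇒ r(x, y). The relations `isMarriedTo` and `livesIn`, the people, and the function names `body_pairs` and `rule_stats` are hypothetical; the sketch omits AMIE's rule refinement and pruning and is not its implementation.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def body_pairs(r1: Set[Pair], r2: Set[Pair]) -> Set[Pair]:
    """All (x, y) such that r1(x, z) and r2(z, y) hold for some z."""
    return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}


def rule_stats(r1: Set[Pair], r2: Set[Pair], head: Set[Pair]) -> Dict[str, float]:
    """Quality measures for the rule r1(x, z) AND r2(z, y) => head(x, y)."""
    body = body_pairs(r1, r2)
    # Support: predictions that are already facts in the knowledge base.
    support = len(body & head)
    # Standard confidence treats every missing head fact as false.
    std_conf = support / len(body) if body else 0.0
    # PCA confidence only counts predictions for subjects x that have
    # at least one known head fact; other subjects are "unknown", not wrong.
    known_subjects = {x for (x, _) in head}
    pca_body = {(x, y) for (x, y) in body if x in known_subjects}
    pca_conf = support / len(pca_body) if pca_body else 0.0
    return {"support": support, "std_conf": std_conf, "pca_conf": pca_conf}


# Hypothetical toy knowledge base; Eve has no livesIn fact at all.
IS_MARRIED_TO = {("Anna", "Ben"), ("Carl", "Dora"), ("Eve", "Finn")}
LIVES_IN = {("Anna", "Paris"), ("Ben", "Paris"), ("Carl", "Milan"),
            ("Dora", "Rome"), ("Finn", "Berlin")}

# Rule: isMarriedTo(x, z) AND livesIn(z, y) => livesIn(x, y)
print(rule_stats(IS_MARRIED_TO, LIVES_IN, LIVES_IN))
```

Here the rule predicts three facts: one is already known (Anna lives in Paris), one contradicts a known fact (Carl lives in Milan, not Rome), and one concerns Eve, about whose residence nothing is known. Standard confidence is therefore 1/3, while PCA confidence disregards Eve and yields 1/2.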
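
Several of the identifier types listed for IBEX, such as ISBNs and GTINs, carry a check digit, which makes candidate strings cheap to filter. As a minimal sketch under our own assumptions (not IBEX's actual extraction method), the Python code below finds ISBN-13 candidates with a regular expression and keeps only those whose check digit is valid; ISBN-10 and the other identifier types are ignored. The pattern `ISBN13_CANDIDATE` and the functions `isbn13_checksum_ok` and `harvest_isbn13` are hypothetical names.

```python
import re
from typing import List

# ISBN-13 candidates: prefix 978 or 979 followed by ten more digits,
# each optionally preceded by a single hyphen or space.
ISBN13_CANDIDATE = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def isbn13_checksum_ok(digits: str) -> bool:
    """ISBN-13 check: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def harvest_isbn13(text: str) -> List[str]:
    """Return distinct, checksum-valid ISBN-13s (digits only) in order of appearance."""
    found: List[str] = []
    for m in ISBN13_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())  # drop hyphens and spaces
        if len(digits) == 13 and isbn13_checksum_ok(digits) and digits not in found:
            found.append(digits)
    return found


page = "Cited as ISBN 978-0-306-40615-7; a typo gives 978-0-306-40615-8."
print(harvest_isbn13(page))  # ['9780306406157']
```

The checksum rejects the mistyped second number: with weights 1 and 3, changing any single digit always changes the weighted sum modulo 10.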

Selected Publications