What is YAGO?
YAGO is an ontology, i.e., a database with knowledge about the real world. YAGO contains both entities (such as movies, people, cities, countries, etc.) and facts about these entities (who played in which movie, which city is located in which country, etc.). All in all, YAGO contains 10 million entities and 120 million facts.
What is so special about YAGO?
YAGO is special in several ways:
- The accuracy of YAGO has been manually evaluated, confirming an accuracy of 95%. Every relation is annotated with its confidence value.
- YAGO is an ontology that is anchored in time and space: it attaches a temporal dimension and a spatial dimension to many of its facts and entities.
- In addition to a taxonomy, YAGO has thematic domains such as "music" or "science".
What is new in YAGO2s?
While preserving the quality and accuracy of its predecessor YAGO2, YAGO2s improves over it in several ways:
- YAGO2s is stored natively in Turtle, making it completely RDF/OWL compliant while still maintaining the fact identifiers that are unique to YAGO.
- The new YAGO2s architecture enables several contributors to cooperate, and it facilitates debugging and maintenance. The data is divided into themes, so that users can download only particular pieces of YAGO ("YAGO a la carte").
- YAGO2s contains thematic domains such as "music" or "science", which gives a topic structure to YAGO.
What is new in YAGO3?
YAGO3 draws on the multilingual resources of Wikipedia, which lets it cover more local entities and facts. The current version has been extracted from 10 Wikipedia language editions (English, German, French, Dutch, Italian, Spanish, Polish, Romanian, Persian, and Arabic).
- YAGO3 contains canonical representations of entities appearing in different Wikipedia language editions.
- YAGO3 integrates all non-English entities into the rich type taxonomy of YAGO.
- YAGO3 provides a mapping between non-English infobox attributes and YAGO relations.
How is the taxonomy of YAGO structured?
YAGO classifies each entity into a taxonomy of classes. Every entity is an instance of one or multiple classes. Every class (except the root class) is a subclass of one or multiple classes. This yields a hierarchy of classes: the taxonomy. The YAGO taxonomy is the backbone of the ontology and was designed with great care for correctness.
For those interested in the details of that taxonomy, we provide here a more in-depth explanation of the classes. The taxonomy consists of 4 layers:
- The root node of the taxonomy is rdfs:Resource. It includes entities, but also properties, literals, etc. rdfs:Resource has a subclass owl:Thing, which is the class of things (entities).
- Under owl:Thing, there is the class taxonomy from WordNet. Each class name is of the form <wordnet_XXX_YYY>, where XXX is the name of the concept (e.g., singer), and YYY is the WordNet 3.0 synset id of the concept (e.g., 110599806). For example, the class of singers is <wordnet_singer_110599806>. Each class is connected to its more general class by the rdfs:subClassOf relationship.
- The middle layer of the taxonomy consists of classes derived from Wikipedia categories. For example, the class <wikicategory_American_rock_singers> is derived from the Wikipedia category American rock singers. Each of these classes is connected to one class of the WordNet layer by an rdfs:subClassOf relationship. In the example, <wikicategory_American_rock_singers> rdfs:subClassOf <wordnet_singer_110599806>. Not all Wikipedia categories become classes in YAGO.
- The lowest layer of the taxonomy is the layer of instances. Instances comprise individual entities such as rivers, people, or movies. For example, this layer contains <Elvis_Presley>. Each instance is connected to one or multiple classes of the higher layers by the relationship rdf:type. In the example: <Elvis_Presley> rdf:type <wikicategory_American_rock_singers>.
This way, you can walk from an instance up to its classes by rdf:type, and then further up by rdfs:subClassOf.
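The walk described above can be sketched in a few lines of Python. This is a minimal illustration over a hand-written list of triples, not YAGO's own code: the `superclasses` function is hypothetical, and for brevity the chain skips the intermediate WordNet classes between singers and persons, using the simplified class name `<person>`.

```python
from collections import deque

# Hand-written toy triples (illustrative only). Real YAGO has several WordNet
# classes between singers and persons; they are skipped here for brevity.
TRIPLES = [
    ("<Elvis_Presley>", "rdf:type", "<wikicategory_American_rock_singers>"),
    ("<wikicategory_American_rock_singers>", "rdfs:subClassOf", "<wordnet_singer_110599806>"),
    ("<wordnet_singer_110599806>", "rdfs:subClassOf", "<person>"),
    ("<person>", "rdfs:subClassOf", "owl:Thing"),
    ("owl:Thing", "rdfs:subClassOf", "rdfs:Resource"),
]


def superclasses(instance, triples):
    """Return all classes of `instance`: its rdf:type classes plus every
    class reachable from them via rdfs:subClassOf (breadth-first search)."""
    queue = deque(o for s, p, o in triples if s == instance and p == "rdf:type")
    seen = set()
    while queue:
        cls = queue.popleft()
        if cls in seen:  # a class can be reached along several paths
            continue
        seen.add(cls)
        queue.extend(o for s, p, o in triples if s == cls and p == "rdfs:subClassOf")
    return seen


print(sorted(superclasses("<Elvis_Presley>", TRIPLES)))
```

Because a class can have several superclasses, the search keeps a `seen` set so that a shared ancestor is visited only once.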
Does YAGO have thematic domains?
YAGO provides a class hierarchy in the sense of RDF: every subclass represents a set of instances that is a subset of the set of instances of the superclass. For example, Elvis Presley is in the class of singers (because Elvis is a singer). This class is a subclass of the class of persons, because every singer is a person. This is different from a thematic domain hierarchy! A thematic domain hierarchy contains items such as "Football", "Sports", "Music" etc. In such a hierarchy, Elvis is in the domain "Music".
The new YAGO2s now contains a theme with WordNet Domains, which gives such a thematic domain structure to YAGO.
What is the data format of YAGO3?
The YAGO knowledge base is a set of independent, modular plain-text files. These files are in the Turtle format (a subset of N3), with the file extension .ttl. See http://www.w3.org/TeamSubmission/turtle/ for details on this format.
YAGO extends the Turtle format to the "N4 format". In this format, every triple can have an identifier, the fact identifier. The fact identifier is specified as a comment in the line before the triple. As a result, all N4 files are fully backwards compatible with standard Turtle and N3. The fact identifier can appear as a subject in other triples. This is used to annotate YAGO facts with time and space.
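Because the fact identifier lives in a comment, a standard Turtle parser simply skips it, while an N4-aware reader can attach it to the next triple. The sketch below assumes the comment has the form `#@ <id>`; that marker, the `parse_n4` name, and the one-triple-per-line layout are assumptions to check against the actual files, and the parser does not handle general Turtle syntax.

```python
SAMPLE = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#@ <id_42>
<Elvis_Presley> rdf:type <person> .
<id_42> <occursSince> "1935-01-08"^^xsd:date .
"""


def parse_n4(text):
    """Return (fact_id, subject, predicate, object) tuples.

    Assumes (hypothetically) that a fact identifier is written as '#@ <id>'
    on the line directly before its triple and that every triple fits on one
    line. Triples without such a comment get fact_id None.
    """
    facts = []
    pending_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#@"):
            pending_id = line[2:].strip()  # remember the id for the next triple
            continue
        if line.startswith("#") or line.startswith("@"):
            continue  # ordinary comment or @prefix/@base directive
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the statement terminator
        subject, predicate, obj = line.split(None, 2)  # object may contain spaces
        facts.append((pending_id, subject, predicate, obj))
        pending_id = None
    return facts


for fact in parse_n4(SAMPLE):
    print(fact)
```

The second triple shows how a fact identifier is then used as an ordinary subject.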
All identifiers in YAGO are standard Turtle identifiers. A number of prefixes are predefined, such as rdf, rdfs, and owl. The base is set to the YAGO namespace (yago-knowledge.org/resource/).
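To see what these identifiers stand for, a prefixed name or a relative IRI can be expanded to a full IRI. The namespace IRIs for rdf, rdfs, owl, and xsd below are the standard W3C ones; the base IRI is an assumption, since the text gives the YAGO namespace without a scheme and `http://` is assumed here. The `expand` function is hypothetical, and it resolves relative IRIs by plain concatenation, which suffices for simple names such as `<Elvis_Presley>` but is not a full RFC 3986 resolver.

```python
# Standard W3C namespaces for the common predefined prefixes.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
# Assumption: the http:// scheme for the YAGO base namespace.
BASE = "http://yago-knowledge.org/resource/"


def expand(term):
    """Expand <relative>, <absolute>, or prefix:local terms to full IRIs."""
    if term.startswith("<") and term.endswith(">"):
        iri = term[1:-1]
        # Absolute IRIs are kept; relative ones are appended to the base
        # (simple concatenation, not full RFC 3986 resolution).
        return iri if "://" in iri else BASE + iri
    prefix, colon, local = term.partition(":")
    if colon and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    raise ValueError(f"unknown prefix or malformed term: {term!r}")


print(expand("<Elvis_Presley>"))
print(expand("rdfs:label"))
```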
YAGO defines its own datatypes, which extend the standard datatypes. Here are examples for identifiers:
- Entities are written in <> : <Elvis_Presley>
- Strings are written in double quotes, with optional language tags: "Elvis", "Elvis"@en
- Literals are written in double quotes with a datatype: "1977-08-16"^^xsd:date, "70"^^<m>
(<m> is the YAGO literal datatype "meter", which is a subclass of "quantity")
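The three kinds of terms listed above can be told apart by their surface syntax. The following regular-expression sketch is a simplification (no escaped quotes, no triple-quoted strings, no numeric shorthand), and the `classify` function with its tuple output is hypothetical.

```python
import re

ENTITY = re.compile(r'^<([^>]*)>$')
# "lexical form"^^datatype, where the datatype is <iri> or prefix:name
TYPED = re.compile(r'^"(.*)"\^\^(<[^>]*>|[A-Za-z][\w-]*:[\w-]*)$')
# "string" with an optional @language tag
STRING = re.compile(r'^"(.*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?$')


def classify(term):
    """Return ('entity', name), ('literal', value, datatype),
    or ('string', value, language_or_None)."""
    # Tried in this order: entity, typed literal, plain string.
    if m := ENTITY.match(term):
        return ("entity", m.group(1))
    if m := TYPED.match(term):
        return ("literal", m.group(1), m.group(2))
    if m := STRING.match(term):
        return ("string", m.group(1), m.group(2))
    raise ValueError(f"unrecognized term: {term!r}")


for t in ['<Elvis_Presley>', '"Elvis"', '"Elvis"@en', '"1977-08-16"^^xsd:date', '"70"^^<m>']:
    print(t, "->", classify(t))
```

The walrus operator requires Python 3.8 or later.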
How do labels work in YAGO?
In line with RDF, YAGO distinguishes between the entity (Elvis_Presley) and names for that entity ("Elvis", "The King", "Mr. Presley", etc.). The reason for this distinction is that one entity can have multiple names, and one name can refer to multiple entities; consider, e.g., the name "The King", which is highly ambiguous. YAGO links an entity to its names by the relationship rdfs:label. For example, YAGO contains the fact <Elvis_Presley> rdfs:label "Elvis". In addition, YAGO knows the preferred name of each entity, designated by the relationship skos:prefLabel. For example, <Elvis_Presley> skos:prefLabel "Elvis Presley": even though Elvis has multiple names, his standard name is "Elvis Presley". Finally, YAGO records for each name its preferred meaning, designated by <isPreferredMeaningOf>. In the example, <Elvis_Presley> <isPreferredMeaningOf> "Elvis": even though the word "Elvis" can refer to multiple entities, its default meaning is Elvis Presley.
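The three relations can be combined to map a name to an entity: use <isPreferredMeaningOf> when it is present, and otherwise fall back to all entities that carry the name as an rdfs:label. This is a sketch over a few hand-written facts; the function names are hypothetical, names are stored as plain Python strings without Turtle quotes, and the second "Elvis" entity is included only to show ambiguity.

```python
from collections import defaultdict

# Hand-written facts; names are plain strings rather than Turtle literals.
FACTS = [
    ("<Elvis_Presley>", "rdfs:label", "Elvis"),
    ("<Elvis_Presley>", "rdfs:label", "The King"),
    ("<Elvis_Presley>", "skos:prefLabel", "Elvis Presley"),
    ("<Elvis_Presley>", "<isPreferredMeaningOf>", "Elvis"),
    ("<Elvis_Costello>", "rdfs:label", "Elvis"),  # illustrative second meaning
]


def build_index(facts):
    """Index the label relations by name and by entity."""
    candidates = defaultdict(set)  # name -> entities with that rdfs:label
    preferred = {}                 # name -> entity via <isPreferredMeaningOf>
    pref_label = {}                # entity -> its skos:prefLabel
    for s, p, o in facts:
        if p == "rdfs:label":
            candidates[o].add(s)
        elif p == "<isPreferredMeaningOf>":
            preferred[o] = s
        elif p == "skos:prefLabel":
            pref_label[s] = o
    return candidates, preferred, pref_label


def resolve(name, candidates, preferred):
    """Return [preferred entity] if one is recorded, else all candidates."""
    if name in preferred:
        return [preferred[name]]
    return sorted(candidates.get(name, ()))


candidates, preferred, pref_label = build_index(FACTS)
print(resolve("Elvis", candidates, preferred))  # the preferred meaning wins
print(pref_label["<Elvis_Presley>"])
```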
How do meta facts work?
YAGO gives a fact identifier to each fact. For example, the fact <Elvis_Presley> rdf:type <person> could have the fact identifier <id_42>. In the native N4/TTL version of YAGO, the fact identifiers are given in a comment line before the actual fact. In the TSV version, they are simply an additional column.
YAGO contains facts about these fact identifiers. For example, YAGO contains
<id_42> <occursSince> "1935-01-08"
<id_42> <occursUntil> "1977-08-16"
<id_42> <extractionSource> <http://en.wikipedia.org/Elvis_Presley>
These facts mean that Elvis was a person from the year 1935 to the year 1977, and that this fact was found in Wikipedia.
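Meta facts can be used, for example, to check whether a fact holds on a given day. The sketch below reads the <occursSince> and <occursUntil> facts about a fact identifier and treats a missing bound as open-ended. Its function names are hypothetical, and it assumes complete ISO dates (YYYY-MM-DD) stored as plain strings.

```python
from datetime import date

META = [
    ("<id_42>", "<occursSince>", "1935-01-08"),
    ("<id_42>", "<occursUntil>", "1977-08-16"),
    ("<id_42>", "<extractionSource>", "<http://en.wikipedia.org/Elvis_Presley>"),
]


def meta_for(fact_id, meta):
    """Collect the meta facts about one fact identifier as {relation: value}.
    If a relation occurs several times, the last value wins."""
    return {p: o for s, p, o in meta if s == fact_id}


def holds_on(fact_id, day, meta):
    """True if `day` lies within the fact's time span; missing bounds are open."""
    m = meta_for(fact_id, meta)
    since = m.get("<occursSince>")
    until = m.get("<occursUntil>")
    if since is not None and day < date.fromisoformat(since):
        return False
    if until is not None and day > date.fromisoformat(until):
        return False
    return True


print(holds_on("<id_42>", date(1960, 1, 1), META))
print(meta_for("<id_42>", META)["<extractionSource>"])
```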
What is the difference between YAGO and DBpedia?
DBpedia is a community effort to extract structured information from Wikipedia. In this sense, YAGO and DBpedia share the goal of generating a structured ontology, but the projects differ in their foci. YAGO focuses on precision, the taxonomic structure, and the spatial and temporal dimensions. For a detailed comparison of the projects, see Chapter 10.3 of our AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia".
Where can I find more information about YAGO?
Please have a look at our AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia".
How can I access YAGO?
There are several ways to access YAGO:
- Online, interactively, through our web interface
- Online through the SPARQL interface
- Offline by downloading the TTL version of YAGO, and loading it into an RDF triple store (e.g., Jena)
- Offline by downloading the TSV version of YAGO, loading it into a database with the script provided at the bottom of the page, and using SQL
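As a minimal sketch of the SQL route, the example below uses Python's built-in sqlite3 module instead of the script mentioned above, which it does not reproduce. It assumes a four-column TSV layout (fact id, subject, predicate, object) with an empty id column where a fact has no identifier; check the column order of the actual files. The table, index, and function names are hypothetical.

```python
import sqlite3

# Hypothetical TSV excerpt; columns assumed to be: id, subject, predicate, object.
TSV = (
    "<id_42>\t<Elvis_Presley>\trdf:type\t<person>\n"
    "\t<id_42>\t<occursSince>\t\"1935-01-08\"\n"
    "\t<id_42>\t<occursUntil>\t\"1977-08-16\"\n"
)


def load_tsv(conn, text):
    """Create a facts table and bulk-insert one row per TSV line."""
    conn.execute("CREATE TABLE facts (id TEXT, subject TEXT, predicate TEXT, object TEXT)")
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fact_id, subject, predicate, obj = line.split("\t", 3)
        rows.append((fact_id or None, subject, predicate, obj))  # empty id -> NULL
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX facts_subject ON facts (subject)")


conn = sqlite3.connect(":memory:")
load_tsv(conn, TSV)

# Elvis's types, each joined (via its fact id) with its <occursSince> meta fact.
QUERY = """
SELECT f.object, m.object
FROM facts AS f
LEFT JOIN facts AS m ON m.subject = f.id AND m.predicate = '<occursSince>'
WHERE f.subject = '<Elvis_Presley>' AND f.predicate = 'rdf:type'
"""
print(conn.execute(QUERY).fetchall())  # [('<person>', '"1935-01-08"')]
```

Keeping facts and meta facts in one table lets a self-join connect a fact, through its identifier, to the facts about it.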
How does YAGO link to WordNet?
YAGO includes the WordNet class hierarchy. Each class from WordNet is named wordnet_XXX_YYY in YAGO, where XXX is the name of the concept (e.g., singer), and YYY is the id of the synset in WordNet (e.g., 110599806). In addition, the class is linked by the relation <hasSynsetId> to the id of its synset (in the theme yagoWordnetIds). The synset ids are the ids in the Prolog version of WordNet 3.0. If you remove the preceding "1", you get the ids in the WordNet database files (WNdb-3.0.tar.gz). WordNet uses different, apparently unrelated, ids in its index files.
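The naming scheme and the id conversion described above are easy to apply mechanically. In this sketch (hypothetical function names), the synset id is kept as a string, because dropping the leading "1" from a nine-digit id can leave leading zeros that an integer conversion would lose. It assumes the concept name may itself contain underscores and takes the id to be the trailing run of digits.

```python
import re

# wordnet_<concept>_<synset id>, optionally in angle brackets; the concept may
# contain underscores, so the id is taken to be the final run of digits.
WORDNET_CLASS = re.compile(r"^<?wordnet_(.+)_(\d+)>?$")


def parse_wordnet_class(name):
    """Split a YAGO WordNet class name into (concept, synset_id)."""
    m = WORDNET_CLASS.match(name)
    if m is None:
        raise ValueError(f"not a WordNet class name: {name!r}")
    return m.group(1), m.group(2)


def wndb_offset(synset_id):
    """Drop the leading '1' to get the WNdb-3.0 offset (kept as a string)."""
    if not synset_id.startswith("1"):
        raise ValueError(f"expected a leading '1': {synset_id!r}")
    return synset_id[1:]


concept, synset_id = parse_wordnet_class("<wordnet_singer_110599806>")
print(concept, synset_id, wndb_offset(synset_id))
```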