Fabian M. Suchanek
Ontology Matching with PARIS
In the last decade, the World Wide Web has been extraordinarily successful: for many questions of everyday life, the Web has an answer ready. However, computers cannot “read” or “understand” webpages; they can only display them to us. Therefore, when we search for information, we must read through the relevant webpages ourselves. For example, to find a French singer’s concerts in Germany, one must first find a list of French singers and then study their tour schedules. In parallel with the World Wide Web, the so-called Semantic Web has been under development for some years. It is the computer-readable counterpart of the World Wide Web: information is stored there in a form that computers can work with directly. Thus, if a user asks a computer to find concerts by a French singer in Germany, the computer can first find a list of French singers in the Semantic Web and then, elsewhere in the Semantic Web, find that singer’s concerts in German cities near the user.
In the Semantic Web, information is stored in so-called ontologies. An ontology is a directed graph in which the nodes are entities (for example, the singer Alizée or the country France) and the edges are relationships (for example, the relation “is a citizen of” between Alizée and France). Furthermore, the entities are grouped into so-called classes. Alizée, for instance, belongs to the class of singers, and France to the class of European countries. There are hundreds of such ontologies in the Semantic Web, each containing entities, classes, and relationships. Altogether, they contain several billion entities and several billion relationships between them. There are ontologies about musicians, geography, books, and medicine, and (thanks to government support in the USA and Great Britain) also about public-sector topics such as schools, finance, and transport.
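The graph view of an ontology can be made concrete with a small data structure. This is a minimal sketch, assuming that facts are kept in memory as (subject, relation, object) triples; the class `Ontology` and its methods are hypothetical and do not reflect the storage format of any real ontology.

```python
from collections import defaultdict

# Hypothetical type alias: one labeled edge of the graph.
Triple = tuple[str, str, str]


class Ontology:
    """Toy ontology: a directed graph whose nodes are entities and whose
    labeled edges are relationships, plus a grouping of entities into classes."""

    def __init__(self) -> None:
        self.facts: set[Triple] = set()  # (subject, relation, object) edges
        self.classes: dict[str, set[str]] = defaultdict(set)  # class -> members

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.facts.add((subject, relation, obj))

    def add_to_class(self, entity: str, cls: str) -> None:
        self.classes[cls].add(entity)

    def objects(self, subject: str, relation: str) -> set[str]:
        """All nodes reached from `subject` along edges labeled `relation`."""
        return {o for s, r, o in self.facts if s == subject and r == relation}

    def classes_of(self, entity: str) -> set[str]:
        """All classes that list `entity` as a member."""
        return {c for c, members in self.classes.items() if entity in members}


onto = Ontology()
onto.add_fact("Alizée", "is a citizen of", "France")
onto.add_to_class("Alizée", "singer")
onto.add_to_class("France", "European country")

print(onto.objects("Alizée", "is a citizen of"))  # {'France'}
print(onto.classes_of("Alizée"))                  # {'singer'}
```

Storing edges as a set of triples keeps the sketch short; a real system would index the triples by subject and relation instead of scanning them all.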
Since anyone can contribute ontologies to the Semantic Web, individual ontologies exist independently of one another. The difficulty lies in linking the information in different ontologies. If, for instance, one ontology knows that Alizée is a singer (and not, say, a mandolin ensemble), and another ontology contains the concerts of a certain “A. Jacotey”, then the entity Alizée in the one ontology should be matched to the entity “A. Jacotey” in the other. Otherwise, computers would not be able to combine the two when searching for the singer’s concerts. This matching process is anything but simple: even if both ontologies contained only 20 entities, there would be more than two quintillion (about 2.4 × 10¹⁸) possible one-to-one alignments between them, and real ontologies contain several million entities. Moreover, not every entity in one ontology necessarily appears in the other. Furthermore, not only the entities must be matched, but also the names of the relationships: the relation “is a citizen of” in one ontology could be called “has the nationality of” in another. The situation is similar for the names of classes. Besides, correspondences between classes are often asymmetric, because a class in one ontology may correspond only to a subclass in the other: one ontology could have the specific class “French singers”, while the other might have only the generic class “singers”.
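The size of this search space can be checked with a few lines of arithmetic. Counting only alignments in which each of the 20 entities of one ontology is paired with a distinct entity of the other gives 20! possibilities; allowing entities to remain unmatched, as real ontologies require, only increases the count. The variable names below are illustrative.

```python
import math

n = 20  # entities per ontology, as in the example above

# Perfect one-to-one alignments: every entity in the first ontology is
# paired with a distinct entity in the second.
perfect = math.factorial(n)

# Partial one-to-one alignments: choose k entities on each side and pair
# them up, for every k from 0 to n (the rest stay unmatched).
partial = sum(math.comb(n, k) ** 2 * math.factorial(k) for k in range(n + 1))

print(f"{perfect:,}")     # 2,432,902,008,176,640,000
print(partial > perfect)  # True
```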
Until now, approaches to ontology alignment have mainly considered either the matching of classes and relations or the matching of entities. Our idea was that these two strategies could complement each other: if we know that “is a citizen of” means the same thing as “has the nationality of”, that could help us connect “Alizée” to “A. Jacotey”. Once we have connected these two, it is easier to map the classes “singer” and “French singer” to each other. This in turn simplifies the linking of other singers, from which further correspondences can be inferred. Since such links are never certain but only more or less probable, we have developed a probabilistic model for this process, in which the probabilities of the links depend on one another.
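The mutual reinforcement between entity links and relation links can be illustrated with a toy fixed-point iteration. This is a simplified sketch of the idea, not the equations PARIS actually uses: the noisy-or combination of evidence, the normalization of relation scores, the starting score `PRIOR = 0.1`, and all function, relation, and entity names are assumptions chosen for illustration. Literals (strings in double quotes) are compared by exact equality; entity equivalences and relation alignments are then recomputed from each other in turn.

```python
from itertools import product

# Two tiny, hypothetical ontologies as (subject, relation, object) triples.
# Values in double quotes are literals; everything else is an entity.
O1 = {
    ("Alizée", "isCitizenOf", "France"),
    ("Alizée", "hasFullName", '"Alizée Jacotey"'),
    ("France", "hasName", '"France"'),
}
O2 = {
    ("A. Jacotey", "hasNationality", "FR"),
    ("A. Jacotey", "fullName", '"Alizée Jacotey"'),
    ("FR", "label", '"France"'),
}
PRIOR = 0.1  # assumed starting score for every relation pair


def is_literal(node: str) -> bool:
    return node.startswith('"')


def node_prob(a: str, b: str, eq: dict) -> float:
    """Probability that node a (from O1) and node b (from O2) are the same."""
    if is_literal(a) or is_literal(b):
        return 1.0 if a == b else 0.0  # literals: exact string match
    return eq.get((a, b), 0.0)


def entities(facts: set) -> set:
    return {s for s, _, _ in facts} | {o for _, _, o in facts if not is_literal(o)}


def relations(facts: set) -> set:
    return {r for _, r, _ in facts}


def update_entities(eq: dict, align: dict) -> dict:
    """Noisy-or: each pair of facts x1 -r1-> y1 and x2 -r2-> y2 is treated as
    independent evidence for x1 == x2, weighted by align[r1, r2] * P(y1 == y2)."""
    new = {}
    for x1, x2 in product(entities(O1), entities(O2)):
        miss = 1.0  # probability that no piece of evidence holds
        for (s1, r1, y1), (s2, r2, y2) in product(O1, O2):
            if s1 == x1 and s2 == x2:
                miss *= 1.0 - align[(r1, r2)] * node_prob(y1, y2, eq)
        new[(x1, x2)] = 1.0 - miss
    return new


def update_relations(eq: dict) -> dict:
    """Share of r1-facts whose matched subject and object are linked by r2,
    relative to r1-facts whose subject and object are matched at all."""
    nodes2 = {s for s, _, _ in O2} | {o for _, _, o in O2}
    new = {}
    for r1, r2 in product(relations(O1), relations(O2)):
        covered = matched = 0.0
        for x1, r, y1 in O1:
            if r != r1:
                continue
            miss_r2 = miss_any = 1.0
            for x2, y2 in product(entities(O2), nodes2):
                p = node_prob(x1, x2, eq) * node_prob(y1, y2, eq)
                miss_any *= 1.0 - p
                if (x2, r2, y2) in O2:
                    miss_r2 *= 1.0 - p
            covered += 1.0 - miss_r2
            matched += 1.0 - miss_any
        new[(r1, r2)] = covered / matched if matched > 0 else 0.0
    return new


eq: dict = {}
align = {pair: PRIOR for pair in product(relations(O1), relations(O2))}
for _ in range(3):  # alternate the two updates a few times
    eq = update_entities(eq, align)
    align = update_relations(eq)

print(round(eq[("Alizée", "A. Jacotey")], 2))              # 1.0
print(round(align[("isCitizenOf", "hasNationality")], 2))  # 1.0
print(round(eq[("Alizée", "FR")], 2))                      # 0.0
```

In this run, the shared name literal first makes “Alizée” and “A. Jacotey” slightly similar; that lifts the score of the relation pair linking them, which in turn lets “isCitizenOf” and “hasNationality” be aligned. The toy scores jump straight to 1.0; the sketch makes no attempt to calibrate them.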
We began this project together with the research center INRIA Saclay, which is located near Paris. Our approach is fittingly named “PARIS – Probabilistic Alignment of Relations, Instances, and Schema”. We have implemented this approach and optimized the system so that it can also handle large ontologies. For example, our system can align our YAGO ontology with the DBpedia ontology in a few hours; both contain several million entities and facts. Because PARIS exploits the interplay between relationships and entities, it reaches a precision of over 90% in aligning these ontologies.
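Precision is the fraction of proposed matches that are actually correct. The following minimal sketch shows how it could be computed against a manually checked set of matches; the function name and the data are hypothetical and are not the evaluation data behind the figure above.

```python
def precision(proposed: set[tuple[str, str]], correct: set[tuple[str, str]]) -> float:
    """Fraction of proposed (entity1, entity2) matches that are confirmed correct."""
    if not proposed:
        return 0.0  # no proposals: define precision as 0 to avoid dividing by zero
    return len(proposed & correct) / len(proposed)


# Hypothetical toy data: two of the three proposed matches are correct.
proposed = {("Alizée", "A. Jacotey"), ("France", "FR"), ("Paris", "Lyon")}
correct = {("Alizée", "A. Jacotey"), ("France", "FR"), ("Germany", "DE")}
print(round(precision(proposed, correct), 3))  # 0.667
```

Precision ignores correct matches the system missed (here, Germany and DE); that is measured separately, as recall.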
With this system, we have been able to contribute to aligning the ontologies of the Semantic Web with one another. In this way, the ontologies can complement each other, and computers searching for information can move from one ontology to another, just as hyperlinks allow us to move from one website to another. Thus, the Semantic Web is becoming more and more of a truly connected “Web”.