UWN / MENTA: Towards a Universal Multilingual Wordnet

Overview

UWN is an automatically constructed multilingual lexical knowledge base based on WordNet.

The English language represents a constantly decreasing fraction of the Web. China and the EU each have greatly surpassed the U.S. in the number of Internet users, and other regions are expected to follow. Multilingual knowledge bases address this development by providing labels in multiple languages and making the semantic connections between words and names in different languages explicit.

For over 1,500,000 words in over 200 languages, UWN provides a corresponding list of meanings and shows how such meanings are semantically related. Additionally, the new MENTA extension adds a large-scale hierarchical taxonomy of named entities and their classes, drawing on over 200 different language editions of Wikipedia. This leads to a knowledge base with over 15 million words and names in different languages.

Example

For instance, the word "board" can refer to a wooden panel, a committee, or a blackboard, and as a verb to the act of getting on a vehicle (e.g. "to board a plane"). For each of these meanings one can obtain the corresponding words in different languages; for example, the committee sense of "board" corresponds to комитет in Russian and 委員会 in Japanese. Meanings are also connected to related meanings: the committee meaning is linked to its generalizations, such as administrative unit and social group, and for each of these one can again obtain the corresponding words in different languages.

Query

We offer a user interface that lets you search and browse UWN.

Publications

Downloads

Java Library

We provide a small Java library that can be used with one or more large plugins. The plugins contain the complete data, so the library works offline without connecting to our servers. More information (and new versions) will follow soon. In the meantime, please contact Gerard de Melo if you have any questions.

Important: A new version of uwnapi.zip was released on 2012-11-23. This version allows you to obtain statement weights and fixes a character encoding issue (thanks to Aya Zoghby for reporting this). Please upgrade!

How to use the library

Take a look at the example code.

Other programming languages

Raw Dump

Alternatively, you can work with a raw dump of the UWN Core. We provide a gzip-compressed TSV file. For best performance, decompress it on the fly while reading rather than unpacking it to disk first. Each line contains four tab-separated fields: subject, predicate, object, and weight.
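As a rough illustration of reading the dump while decompressing it on the fly, here is a minimal Java sketch that uses only the standard library. It is not part of the UWN Java library. The class name UwnDumpReader, the Statement holder, and the choices to read the file as UTF-8 and to parse the weight as a decimal number are assumptions made for this example; the path of the dump file is passed on the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class for illustration; not part of the UWN library.
public class UwnDumpReader {

    /** One line of the dump: subject, predicate, object, weight. */
    public static final class Statement {
        public final String subject;
        public final String predicate;
        public final String object;
        public final double weight; // assumption: the weight column is a decimal number

        Statement(String subject, String predicate, String object, double weight) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.weight = weight;
        }
    }

    /** Parses one tab-separated line; returns null if it does not have four valid fields. */
    public static Statement parseLine(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 4) {
            return null;
        }
        try {
            return new Statement(fields[0], fields[1], fields[2],
                    Double.parseDouble(fields[3]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UwnDumpReader <dump.tsv.gz>");
            return;
        }
        long parsed = 0;
        long skipped = 0;
        // GZIPInputStream decompresses while reading, so the file is never unpacked to disk.
        // Assumption: the dump is UTF-8 encoded.
        try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(args[0])));
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Statement statement = parseLine(line);
                if (statement == null) {
                    skipped++;
                    continue;
                }
                parsed++;
                // Process the statement here, e.g. filter by predicate or weight.
            }
        }
        System.out.println(parsed + " statements read, " + skipped + " lines skipped");
    }
}
```

The loop handles one line at a time and keeps nothing in memory, which suits a large dump; collect statements into a list or map only if the subset you need is small enough to fit.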