The multilingual Ambiverse Natural Language Understanding suite (AmbiverseNLU) combines state-of-the-art components for language understanding tasks in a single, easy-to-use, scalable suite: named entity recognition and disambiguation (or entity linking), open information extraction, entity salience estimation, and concept linking.
You can use AmbiverseNLU as a web service, call it via command-line scripts, or use it in your own code (e.g., as a Maven dependency).
AmbiverseNLU is jointly developed by the MPI for Informatics and Ambiverse GmbH.
Set up AmbiverseNLU
AmbiverseNLU is available as open source on GitHub, licensed under the Apache 2.0 license.
You can also use our Docker images or files to set up AmbiverseNLU quickly.
- 26th November, 2018: AmbiverseNLU Open Source Release
- Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, Gerhard Weikum. A Study of the Importance of External Knowledge in the Named Entity Recognition Task. ACL 2018
- Gerhard Weikum, Johannes Hoffart, Fabian M. Suchanek. Ten Years of Knowledge Harvesting: Lessons and Challenges. IEEE Data Eng. Bull. 2016
- Luciano Del Corro, Abdalghani Abujabal, Rainer Gemulla, Gerhard Weikum. FINET: Context-Aware Fine-Grained Named Entity Typing. EMNLP 2015
- Fabio Petroni, Luciano Del Corro, Rainer Gemulla. CORE: Context-Aware Open Relation Extraction with Factorization Machines. EMNLP 2015
- Luciano Del Corro, Rainer Gemulla, Gerhard Weikum. Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning. EMNLP 2014
- Johannes Hoffart, Dragan Milchevski, Gerhard Weikum. STICS: searching with strings, things, and cats. SIGIR 2014
- Johannes Hoffart, Yasemin Altun, Gerhard Weikum. Discovering emerging entities with ambiguous names. WWW 2014
- Luciano Del Corro, Rainer Gemulla. ClausIE - clause-based open information extraction. WWW 2013
- Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, Gerhard Weikum. KORE: keyphrase overlap relatedness for entity disambiguation. CIKM 2012
- Johannes Hoffart et al. Robust Disambiguation of Named Entities in Text. EMNLP 2011
Download AmbiverseNLU from GitHub
Sign up for the AmbiverseNLU Mailing List
Natural Language Understanding Components
KnowNER: Named Entity Recognition
Named Entity Recognition (NER) identifies mentions of named entities (persons, organizations, locations, songs, products, ...) in text.
KnowNER works on English, Czech, German, Spanish, and Russian texts.
AmbiverseNLU provides KnowNER for NER.
Further reading: D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum, “A Study of the Importance of External Knowledge in the Named Entity Recognition Task,” ACL 2018
AIDA: Named Entity Disambiguation
Named Entity Disambiguation (NED) links mentions recognized by NER (see above) to a unique identifier. Most names are ambiguous, especially family names, and entity disambiguation resolves this ambiguity. Together with NER, NED is often referred to as entity linking.
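To make the idea of resolving an ambiguous name concrete, here is a minimal toy sketch. It is not AIDA's algorithm: it simply picks the candidate whose descriptive keywords overlap most with the words around the mention. The candidate dictionary, the entity identifiers, and the function name `disambiguate` are all hypothetical and chosen only for illustration.

```python
# Hypothetical candidate dictionary: name -> {entity id: keywords describing that entity}.
# The ids and keywords are illustrative and are not taken from YAGO.
CANDIDATES = {
    "Einstein": {
        "Albert_Einstein": {"physicist", "relativity", "ulm"},
        "Alfred_Einstein": {"musicologist", "mozart", "music"},
    },
}


def disambiguate(mention, context, candidates=CANDIDATES):
    """Return the candidate id whose keywords overlap most with the context, or None."""
    options = candidates.get(mention)
    if not options:
        return None
    # Lowercase the context words and strip simple punctuation.
    words = {w.strip(".,;:!?").lower() for w in context.split()}
    # Iterating over sorted ids makes ties resolve deterministically.
    return max(sorted(options), key=lambda eid: len(options[eid] & words))


print(disambiguate("Einstein", "Einstein published a study of Mozart and his music."))
print(disambiguate("Einstein", "Einstein developed the theory of relativity."))
```

The same surface name resolves to a different identifier depending on its context, which is the ambiguity that NED has to handle. A real system would use a far richer candidate dictionary and scoring model.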
AIDA works on English, Chinese, Czech, German, Spanish, and Russian texts.
AmbiverseNLU provides an enhanced version of AIDA for NED, mapping mentions to entities registered in the Wikipedia-derived YAGO knowledge base.
Further reading: J. Hoffart, M. A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum, “Robust Disambiguation of Named Entities in Text,” EMNLP 2011
ClausIE: Open Information Extraction
Open Information Extraction (OpenIE) is the task of generating structured output from natural language text in the form of n-ary propositions, each consisting of a subject, a relation, and one or more arguments. For example, from the sentence "Albert Einstein was born in Ulm", an OpenIE system generates the extraction ("Albert Einstein", "was born in", "Ulm"), where the first element is usually referred to as the subject, the second as the relation, and the last as the object or argument.
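The shape of such an extraction can be sketched as a small data structure. This is only an illustration of the n-ary proposition described above, not ClausIE's actual output format; the class name `Proposition` and its fields are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proposition:
    """One n-ary extraction: a subject, a relation, and one or more arguments."""
    subject: str
    relation: str
    arguments: Tuple[str, ...]

    def __post_init__(self):
        # The definition requires at least one argument.
        if not self.arguments:
            raise ValueError("a proposition needs at least one argument")

    def as_tuple(self):
        """Flatten into (subject, relation, argument_1, ..., argument_n)."""
        return (self.subject, self.relation, *self.arguments)


einstein = Proposition("Albert Einstein", "was born in", ("Ulm",))
print(einstein.as_tuple())
```

Keeping the arguments in a tuple lets the same structure hold both a simple triple and a proposition with several arguments.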
ClausIE works on English texts.
AmbiverseNLU provides an enhanced version of ClausIE for OpenIE.
Further reading: L. Del Corro and R. Gemulla, “ClausIE - clause-based open information extraction,” WWW 2013
Concept linking is similar to entity linking but focuses on non-named entities (e.g., car, chair). It identifies relevant concepts in text and links them to concepts registered in the Wikipedia-derived YAGO knowledge base.
Concept Linking works on English, Chinese, Czech, German, Spanish, and Russian texts.
AmbiverseNLU provides a new concept linking component based on the original AIDA entity disambiguation with knowledge-informed spotting.
Entity Salience gives each entity in a document a score in [0,1], denoting its importance with respect to the document.
Our Entity Salience is fully multilingual.
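As a rough sketch of how salience scores in [0,1] might be consumed downstream, the snippet below validates the score range and keeps only the most important entities of a document. The scores, entity identifiers, threshold, and the function name `rank_by_salience` are hypothetical; they are not actual AmbiverseNLU output.

```python
def rank_by_salience(scores, threshold=0.5):
    """Return (entity, score) pairs with score >= threshold, most salient first."""
    for entity, score in scores.items():
        # Salience scores are defined on the closed interval [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"salience of {entity!r} is outside [0, 1]: {score}")
    kept = [(entity, score) for entity, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for the entities of one document.
scores = {"Albert_Einstein": 0.92, "Ulm": 0.40, "Germany": 0.55}
print(rank_by_salience(scores))
```

The threshold of 0.5 is arbitrary here; a suitable cut-off depends on the application.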