HYENA: Hierarchical Type Classification for Entity Names

HYENA is a multi-label classifier for entity types, based on a hierarchical taxonomy derived from the YAGO2 knowledge base.

HYENA's type taxonomy comprises 505 types organized into a directed acyclic graph, with 5 main supertypes at its top level and 9 levels in its deepest part. HYENA was trained on 1.6 million instances extracted from 50,000 randomly selected Wikipedia articles.

HYENA uses neighboring words and bigrams, part-of-speech tags, and phrases from a large gazetteer derived from the YAGO2 knowledge base.
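
The feature families listed above can be illustrated with a minimal sketch. It is not HYENA's actual implementation: the function name `extract_features`, the window size, the feature-string format, and the toy gazetteer (a plain dictionary from mention strings to type names) are all assumptions made for illustration, and part-of-speech tags are taken as precomputed input.

```python
def extract_features(tokens, pos_tags, start, end, gazetteer, window=3):
    """Return a set of binary features for the mention tokens[start:end].

    All names and the window size are hypothetical; this only illustrates
    the feature families named in the text (words, bigrams, POS, gazetteer).
    """
    features = set()
    lo = max(0, start - window)
    hi = end + window

    # Neighboring words to the left and right of the mention.
    for word in tokens[lo:start]:
        features.add(f"left_word={word.lower()}")
    for word in tokens[end:hi]:
        features.add(f"right_word={word.lower()}")

    # Bigrams over the mention together with its context window.
    context = tokens[lo:hi]
    for first, second in zip(context, context[1:]):
        features.add(f"bigram={first.lower()}_{second.lower()}")

    # Part-of-speech tags of the neighboring tokens.
    for tag in pos_tags[lo:start]:
        features.add(f"left_pos={tag}")
    for tag in pos_tags[end:hi]:
        features.add(f"right_pos={tag}")

    # Gazetteer lookup: types previously seen for this mention string.
    mention = " ".join(tokens[start:end]).lower()
    for type_name in gazetteer.get(mention, ()):
        features.add(f"gazetteer_type={type_name}")

    return features


tokens = ["The", "singer", "Bob", "Dylan", "performed", "in", "Paris"]
pos_tags = ["DT", "NN", "NNP", "NNP", "VBD", "IN", "NNP"]
toy_gazetteer = {"bob dylan": ["PERSON", "singer"]}  # hypothetical entries
features = extract_features(tokens, pos_tags, 2, 4, toy_gazetteer)
```

A set of feature strings like this would typically be mapped to a sparse vector before being passed to the per-type classifiers.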

Publications

  • HYENA: Hierarchical Type Classification for Entity Names   PDF
    Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum
    In: Proceedings of the 24th International Conference on Computational Linguistics, Coling 2012, Mumbai, India, 2012
    For scientific works, please cite this paper
  • HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text   PDF
    Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum
    In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Sofia, Bulgaria, 2013

HYENA Fine-grained Type Hierarchy

The HYENA type taxonomy was derived from the YAGO knowledge base, starting with five broad classes: PERSON, LOCATION, ORGANIZATION, EVENT and ARTIFACT. Under each of these superclasses, the 100 most prominent subclasses were picked based on the population of the classes. The classes are organized in a hierarchy that has 9 levels in its deepest parts.
You can browse our hierarchy in the pdf file below or using our Interactive Browser.
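
The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the input format (a dictionary from each top-level class to its subclasses' instance counts), the function names, and the toy counts are hypothetical. With five top-level classes and 100 subclasses each, the count works out to 5 + 5 × 100 = 505, which is consistent with the taxonomy size stated earlier, assuming the selected subclasses are distinct.

```python
import heapq

TOP_LEVEL = ["PERSON", "LOCATION", "ORGANIZATION", "EVENT", "ARTIFACT"]


def select_types(populations, k=100):
    """Pick the k most populated subclasses under each top-level class.

    populations: {top_level_class: {subclass_name: instance_count}}
    (a hypothetical input format chosen for this sketch).
    """
    selected = {}
    for top in TOP_LEVEL:
        subclasses = populations.get(top, {})
        # nlargest iterates over the subclass names, ranked by their count.
        selected[top] = heapq.nlargest(k, subclasses, key=subclasses.get)
    return selected


def taxonomy_size(selected):
    """Top-level classes plus all selected subclasses (assumed distinct)."""
    return len(selected) + sum(len(subs) for subs in selected.values())


# Toy counts, invented for illustration only.
toy_populations = {
    "PERSON": {"singer": 50, "politician": 80, "chef": 5},
    "LOCATION": {"city": 100},
}
toy_selection = select_types(toy_populations, k=2)
```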

Properties of the dataset used to train and evaluate HYENA

data property                                   training             testing
# of articles                                   50,000               10,000
# of instances (all types)                      1,613,340            253,029
# of location instances                         489,003 (30%)        86,936 (34.4%)
# of person instances                           426,467 (26.4%)      62,446 (24.6%)
# of organization instances                     219,716 (13.6%)      38,293 (15.1%)
# of artifact instances                         204,802 (12.7%)      31,899 (12.6%)
# of event instances                            176,549 (10.9%)      28,952 (11.4%)
# instances in 1 top-level class                1,131,994 (70.2%)    179,240 (70.8%)
# instances in 2 top-level classes              182,508 (11.3%)      33,399 (13.2%)
# instances in more than 2 top-level classes    6,492 (0.4%)         828 (0.3%)
# instances not in any class                    292,346 (18.1%)      39,562 (15.6%)

Results

In the Coling 2012 paper, HYENA was tested on 253,029 instances from 10,000 randomly selected Wikipedia articles. The macro-averaged (per-class) and micro-averaged results are shown in the table below.

                           Macro                           Micro
                           Precision  Recall  F1           Precision  Recall  F1
HYENA                      0.878      0.863   0.87         0.913      0.932   0.922
HYENA + meta-classifier    0.89       0.837   0.862        0.916      0.914   0.915
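
For readers unfamiliar with the two averaging schemes, the sketch below shows one common way to compute them from per-type counts of true positives, false positives and false negatives. It is an assumption that this matches the exact definitions used in the paper; in particular, macro F1 is computed here as the mean of per-type F1 scores, which is only one of several conventions. The function names and the toy counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw counts, with 0.0 for empty cases."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_micro(counts):
    """counts: {type_name: (tp, fp, fn)} -> (macro, micro) score triples."""
    per_type = [precision_recall_f1(*c) for c in counts.values()]
    n = len(per_type)
    # Macro: average the per-type scores, so every type weighs the same.
    macro = tuple(sum(scores[i] for scores in per_type) / n for i in range(3))
    # Micro: pool the counts first, so frequent types dominate the result.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = precision_recall_f1(tp, fp, fn)
    return macro, micro


# Toy counts, invented for illustration only.
toy_counts = {"PERSON": (8, 2, 2), "EVENT": (1, 1, 3)}
macro, micro = macro_micro(toy_counts)
```

Because micro averaging pools counts across types, well-populated types influence it more than rare ones, which is one reason micro scores can exceed macro scores on a skewed type distribution.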

Detailed HYENA results for each type classifier, as well as the output for each testing instance are available here.
Results are downloadable as one compressed archive here.