Machine knowledge about the world’s entities should include quantity properties: heights of buildings, running times of athletes, energy efficiency of car models, energy production of power plants, and more. State-of-the-art knowledge bases, such as Wikidata, cover many relevant entities but often miss the corresponding quantities. Prior work on extracting quantity facts from web content focused on high precision for top-ranked outputs, but did not tackle the KB coverage issue. This paper presents a recall-oriented approach, which aims to close this gap in knowledge-base coverage. Our method is based on iterative learning for extracting quantity facts, with two novel contributions to boost recall for KB augmentation without sacrificing the quality standards of the knowledge base. The first contribution is a query expansion technique to capture a larger pool of fact candidates. The second contribution is a novel technique for harnessing observations on value distributions for self-consistency. Experiments with extractions from more than 13 million web documents demonstrate the benefits of our method.
Enhancing Knowledge Bases with Quantity Facts
Vinh Thinh Ho, Daria Stepanova, Dragan Milchevski, Jannik Stroetgen, and Gerhard Weikum
In Proc. WWW 2022
- 15.1M processed Qfacts, obtained by combining OpenIE with entity linking and quantity recognition: download
- Supplemental materials: download
- Quantity recognition tool: link (a fork of quantulum3 that additionally links recognized units to YAGO4)
- Code will be made available at: https://github.com/hovinhthinh/Qsearch
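
The abstract mentions using observations on value distributions for self-consistency. The sketch below is only an illustration of that general idea and is not the method from the paper: it assumes candidate values for one quantity property have already been extracted and normalized to a common unit, and it flags candidates that lie far from the bulk of the distribution using a median-based (MAD) outlier score. The function name `split_by_value_distribution`, the threshold of 3.5, and the sample heights are all hypothetical.

```python
from statistics import median


def split_by_value_distribution(values, threshold=3.5):
    """Split candidate values into (kept, rejected) by a robust outlier score.

    Illustrative assumption: a candidate is implausible if its modified
    z-score, based on the median absolute deviation (MAD), exceeds `threshold`.
    """
    values = list(values)
    if not values:
        return [], []

    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread to compare against, so nothing can be judged an outlier.
        return values, []

    kept, rejected = [], []
    for v in values:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(v - center) / mad
        (rejected if score > threshold else kept).append(v)
    return kept, rejected


# Hypothetical building heights in metres; the last one mimics a unit error.
heights = [828.0, 632.0, 601.0, 599.0, 555.0, 541.0, 530.0, 508.0, 82800.0]
kept, rejected = split_by_value_distribution(heights)
print(kept)
print(rejected)
```

A median-based score is used here because a single extreme candidate would distort a mean and standard deviation. Real quantity distributions are often heavily skewed, so applying the same check to log-transformed values may be a better fit in practice.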