PRAVDA: label PRopagated fAct extraction on Very large DAta

PRAVDA is a knowledge-harvesting engine based on label propagation that extracts both base facts and temporal facts.

Dataset for ACL 2012:

The corpus can be downloaded here: corpus.tar.gz.
The extracted facts and the evaluation results reported in the experiments can be downloaded here: results.tar.gz.

Publications

Yafang Wang, Maximilian Dylla, Zhaochun Ren, Marc Spaniol and Gerhard Weikum:
PRAVDA-live: Interactive Knowledge Harvesting
Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM 2012), Maui, Hawaii, US, October 29-November 2, 2012, pp. 2674-2676.
Yafang Wang, Maximilian Dylla, Marc Spaniol and Gerhard Weikum:
Coupling Label Propagation and Constraints for Temporal Fact Extraction
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju, Republic of Korea, July 8-14, 2012, pp. 233-237.
BibTeX
Yafang Wang, Bin Yang, Lizhen Qu, Marc Spaniol and Gerhard Weikum:
Harvesting Facts from Textual Web Sources by Constrained Label Propagation
Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, Scotland, UK, October 24-28, 2011, pp. 837-846, pdf-file.
BibTeX

PRAVDA-live can be tested online: DEMO.

The corpus for the examples in the demo is available here: CORPUS.