PRAVDA: label PRopagated fAct extraction on Very large DAta

PRAVDA is a knowledge-harvesting engine based on label propagation that extracts both base facts and temporal facts.
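To illustrate the underlying idea, the sketch below shows generic, unconstrained label propagation. A few seed nodes carry known relation labels, and every other node repeatedly takes the weighted average of its neighbours' label scores until the scores stop changing. This is not PRAVDA's actual algorithm, which couples propagation with constraints as described in the papers listed on this page. The function name `label_propagation`, the graph, the pattern strings and the relation labels (`bornIn`, `worksFor`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def label_propagation(edges, seeds, labels, iterations=50, tol=1e-6):
    """Propagate label scores from clamped seed nodes over an undirected weighted graph.

    edges:  list of (u, v, weight) tuples with positive weights.
    seeds:  dict mapping node -> known label (these nodes are clamped).
    labels: list of all label names.
    Returns a dict mapping node -> {label: score}.
    """
    # Build adjacency lists for an undirected graph.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    nodes = set(neighbors) | set(seeds)
    # Seeds start as one-hot distributions; all other nodes start at zero.
    scores = {n: {l: 0.0 for l in labels} for n in nodes}
    for n, l in seeds.items():
        scores[n][l] = 1.0

    for _ in range(iterations):
        new_scores = {}
        delta = 0.0
        for n in nodes:
            if n in seeds:
                # Clamp seed labels so they never drift.
                new_scores[n] = scores[n]
                continue
            total_w = sum(w for _, w in neighbors[n])
            dist = {l: 0.0 for l in labels}
            # Weighted sum of the neighbours' scores from the previous iteration.
            for m, w in neighbors[n]:
                for l in labels:
                    dist[l] += w * scores[m][l]
            if total_w > 0:
                dist = {l: s / total_w for l, s in dist.items()}
            for l in labels:
                delta = max(delta, abs(dist[l] - scores[n][l]))
            new_scores[n] = dist
        scores = new_scores
        # Stop early once no score changes by more than `tol`.
        if delta < tol:
            break
    return scores


# Hypothetical toy graph: textual patterns linked by similarity weights.
edges = [
    ("born in", "birthplace of", 1.0),
    ("birthplace of", "hometown of", 1.0),
    ("works for", "employed by", 1.0),
]
seeds = {"born in": "bornIn", "works for": "worksFor"}
labels = ["bornIn", "worksFor"]

scores = label_propagation(edges, seeds, labels)
for node in ("hometown of", "employed by"):
    best = max(scores[node], key=scores[node].get)
    print(node, "->", best)
```

Running the example prints `hometown of -> bornIn` and `employed by -> worksFor`. The unlabeled pattern "hometown of" inherits the `bornIn` label through "birthplace of", even though it is never directly connected to a seed.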

Dataset for ACL 2012:

  • The corpus can be downloaded here: corpus.tar.gz.
  • The extracted facts and the evaluation results reported in the experiments can be downloaded here: results.tar.gz.

Publications:

  • Yafang Wang, Maximilian Dylla, Zhaochun Ren, Marc Spaniol and Gerhard Weikum:
    PRAVDA-live: Interactive Knowledge Harvesting
    Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM 2012), Maui, Hawaii, US, October 29-November 2, 2012, pp. 2674-2676.
  • Yafang Wang, Maximilian Dylla, Marc Spaniol and Gerhard Weikum:
    Coupling Label Propagation and Constraints for Temporal Fact Extraction
    Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju, Republic of Korea, July 8-14, 2012, pp. 233-237.
  • Yafang Wang, Bin Yang, Lizhen Qu, Marc Spaniol and Gerhard Weikum:
    Harvesting Facts from Textual Web Sources by Constrained Label Propagation
    Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, Scotland, UK, October 24-28, 2011, pp. 837-846.



PRAVDA-live can be tested online: DEMO.

The corpus used for the examples in the demo can be browsed here: CORPUS.