HIGGINS

The HIGGINS project aims to combine crowdsourcing with automated information extraction techniques to enable high-quality fact extraction from complex textual inputs.

Overview

Ambiguity, complexity, and diversity in natural language expressions are major hindrances to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today. Even paid crowdsourced acquisition can quickly become prohibitively expensive. HIGGINS employs principled methods to effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. The idea is to complement automatic extraction techniques with human computing, reaping the benefits of both while each compensates for the other's limitations.

The HIGGINS architecture combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. The IE engine combines statistics derived from Web corpora (Wikipedia and ClueWeb) with semantic resources (WordNet and ConceptNet) to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to generate questions and relevant candidate answers, which are presented to human workers. In our experiments we extract relation-centric facts about fictitious characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced.


For scientific works, please cite this paper:

Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition. Sarath Kumar Kondreddi, Peter Triantafillou, and Gerhard Weikum. In Proceedings of the International Conference on Data Engineering (ICDE), 2014, Chicago, IL, USA.

 

 

Publications

HIGGINS Data

Dictionary of Relations

Experiments

HIGGINS Results

  • Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)

      Prominent Set
      Movie Plots Prominent Set CSV HTML
      Movie Cast Prominent Set CSV HTML
      Book Plots Prominent Set CSV HTML
      Book Cast Prominent Set CSV HTML

       

       
      Random Set
      Movie Plots Random Set CSV HTML
      Movie Cast Random Set CSV HTML
      Book Plots Random Set CSV HTML
      Book Cast Random Set CSV HTML
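
The record format above can be read programmatically. The following is a minimal sketch, not part of the HIGGINS distribution. It assumes the CSV files are comma-separated with standard double-quote quoting for sentences that contain commas, and it keeps the ground-truth and crowdsource values as plain strings because their encoding is not specified here. The names `Candidate`, `ResultRow`, `parse_result_row`, and `read_results` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Number of leading fields before the repeated candidate triples:
# question id, title, entity one, entity two, sentences.
FIXED_FIELDS = 5


@dataclass
class Candidate:
    label: str         # candidate fact, "no relation", or "other relation"
    ground_truth: str  # kept as a string; the encoding is an assumption
    crowdsource: str   # kept as a string; the encoding is an assumption


@dataclass
class ResultRow:
    question_id: str
    title: str
    entity_one: str
    entity_two: str
    sentences: str
    candidates: List[Candidate]


def parse_result_row(fields):
    """Split one record into its fixed fields and its candidate triples."""
    fields = [f.strip() for f in fields]
    rest = fields[FIXED_FIELDS:]
    if len(fields) < FIXED_FIELDS or len(rest) % 3 != 0:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    candidates = [Candidate(*rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return ResultRow(*fields[:FIXED_FIELDS], candidates)


def read_results(text):
    """Parse CSV text (assumed comma-separated, double-quoted) into rows."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [parse_result_row(row) for row in reader if row]


if __name__ == "__main__":
    # Invented sample record, for illustration only; not taken from the data.
    sample = ('q1, Some Movie, Alice, Bob, "Alice met Bob, her brother.", '
              'sibling of, 1, 1, no relation, 0, 0, other relation, 0, 0\n')
    for row in read_results(sample):
        print(row.question_id, [c.label for c in row.candidates])
```

Because the number of candidate facts varies per question, the sketch treats everything after the fifth field as triples of (candidate, ground truth, crowdsource) instead of relying on a fixed column count.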

       

       

  • C. Comparison of HIGGINS Components

    Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)

      Statistics Only
      Random Set
      Movie Plots Random Set CSV HTML
      Movie Cast Random Set CSV HTML
      Book Plots Random Set CSV HTML
      Book Cast Random Set CSV HTML

       

       
      Semantics Only
       
      Random Set
      Movie Plots Random Set CSV HTML
      Movie Cast Random Set CSV HTML
      Book Plots Random Set CSV HTML
      Book Cast Random Set CSV HTML

       

       

  • E. HC Only

    Format: (<question id>, <movie/book title>, <entity one>, <entity two>, <sentences>, <fact one>, <fact one ground truth>, <fact one crowdsource>, ..., <no relation>, <no relation ground truth>, <no relation crowdsource>, <other relation>, <other relation ground truth>, <other relation crowdsource>)

      Prominent Set
      Movie Plots Prominent Set CSV HTML
      Movie Cast Prominent Set CSV HTML
      Book Plots Prominent Set CSV HTML
      Book Cast Prominent Set CSV HTML

       

       
      Random Set
      Movie Plots Random Set CSV HTML
      Movie Cast Random Set CSV HTML
      Book Plots Random Set CSV HTML
      Book Cast Random Set CSV HTML

       

       

  • D. OLLIE extractions

    Format: (<question id>, <entity one>, <entity two>, <sentences>, <number of sentences>)
    Note: Each <entity one>-<entity two> pair may have multiple extractions (these can be identified by <question id>).

      Prominent Set
      Movie Plots Prominent Set CSV HTML
      Movie Cast Prominent Set CSV HTML
      Book Plots Prominent Set CSV HTML
      Book Cast Prominent Set CSV HTML

       

       
      Random Set
      Movie Plots Random Set CSV HTML
      Movie Cast Random Set CSV HTML
      Book Plots Random Set CSV HTML
      Book Cast Random Set CSV HTML
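
Since an entity pair may have several OLLIE extractions, it can be convenient to group the records by question id. The sketch below is one possible way to do this and is not part of the HIGGINS distribution. It assumes comma-separated files with standard double-quote quoting, assumes that rows sharing a question id belong to the same entity pair, and assumes that the last field is an integer count. The function name `group_extractions` and the dictionary keys are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_extractions(text):
    """Group OLLIE extraction records by question id.

    Each record is assumed to have exactly five fields:
    question id, entity one, entity two, sentences, number of sentences.
    """
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 fields, got {len(row)}")
        qid, entity_one, entity_two, sentences, count = (f.strip() for f in row)
        groups[qid].append({
            "entity_one": entity_one,
            "entity_two": entity_two,
            "sentences": sentences,
            # Assumption: the count field is a plain integer.
            "number_of_sentences": int(count),
        })
    return dict(groups)


if __name__ == "__main__":
    # Invented sample records, for illustration only; not taken from the data.
    sample = (
        'q1, Alice, Bob, "Alice met Bob.", 1\n'
        'q1, Alice, Bob, "Bob is the brother of Alice.", 1\n'
        'q2, Carol, Dave, "Carol hired Dave.", 1\n'
    )
    for qid, extractions in group_extractions(sample).items():
        print(qid, len(extractions))
```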

       

       

HIGGINS Games