HIGGINS

The HIGGINS project combines crowdsourcing with automated information extraction techniques to enable high-quality fact extraction from complex textual inputs.

Overview

Ambiguity, complexity, and diversity in natural language expressions are major hindrances to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data produce incorrect or noisy extractions. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today, and paying for crowdsourced acquisition at this scale can quickly become prohibitively expensive. HIGGINS employs principled methods to effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. The idea is to complement automatic extraction techniques with human computing, reaping the benefits of both while overcoming each other's limitations.

The HIGGINS architecture combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. The IE engine combines statistics derived from Web corpora (Wikipedia and ClueWeb) with semantic resources (WordNet and ConceptNet) to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to generate questions and relevant candidate answers that are presented to human workers. In our experiments we extract relation-centric facts about fictional characters in narrative text, where diversity and complexity in expressing relations are especially pronounced.
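To make the pipeline concrete, here is a small, purely illustrative sketch of how corpus statistics and semantic relatedness could be blended to rank candidate relational phrases before presenting them to crowd workers. It is not the relatedness model from the ICDE 2014 paper: all counts, scores, the blending weight, and the candidate_score function are invented for this example.

    # Hypothetical illustration: all numbers and the blending weight are made up.
    from collections import Counter

    # Stand-in for co-occurrence statistics mined from Web corpora
    # (e.g., Wikipedia and ClueWeb phrase counts).
    corpus_counts = Counter({"is married to": 120, "kills": 45, "works for": 30})
    total = sum(corpus_counts.values())

    # Stand-in for semantic relatedness in [0, 1], e.g., derived from
    # WordNet/ConceptNet neighborhoods.
    semantic_score = {"is married to": 0.9, "kills": 0.2, "works for": 0.4}

    def candidate_score(phrase, alpha=0.6):
        """Blend statistical and semantic signals; alpha is an arbitrary weight."""
        stat = corpus_counts[phrase] / total if total else 0.0
        return alpha * stat + (1 - alpha) * semantic_score.get(phrase, 0.0)

    # Rank candidate answers, highest score first, before showing them to workers.
    for phrase in sorted(corpus_counts, key=candidate_score, reverse=True):
        print(phrase, round(candidate_score(phrase), 3))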


For scientific work, please cite the following paper:

Sarath Kumar Kondreddi, Peter Triantafillou, and Gerhard Weikum. Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition. In Proceedings of the International Conference on Data Engineering (ICDE), Chicago, IL, USA, 2014.

 

People

Publications

HIGGINS Data

Dictionary of Relations

Experiments

HIGGINS Results

  Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)
  (A minimal parsing sketch for this format follows the dataset listing below.)

  Prominent Set

    Movie Plots Prominent Set: CSV | HTML
    Movie Cast Prominent Set: CSV | HTML
    Book Plots Prominent Set: CSV | HTML
    Book Cast Prominent Set: CSV | HTML

  Random Set

    Movie Plots Random Set: CSV | HTML
    Movie Cast Random Set: CSV | HTML
    Book Plots Random Set: CSV | HTML
    Book Cast Random Set: CSV | HTML
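The result files above (and those in the sections below) share the format given at the top of this section. The following is a minimal parsing sketch, assuming standard CSV quoting and the documented column order; the file name is a placeholder for whichever file you download.

    import csv

    def parse_row(row):
        # The first five columns are fixed; the remaining columns come in
        # (fact, ground truth, crowdsource answer) triples, ending with the
        # "no relation" and "other relation" options.
        question_id, title, entity_one, entity_two, sentences = row[:5]
        fact_triples = [tuple(row[i:i + 3]) for i in range(5, len(row), 3)]
        return {
            "question_id": question_id,
            "title": title,
            "entities": (entity_one, entity_two),
            "sentences": sentences,
            "facts": fact_triples,
        }

    # Placeholder file name; replace with the CSV you downloaded.
    with open("movie_plots_prominent_set.csv", newline="", encoding="utf-8") as f:
        rows = [parse_row(r) for r in csv.reader(f)]
    print(len(rows), "questions parsed")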

     

• C. Comparison of HIGGINS Components

  Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)

  Statistics Only

    Random Set

      Movie Plots Random Set: CSV | HTML
      Movie Cast Random Set: CSV | HTML
      Book Plots Random Set: CSV | HTML
      Book Cast Random Set: CSV | HTML

  Semantics Only

    Random Set

      Movie Plots Random Set: CSV | HTML
      Movie Cast Random Set: CSV | HTML
      Book Plots Random Set: CSV | HTML
      Book Cast Random Set: CSV | HTML

       

• E. HC Only

  Format: (<question id> , <movie/book title> , <entity one> , <entity two> , <sentences> , <fact one> , <fact one ground truth> , <fact one crowdsource> , ... , <no relation> , <no relation ground truth> , <no relation crowdsource> , <other relation> , <other relation ground truth> , <other relation crowdsource>)

  Prominent Set

    Movie Plots Prominent Set: CSV | HTML
    Movie Cast Prominent Set: CSV | HTML
    Book Plots Prominent Set: CSV | HTML
    Book Cast Prominent Set: CSV | HTML

  Random Set

    Movie Plots Random Set: CSV | HTML
    Movie Cast Random Set: CSV | HTML
    Book Plots Random Set: CSV | HTML
    Book Cast Random Set: CSV | HTML

         

• D. OLLIE extractions

  Format: (<question id> , <entity one> , <entity two> , <sentences> , <number of sentences>)
  Note: Each (entity one, entity two) pair may have multiple extractions, which can be identified by the question id (see the grouping sketch after this listing).

  Prominent Set

    Movie Plots Prominent Set: CSV | HTML
    Movie Cast Prominent Set: CSV | HTML
    Book Plots Prominent Set: CSV | HTML
    Book Cast Prominent Set: CSV | HTML

  Random Set

    Movie Plots Random Set: CSV | HTML
    Movie Cast Random Set: CSV | HTML
    Book Plots Random Set: CSV | HTML
    Book Cast Random Set: CSV | HTML
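Since one entity pair can have several OLLIE extractions sharing a question id, a natural first step is to group rows by that id. The sketch below assumes the documented column order; the file name is a placeholder.

    import csv
    from collections import defaultdict

    extractions = defaultdict(list)
    # Placeholder file name; replace with the OLLIE CSV you downloaded.
    with open("movie_plots_prominent_set_ollie.csv", newline="", encoding="utf-8") as f:
        for question_id, entity_one, entity_two, sentences, n_sentences in csv.reader(f):
            extractions[question_id].append({
                "entities": (entity_one, entity_two),
                "sentences": sentences,
                "num_sentences": n_sentences,
            })

    # Each key now holds all extractions produced for that question.
    for question_id, rows in extractions.items():
        print(question_id, len(rows))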

HIGGINS Games