Information Extraction

Block seminar, 7 ECTS credits, winter semester 2016–17

Basic Information

  • Type: block seminar
  • Lecturer: Jannik Strötgen
  • Credits: 7 ECTS credits
  • Registration: The seminar and the waiting list are full. No further registrations can be considered, sorry!
  • Block seminar: February 9 and February 10, 2017 (for details, see schedule below)

News

The block seminar is over. The final grades can be found here. Note that the list shows the last four digits of your "Matrikelnummer".

Topics

In this seminar, we will cover topics such as named entity recognition and normalization, temporal information extraction, relation extraction, and fact extraction. A list of topics can be found below.

Organization

Registration and initial information

  • The number of participants is limited, but you can be waitlisted. Participation in the kick-off meeting and the first lecture is mandatory; if registered students do not show up, their places are given to waitlisted students.
  • To register, please send me an email with: (i) your name, (ii) Matrikelnummer, (iii) preferred email address, (iv) field of study, and (v) semester (incl. whether BA or MA).
  • You will get a reply stating whether you are registered or waitlisted.

The seminar is a block seminar and will take place on two consecutive days at the end of January or the beginning of February; the exact days will be agreed on with the participants. In addition, there will be two meetings at the beginning of the semester:

October 26, 2016 -- Kick-off meeting (participation is mandatory)

  • date and time: October 26, 2016, 2:15 pm - 3:45 pm
  • place: seminar room 23, MPI-Inf building (E 1.4, ground level)

    • explanation of the structure and organization of the seminar
    • brief introduction to information extraction
    • presentation of the topics

November 2, 2016 -- Lecture (participation is mandatory)

  • date and time: November 2, 2016, 2:15 pm - 3:45 pm
  • place: seminar room 0.01, MMCI building (E 1.7)

    • "How to prepare and present a seminar talk"

  • As this is a block seminar, it is particularly crucial that the students' presentations are of high quality. This lecture prepares the participants to produce slides and talks that meet this standard.

Schedule

slides and material can be found below

  • October 26, 2016:

    • Kick-off meeting

  • November 2, 2016:

    • "How to prepare and present a seminar talk"

  • November 2, 2016:

    • students send a ranked list of their top 3 topics via email

  • November 30, 2016:

    • students send a suggestion of the outline of their seminar paper, including an itemization of the planned content for each section

  • January 11, 2017:

    • students submit their final seminar paper

  • two weeks before the first day of the block seminar:

    • students send preliminary slides

  • two days before the first day of the block seminar:

    • students send their final slides which they will use in the block seminar

  • block seminar, day 1: February 9, 2017
  • block seminar, day 2: February 10, 2017

Schedule of the block seminar

might be subject to change

All seminar papers are now available to the participants. Get them here.

Day 1 - February 9, 2017 -- 10:15 to 15:45

  • 10:15 -- 10:25 Introduction
  • 10:25 -- 11:00 Talk 1-1

    • N-1: Named Entity Recognition -- Azamat Mukhamedov [slides]

      • Nadeau / Sekine (2007): A Survey of Named Entity Recognition and Classification, Linguisticae Investigationes. [pdf]
      • Finkel et al. (2005): Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL. [pdf]

  • 11:00 -- 11:35 Talk 1-2

    • N-2: Named Entity Disambiguation -- Fariha Nasir [slides]

      • Hoffart et al. (2011): Robust Disambiguation of Named Entities in Text, EMNLP. [pdf]

  • 11:35 -- 12:10 Talk 1-3

    • T-3: Toponym Disambiguation -- Faraz Ahmad [slides]

      • Lieberman / Samet (2012): Adaptive Context Features for Toponym Resolution in Streaming News, SIGIR. [pdf]

  • 12:10 -- 12:45 Talk 1-4

    • T-1: Temporal Tagging -- Lukas Lange [slides]

      • Strötgen / Gertz (2013): Multilingual and Cross-domain Temporal Tagging, Language Resources and Evaluation. [pdf]

  • 12:45 -- 14:00 lunch
  • 14:00 -- 14:35 Talk 1-5

    • T-2: Temponym Tagging -- Nithya Mogane [slides]

      • Kuzey et al. (2016): As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes, WWW. [pdf]

  • 14:35 -- 15:10 Talk 1-6

    • N-4: Named Entity Recognition in Queries -- Boris Wiegand [slides]

      • Guo et al. (2009): Named Entity Recognition in Query, SIGIR. [pdf]

  • 15:10 -- 15:45 Talk 1-7

    • C-1: Coreference Resolution -- Nurzat Rakhmanberdieva [slides]

      • Rahman / Ng (2011): Coreference Resolution with World Knowledge, ACL. [pdf]

Day 2 - February 10, 2017 -- 10:15 to 15:45

  • 10:15 -- 10:50 Talk 2-1

    • K-1: Knowledge Harvesting -- Maha Aburahma [slides]

      • Hoffart et al. (2013): YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia, AI journal. [pdf]

  • 10:50 -- 11:25 Talk 2-2

    • R-2: Relation Extraction -- Shahrzad Kananizadeh [slides]

      • del Corro / Gemulla (2013): ClausIE: Clause-Based Open Information Extraction, WWW. [pdf]

  • 11:25 -- 12:00 Talk 2-3

    • E-1: Open Domain Event Extraction -- Alexander Mohr [slides]

      • Ritter et al. (2012): Open Domain Event Extraction from Twitter, KDD. [pdf]

  • 12:00 -- 12:35 Talk 2-4

    • E-2: Network-based Event Extraction -- Prabal Agarwal [slides]

      • Spitz / Gertz (2016): Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events, SIGIR. [pdf]

  • 12:35 -- 13:45 lunch break
  • 13:45 -- 14:20 Talk 2-5

    • K-2: Commonsense Harvesting -- Amina Durakovic [slides]

      • Tandon et al. (2014): WebChild: Harvesting and Organizing Commonsense Knowledge from the Web, WSDM. [pdf]

  • 14:20 -- 14:55 Talk 2-6

    • R-3: Computational History -- Julian Sahner [slides]

      • Yeung / Jatowt (2011): Studying How the Past is Remembered: Towards Computational History through Large Scale Text Mining, CIKM. [pdf]

  • 14:55 -- 15:30 Talk 2-7

    • W-1: Word Representations -- Vinh Thinh Ho [slides]

      • Mikolov et al. (2013): Distributed Representations of Words and Phrases and their Compositionality, NIPS. [pdf]

  • 15:30 -- 15:45 Final Words, Discussion, Wrap-up

Rules and Grading

  • participation in the kick-off meeting, the lecture and both days of the block seminar is mandatory
  • students will be assigned a particular topic and have to hand in a seminar paper (template will be provided) and give a presentation (20 minutes + 10 minutes for discussion)
  • grading will be based on

    • the report
    • the presentation
    • knowledge of the subject (as evidenced in the discussion after the presentation)
    • activity in the discussions
    • ability to stick to deadlines

  • Attention: According to the study regulations, you are only allowed to withdraw from the seminar within three weeks after the kick-off meeting, i.e., until November 16, 2016. A later withdrawal counts as "failed".

Slides, Templates, and Material

all documents are password protected

Slides (If you are not one of my students but are interested in my slides, just contact me)

2016-10-26: organization and introduction [pdf]

2016-11-02: "How to present" lecture slides [pdf]

Templates

2016-11-02: LaTeX beamer template for presentations [pdf] [tar.gz]

2016-11-02: LaTeX seminar paper template [pdf] [tar.gz]

Material

  • Introduction

    • Sarawagi (2008): Information Extraction, Foundations and Trends in Databases, 2008. [pdf]


  • Named Entity Recognition and Disambiguation

    • N-1: Nadeau and Sekine (2007): A Survey of Named Entity Recognition and Classification, Linguisticae Investigationes, 2007. [pdf]
    • N-2: Hoffart et al. (2011): Robust Disambiguation of Named Entities in Text, EMNLP, 2011. [pdf]
    • N-3: Luo et al. (2015): Joint Named Entity Recognition and Disambiguation, EMNLP, 2015. [pdf]
    • N-4: Guo et al. (2009): Named Entity Recognition in Query, SIGIR, 2009. [pdf]


  • Temporal and Geographic Information Extraction

    • T-1: Strötgen, Gertz (2013): Multilingual and Cross-domain Temporal Tagging, Language Resources and Evaluation, 2013. [pdf]
    • T-2: Kuzey et al. (2016): As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes, WWW, 2016. [pdf]
    • T-3: Lieberman, Samet (2012): Adaptive Context Features for Toponym Resolution in Streaming News, SIGIR, 2012. [pdf]


  • Relation Extraction and Text Mining

    • R-1: Zhu et al. (2009): StatSnowball: a Statistical Approach to Extracting Entity Relationships, WWW, 2009. [pdf]
    • R-2: del Corro, Gemulla (2013): ClausIE: Clause-Based Open Information Extraction, WWW, 2013. [pdf]
    • R-3: Yeung and Jatowt (2011): Studying How the Past is Remembered: Towards Computational History through Large Scale Text Mining, CIKM, 2011. [pdf]


  • Knowledge and Commonsense Harvesting

    • K-1: Hoffart et al. (2013): YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia, AI journal, 2013. [pdf]
    • K-2: Tandon et al. (2014): WebChild: Harvesting and Organizing Commonsense Knowledge from the Web, WSDM, 2014. [pdf]


  • Event Extraction

    • E-1: Ritter et al. (2012): Open Domain Event Extraction from Twitter, KDD, 2012. [pdf]
    • E-2: Spitz and Gertz (2016): Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events, SIGIR, 2016. [pdf]

  • Word Representations

    • W-1: Mikolov et al. (2013): Distributed Representations of Words and Phrases and their Compositionality, NIPS, 2013. [pdf]
    • W-2: Pennington et al. (2014): GloVe: Global Vectors for Word Representation, EMNLP, 2014. [pdf]