Question Answering Systems

Advanced lecture, 6 ECTS credits, Summer semester 2020

Lectures and assignments

In a typical lecture, we will go through two full research papers, which constitute the reading material for that week. No additional textbooks are necessary. There will be no tutorials, and students will not be required to give presentations. The lecturer will describe the methods in the research papers during the class. This is not a seminar course.

As the assignment for each week, students will be asked to write, after the lecture, a short summary of each paper discussed and to comment on its advantages and disadvantages. Assignments are individual; group work is not allowed. Assignment grades will contribute to the final exam grade.

The report on each paper should ideally be crisp and around ten lines long: four lines summarizing the paper, and three sentences each on its positives and negatives. You may include more points if really necessary, but overly verbose reports are discouraged. There are no correct or wrong answers, and no prescribed sentence lengths. The assignment is to be put in a text file named 2020-qa-firstname-lastname-week-nn.txt (nn = 01, 02, ..., 11) and sent as an email attachment to the lecturer, with the TA in cc. Please be consistent in how you write your name in the file names, and include your matriculation number in the text of the attachment. The email subject should be 2020 QA Assignment Week nn.

The deadline for submitting each assignment is the start of the next class (14:00 on Tuesdays). Failure to meet this deadline will result in de-registration from the course.

To complete the assignments, you need to study the required reading material and go through the slides.

We do not allow plagiarism. The first time you are caught, you will receive zero points for the specific assignment. The second time, you will be de-registered from the course.

There will be an additional toy programming assignment at the end of the course.

Basic Information

Type: Advanced lecture
Lecturer: Dr. Rishiraj Saha Roy
Credits: 6 ECTS credits
Time: Tuesdays, 14:00 - 16:00 in HS 001, E1.3 / Online via Zoom
Please register for the Google Group if you'd like to take the class! [Registration deadline: 12 May 2020]
Final exam type: Oral
Course duration: 05 May - 14 July 2020 (truncated by one month due to the corona crisis)
Load per week: 1 lecture (typically covering two research papers) and 1 writing assignment
Teaching assistant: Magdalena Kaiser

Course Contents

In this research-oriented advanced lecture, we will cover topics around automated question answering (QA) systems over knowledge graphs, textual sources, and combinations of the two. Sample topics include template-based methods, neural methods, named entity disambiguation, and harnessing paraphrases for QA. The last few years have seen an explosion of research on QA, spanning the information retrieval, natural language processing, and artificial intelligence communities. This course covers the highlights of this highly active period of growth, to give participants a grasp of the families of algorithms currently in use. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension because it is the most discriminative when it comes to algorithm design. Other key dimensions, such as the complexity of the questions addressed and the degree of explainability and interactivity in the systems, are covered within each sub-topic. We will conclude the course with the most promising emerging trends in QA, which should help new entrants to the field make the best decisions for taking the community forward.

By the end of this course, students will be able to describe and contrast state-of-the-art approaches for question answering. They will also be capable of critically examining current methods in the field with respect to their contributions and vulnerabilities. More generally, they will gain experience in analyzing relevant scientific literature.

Prerequisites

A basic knowledge of database management systems, information retrieval, natural language processing, and machine learning will be helpful. Additional knowledge of probability and statistics, linear algebra, and optimization techniques is recommended, but not absolutely necessary.

Exams and eligibility

At the end of the course, there will be one oral exam and one oral re-exam. There will be no mid-term exams. The oral exam is (tentatively) online, closed book, and individual.

To be eligible for the exams, students must submit all assignments within stipulated deadlines. The overall grade will be the best result of the main exam and the re-exam (there will be no further attempts).

Lecture Schedule

Lecture	Date	Topic	Slides	Videos	Reading
01	05 May 2020	Introduction to templates and KG-QA	PDF	Part 1, Part 2	[1]
02	12 May 2020	Templates: From text to curated KGs	PDF	Part 1, Part 2	[2, 3]
03	19 May 2020	Open KGs: Templates, paraphrases and graphs	PDF	Part 1, Part 2	[4, 5]
04	26 May 2020	Named entity recognition and disambiguation	PDF	Part 1, Part 2	[6, 7]
05	02 June 2020	Keeping efficiency in mind	PDF	Part 1, Part 2	[8, 9]
06	09 June 2020	Benchmarks that made a difference	PDF	Part 1, Part 2	[10, 11]
07	16 June 2020	Neural KG-QA systems	PDF	Part 1, Part 2	[12, 13]
08	23 June 2020	Reading comprehension and open-domain QA	PDF	Part 1, Part 2	[14, 15]
09	30 June 2020	QA over heterogeneous sources	PDF	Part 1, Part 2	[16, 17]
10	07 July 2020	Reinforcement learning in QA	PDF	Part 1, Part 2	[18, 19]
11	14 July 2020	Conversational question answering	PDF	Part 1, Part 2	[20, 21]
-	21 July 2020	Main exam (oral)	-	-	-
-	04 August 2020	Re-exam (oral)	-	-	-

Each class has 1-2 papers as related readings. It is recommended that you read the required material before the lecture.

References

[1] Unger, Christina, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. "Template-based question answering over RDF data." In Proceedings of the 21st International Conference on World Wide Web, pp. 639-648. 2012.

[2] Abujabal, Abdalghani, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum. "Never-ending learning for open-domain question answering over knowledge bases." In Proceedings of the 2018 World Wide Web Conference, pp. 1053-1062. 2018.

[3] Ravichandran, Deepak, and Eduard Hovy. "Learning surface text patterns for a question answering system." In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 41-47. Association for Computational Linguistics, 2002.

[4] Fader, Anthony, Luke Zettlemoyer, and Oren Etzioni. "Paraphrase-driven learning for open question answering." In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1608-1618. 2013.

[5] Lu, Xiaolu, Soumajit Pramanik, Rishiraj Saha Roy, Abdalghani Abujabal, Yafang Wang, and Gerhard Weikum. "Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs." In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 105-114. 2019.

[6] Hoffart, Johannes, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. "Robust disambiguation of named entities in text." In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 782-792. Association for Computational Linguistics, 2011.

[7] Ferragina, Paolo, and Ugo Scaiella. "Fast and accurate annotation of short texts with Wikipedia pages." IEEE Software 29, no. 1 (2011): 70-75.

[8] Bast, Hannah, and Elmar Haussmann. "More accurate question answering on Freebase." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1431-1440. 2015.

[9] Diefenbach, Dennis, Andreas Both, Kamal Singh, and Pierre Maret. "Towards a question answering system over the semantic Web." Semantic Web (2018): 1-19.

[10] Berant, Jonathan, Andrew Chou, Roy Frostig, and Percy Liang. "Semantic parsing on Freebase from question-answer pairs." In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1533-1544. 2013.

[11] Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. "SQuAD: 100,000+ Questions for Machine Comprehension of Text." In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383-2392. 2016.

[12] Yih, Wen-tau, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. "Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base." In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1321-1331. 2015.

[13] Huang, Xiao, Jingyuan Zhang, Dingcheng Li, and Ping Li. "Knowledge graph embedding based question answering." In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 105-113. 2019.

[14] Chen, Danqi, Adam Fisch, Jason Weston, and Antoine Bordes. "Reading Wikipedia to Answer Open-Domain Questions." In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1870-1879. 2017.

[15] Clark, Christopher, and Matt Gardner. "Simple and Effective Multi-Paragraph Reading Comprehension." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 845-855. 2018.

[16] Sun, Haitian, Tania Bedrax-Weiss, and William Cohen. "PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2380-2390. 2019.

[17] Sydorova, Alona, Nina Poerner, and Benjamin Roth. "Interpretable Question Answering on Knowledge Bases and Text." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4943-4951. 2019.

[18] Buck, Christian, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, and Wei Wang. "Ask the Right Questions: Active Question Reformulation with Reinforcement Learning." In Proceedings of the Sixth International Conference on Learning Representations (ICLR). 2018.

[19] Das, Rajarshi, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. "Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning." In Proceedings of the Sixth International Conference on Learning Representations (ICLR). 2018.

[20] Christmann, Philipp, Rishiraj Saha Roy, Abdalghani Abujabal, Jyotsna Singh, and Gerhard Weikum. "Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion." In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 729-738. 2019.

[21] Shen, Tao, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, and Daxin Jiang. "Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2442-2451. 2019.