Databases and Information Systems

Question Answering: Scope and Vision

Research Group Leader:  Rishiraj Saha Roy [Mentor:  Gerhard Weikum]

Knowledge bases have become valuable assets for search and analytics. However, they have become so large and heterogeneous that users struggle with formulating queries - even when supported by form-based or faceted user interfaces. This calls for new modes of interactive search and exploration of knowledge bases and associated datasets. We believe the most effective way of relieving users of the need to cope with the complex structure of the data is natural language, for question answering and other interactions. User inputs such as Which Nolan movies won an Oscar? can be translated into structured SPARQL queries. The key difficulty here is to understand the question structure and to bridge the gap between the user's input vocabulary and the terminology in the knowledge base, for example, mapping Nolan to the entity Christopher Nolan and won to an award-received predicate. Starting with our work on the DEANNA system, published at EMNLP 2012 and CIKM 2013, and continuing with recent work published at WSDM 2022, CIKM 2021, and SIGIR 2021, we have been pursuing this objective of translating user questions into structured graph queries.


Major challenges that we address in our ongoing work are complex questions and questions that cannot be answered by the underlying knowledge base alone. For example, the question Which Nolan movies won an Oscar but missed a Golden Globe? involves joining entities of different types across different relations like winning an award and directed (implicit in Nolan movies). The corresponding SPARQL query would then necessarily require multiple variables and triple patterns. The incompleteness of knowledge bases is often an obstacle for such complex questions, as evidence on interesting cases of not winning an award would perhaps be captured only in textual form on the Web, in news, or in online forums. The other emerging direction has been to address conversational utterances, where the user expects the system to automatically understand implicit context in follow-up utterances. In such a conversational setting, a user may only say ... and the music was by? as an alternative to Who composed the soundtrack for Inception?, when the utterance was preceded by the well-formed initial question Who was the lead actor in the movie Inception?. In our latest work in this direction, we use a reinforcement learning model to leverage question reformulations as a primary source of implicit feedback in conversational question answering over knowledge bases.
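To make the remark about multiple variables and triple patterns concrete, here is a minimal sketch of how a query for the Oscar/Golden Globe question might be assembled. The ex: prefix and every predicate and entity name under it are hypothetical placeholders, not identifiers from any real knowledge base, and the use of FILTER NOT EXISTS for "missed" is one possible reading of the question rather than the output of any of our systems.

```python
# Hypothetical vocabulary: the ex: prefix and all names under it are
# placeholders chosen for illustration only.
PREFIX = "PREFIX ex: <http://example.org/>"


def build_query(director, won, missed):
    """Assemble a SPARQL query with several variables and triple patterns."""
    patterns = [
        # "Nolan movies" implies a directed-by relation.
        f"?movie ex:directedBy ex:{director} .",
        # The movie received some award of the wanted kind.
        "?movie ex:awardReceived ?award .",
        f"?award ex:instanceOf ex:{won} .",
        # The movie received no award of the missed kind.
        "FILTER NOT EXISTS {",
        "  ?movie ex:awardReceived ?other .",
        f"  ?other ex:instanceOf ex:{missed} .",
        "}",
    ]
    body = "\n  ".join(patterns)
    return f"{PREFIX}\nSELECT DISTINCT ?movie WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(build_query("ChristopherNolan", "AcademyAward", "GoldenGlobeAward"))
```

Even this simple question needs three variables (?movie, ?award, ?other) and five triple patterns, which illustrates why a direct translation from the user's wording is hard.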


While pushing the state-of-the-art in QA along multiple dimensions, our key driving criteria have been handling diversity in question formulations, complexity in information needs, and providing robust, efficient and interpretable solutions that are not constrained to specific settings and benchmarks. Please find details of our projects and associated publications listed below.


Relevant book:

Rishiraj Saha Roy and Avishek Anand, Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections, Morgan & Claypool Publishers, 2022.


Relevant courses:

9 ECTS core course on Information Retrieval and Data Mining at Saarland University, Winter Semester 2019/20

6 ECTS advanced course on Question Answering Systems at Saarland University, Summer Semester 2020

7 ECTS seminar course on Selected Topics in Question Answering at Saarland University, Winter Semester 2020/21

CLOCQ: Search Space Reduction for Complex Question Answering over Knowledge Bases

Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
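As an illustration of top-k processing over score-ordered lists, the following is a minimal sketch in the style of the classic threshold algorithm. It is not CLOCQ's implementation: the sum aggregation, the assumption of non-negative scores, and the names top_k, lexical, and coherence are all assumptions made for this example.

```python
import heapq


def top_k(lists, k):
    """Threshold-algorithm style top-k over score-ordered lists.

    lists: one non-empty list per signal, each holding (item, score) pairs
    sorted by descending, non-negative score. An item's aggregate is the sum
    of its scores across all lists; a missing entry counts as 0.
    """
    lookup = [dict(entries) for entries in lists]  # random access per signal
    seen = {}
    max_depth = max(len(entries) for entries in lists)
    for depth in range(max_depth):
        # Upper bound on the aggregate of any item not seen so far.
        threshold = 0.0
        for entries in lists:
            if depth < len(entries):
                item, score = entries[depth]
                threshold += score
                if item not in seen:
                    seen[item] = sum(d.get(item, 0.0) for d in lookup)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop early once the k-th best seen item cannot be beaten.
        if len(best) == k and best[-1][1] >= threshold:
            return best
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    lexical = [("item_a", 0.9), ("item_b", 0.5), ("item_c", 0.1)]
    coherence = [("item_b", 0.8), ("item_a", 0.3), ("item_c", 0.2)]
    print(top_k([lexical, coherence], k=1))
```

The early-stopping test is what makes this kind of processor attractive for large KBs: the lists are scanned only as deep as needed to guarantee the top-k result.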


Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases,

Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum, WSDM 2022.

[Preprint] [Data] [Code] [Slides] [Poster] [Video]

EXAQT: Complex Temporal Question Answering on Knowledge Graphs

Questions with temporal intent are a special class of practical importance, but have not received much attention in research. This project presents EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages, one geared towards high recall, the other towards precision at top ranks. The first step computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, both using fine-tuned BERT models. The second step constructs relational graph convolutional networks (R-GCN) from the first step's output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations. We evaluate EXAQT on a large dataset of 16k temporal questions compiled from a variety of general purpose KG-QA benchmarks. Results show that it outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.


Complex Temporal Question Answering on Knowledge Graphs,

Zhen Jia, Soumajit Pramanik, Rishiraj Saha Roy, and Gerhard Weikum, CIKM 2021.

[Preprint] [Data+Demo] [Code] [Slides] [Poster] [Video]

CONQUER: Reinforcement Learning from Reformulations in Conversational Question Answering

Conversational question answering (ConvQA) is becoming popular for interaction with personal assistants. State-of-the-art methods for ConvQA over knowledge graphs can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: Web users would rarely mark answers explicitly as correct or wrong. In this project, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up information need could often be indicative of correctness in the previous turn. We present a reinforcement learning model, termed CONQUER (Conversational Question answering with Reformulations), that is naturally suitable for modeling a stream of such reformulations. CONQUER models the answering process as multiple agents walking in parallel on the knowledge graph, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs, and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over the state-of-the-art baseline CONVEX.
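The idea of learning from reformulation likelihood can be sketched with a toy policy-gradient update. This is only an illustration under stated assumptions: the linear mapping from likelihood to reward, the learning rate, and the use of a plain softmax over a fixed list of logits (instead of a policy network over KG actions) are all simplifications introduced here, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_from_reformulation(p_reformulation):
    """Map a reformulation likelihood in [0, 1] to a reward in [-1, 1].

    A likely reformulation suggests the previous answer was wrong (negative
    reward); a likely new information need suggests it was right.
    """
    return 1.0 - 2.0 * p_reformulation


def reinforce_step(logits, action, reward, lr=0.5):
    """One REINFORCE update for a softmax policy over candidate actions."""
    probs = softmax(logits)
    # Gradient of log pi(action) w.r.t. logit i is (1[i == action] - probs[i]).
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]
    # The agent took action 1 and the user then most likely reformulated.
    logits = reinforce_step(logits, action=1, reward=reward_from_reformulation(0.9))
    print(softmax(logits))
```

After the update, the probability of the action that triggered a reformulation drops, while a follow-up that looks like a new information need would push it up.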


Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs,

Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, SIGIR 2021.

[Preprint] [Data+Demo] [Code] [Slides] [Video] [Poster] [ACM Badge]

UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Text Sources

Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on-the-fly by retrieving question-relevant triples from the RDF data and/or snippets from a text corpus, using a fine-tuned BERT model. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input using advanced graph algorithms for Group Steiner Trees that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.


UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text,

Soumajit Pramanik, Jesujoba Alabi, Rishiraj Saha Roy, and Gerhard Weikum, arXiv 2021.

[Preprint] [Data+Demo] [Code] [Slides] [Poster] [Video]

Tutorial on Question Answering over Curated and Open Web Sources

The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this highly active period of growth for QA to give the audience a grasp of the families of algorithms that are currently being used. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic, such as the complexity of questions addressed, and the degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in the expanse of QA, which should help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.


Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections,

Rishiraj Saha Roy and Avishek Anand, Morgan & Claypool Publishers, 2022.


Question Answering over Curated and Open Web Sources,

Rishiraj Saha Roy and Avishek Anand, SIGIR 2020.

[Website] [Preprint] [Slides] [Video Part 1] [Video Part 2]

CROWN: Conversational Question Answering over Passages

Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces a key research challenge: understanding the context left implicit by the user in follow-up questions. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.


Conversational Question Answering over Passages by Leveraging Word Proximity Networks,

Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, SIGIR 2020.

[Preprint] [Demo] [Code] [Video]


CROWN: Conversational Passage Ranking by Reasoning over Word Networks,

Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, TREC 2019.

[Preprint] [Slides] [Poster] [BibTeX]

CONVEX: Conversational Question Answering over Knowledge Graphs

Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user’s inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, in this project, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.
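The notion of expanding a frontier around the conversation context can be sketched as follows. This is a deliberately simplified illustration, not the CONVEX algorithm: the toy KG, the word-overlap scoring, and the function name expand_frontier are assumptions made for this example, whereas the method described above weighs several signals when choosing what to add to the context.

```python
import re


def expand_frontier(kg, context, question):
    """Rank KG neighbors of the context entities for a follow-up question.

    kg: iterable of (subject, predicate, object) string triples.
    context: set of entities seen so far in the conversation.
    Returns (entity, score) pairs, where the score is the number of question
    words that also occur in the connecting predicate.
    """
    q_words = set(re.findall(r"\w+", question.lower()))
    candidates = {}
    for subj, pred, obj in kg:
        # A triple contributes a frontier node if exactly one end is in context.
        for inside, outside in ((subj, obj), (obj, subj)):
            if inside in context and outside not in context:
                overlap = len(q_words & set(re.findall(r"\w+", pred.lower())))
                candidates[outside] = max(candidates.get(outside, 0), overlap)
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    toy_kg = [
        ("Inception", "lead actor", "Leonardo DiCaprio"),
        ("Inception", "music by", "Hans Zimmer"),
        ("Inception", "director", "Christopher Nolan"),
    ]
    context = {"Inception", "Leonardo DiCaprio"}
    print(expand_frontier(toy_kg, context, "and the music was by?"))
```

Because the context already holds Inception from the first turn, the incomplete follow-up can be resolved without the user restating the movie.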


Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion,

Philipp Christmann, Rishiraj Saha Roy, Abdalghani Abujabal, Jyotsna Singh, and Gerhard Weikum, CIKM 2019.

[Preprint] [Data+Demo] [Code] [Slides] [Poster] [BibTeX]

QUEST: Answering Complex Questions by Joining Multi-Document Evidence

Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This project presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.


Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,

Xiaolu Lu, Soumajit Pramanik, Rishiraj Saha Roy, Abdalghani Abujabal, Yafang Wang, and Gerhard Weikum, SIGIR 2019.

[Preprint] [Video] [Slides] [Code+Data] [Demo] [Amazon Blog] [TechCrunch] [VentureBeat] [BibTeX]

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering

To bridge the gap between capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real questions that capture the various phenomena of interest, and the associated diversity in formulation patterns. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions are selected from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by current search engines. Through a large crowdsourcing effort, we (i) extract factoid questions from the platform and group them into paraphrase clusters (such interrogative paraphrases have been shown to be very useful in developing robustness to syntactic variations), and (ii) annotate these question clusters with their answers from Wikipedia. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We describe this construction process in detail, highlighting measures taken to ensure high quality of the output. We also present an extensive analysis of our dataset, including performances of state-of-the-art systems, that demonstrates how ComQA can effectively drive future research.


ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,

Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, NAACL-HLT 2019.

[Data] [Poster] [BibTeX]

TEQUILA: Temporal Question Answering over Knowledge Bases

Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed in this project, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We propose TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.
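The final stage, constraint reasoning on temporal intervals, can be illustrated with a small sketch. It assumes year-granularity closed intervals and three relations (BEFORE, AFTER, OVERLAP); these choices, the Interval type, and the candidate names are hypothetical and only meant to show the kind of check involved, not TEQUILA's actual rules.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # first year, inclusive
    end: int    # last year, inclusive


def satisfies(candidate, relation, reference):
    """Check whether a candidate interval meets a temporal constraint."""
    if relation == "BEFORE":
        return candidate.end < reference.start
    if relation == "AFTER":
        return candidate.start > reference.end
    if relation == "OVERLAP":
        return candidate.start <= reference.end and reference.start <= candidate.end
    raise ValueError(f"unknown temporal relation: {relation}")


def filter_answers(candidates, relation, reference):
    """Keep sub-question answers whose time scope satisfies the constraint.

    candidates: list of (answer, Interval) pairs, e.g. answers returned by an
    underlying KB-QA engine together with their validity periods.
    """
    return [name for name, iv in candidates if satisfies(iv, relation, reference)]


if __name__ == "__main__":
    # Made-up answers to a non-temporal sub-question, with validity periods.
    candidates = [
        ("team_a", Interval(2001, 2004)),
        ("team_b", Interval(2005, 2009)),
        ("team_c", Interval(2010, 2012)),
    ]
    # Interval obtained from the temporal constraint of the full question.
    reference = Interval(2005, 2009)
    print(filter_answers(candidates, "AFTER", reference))
```

Separating the non-temporal sub-question from the interval check is what lets this kind of reasoning sit on top of any KB-QA engine that can return answers with time scopes.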


TEQUILA: Temporal Question Answering over Knowledge Bases,

Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, and Gerhard Weikum, CIKM 2018.

[Preprint] [Poster] [Data] [Demo] [Code] [BibTeX]


TempQuestions: A Benchmark for Temporal Question Answering,

Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, and Gerhard Weikum, HQA 2018 (WWW Workshop).

[Slides] [Data] [BibTeX]

NEQA: Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases

Translating natural language questions to semantic representations such as SPARQL is a core challenge in open-domain question answering over knowledge bases (KB-QA). Existing methods rely on a clear separation between an offline training phase, where a model is learned, and an online phase where this model is deployed. Two major shortcomings of such methods are that (i) they require access to a large annotated training set that is not always readily available and (ii) they fail on questions from before-unseen domains. To overcome these limitations, this project presents NEQA, a continuous learning paradigm for KB-QA. Offline, NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered on cases where templates are insufficient. Using a semantic similarity function between questions and by judicious invocation of non-expert user feedback, NEQA learns new templates that capture previously-unseen syntactic structures. This way, NEQA gradually extends its template repository. NEQA periodically re-trains its underlying models, allowing it to adapt to the language used after deployment. Our experiments demonstrate NEQA’s viability, with steady improvement in answering quality over time, and the ability to answer questions from new domains.


Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases,

Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, WWW 2018.

[Slides] [Data] [Tech@Bloomberg] [BibTeX]

TIPI: Answer Type Prediction for Answering Compositional Questions

This project investigates the problem of answering compositional factoid questions over knowledge bases (KB) under efficiency constraints. The method, called TIPI, (i) decomposes compositional questions, (ii) predicts answer types for individual sub-questions, (iii) reasons over the compatibility of joint types, and finally, (iv) formulates compositional SPARQL queries respecting type constraints. TIPI's answer type predictor is trained using distant supervision, and exploits lexical, syntactic and embedding-based features to compute context- and hierarchy-aware candidate answer types for an input question. Experiments on a recent benchmark show that TIPI results in state-of-the-art performance under the real-world assumption that only a single SPARQL query can be executed over the KB, and substantial reduction in the number of queries in the more general case.


Efficiency-aware Answering of Compositional Questions using Answer Type Prediction,

David Ziegler, Abdalghani Abujabal, Rishiraj Saha Roy, and Gerhard Weikum, IJCNLP 2017.

[Poster] [BibTeX]

QUINT: Automated Template Generation for Question Answering over Knowledge Graphs

Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. This project presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality for answering complex questions without having any templates for the entire question. Experiments with different benchmarks demonstrate the high quality of QUINT.


Automated Template Generation for Question Answering over Knowledge Graphs,

Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum, WWW 2017.

[Slides] [Data]


QUINT: Interpretable Question Answering over Knowledge Bases,

Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, EMNLP 2017.

[Demo] [Poster] [BibTeX]

TriniT: Relationship Queries on Extended Knowledge Graphs

Entity search over text corpora is not geared for relationship queries where answers are tuples of related entities and where a query often requires joining cues from multiple documents. With large knowledge graphs, structured querying on their relational facts is an alternative, but often suffers from poor recall because of mismatches between user queries and the knowledge graph or because of weakly populated relations. This project presents the TriniT search engine for querying and ranking on extended knowledge graphs that combine relational facts with textual web contents. Our query language is designed on the paradigm of SPO triple patterns, but is more expressive, supporting textual phrases for each of the SPO arguments. We present a model for automatic query relaxation to compensate for mismatches between the data and a user’s query. Query answers - tuples of entities - are ranked by a statistical language model. We present experiments with different benchmarks, including complex relationship queries, over a combination of the YAGO knowledge graph and the entity-annotated ClueWeb09 corpus.


Relationship Queries on Extended Knowledge Graphs,

Mohamed Yahya, Denilson Barbosa, Klaus Berberich, Qiuyue Wang, and Gerhard Weikum, WSDM 2016.

DEANNA: Robust Question Answering over the Web of Linked Data

Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This project advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.


Robust Question Answering over the Web of Linked Data,

Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, and Gerhard Weikum, CIKM 2013.


Natural Language Questions for the Web of Data,

Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum, EMNLP 2012.

External Collaborators

  • Zhen Jia, Southwest Jiaotong University, China
  • Soumajit Pramanik, IIT Bhilai, India
  • Abdalghani Abujabal, Amazon Alexa, Germany
  • Xiaolu Lu, Microsoft, Australia
  • Jannik Strötgen, Bosch Center for AI, Germany
  • Yafang Wang, Ant Financial Services Group, China
  • Mohamed Yahya, Bloomberg, UK
  • Mirek Riedewald, Northeastern University, USA

Benchmarks


  • TimeQuestions: A benchmark of complex temporal questions collated from 8 general purpose KB-QA datasets [CIKM 2021]
  • ConvRef: A benchmark with reformulations by real users for conversational question-answering [SIGIR 2021]
  • ConvQuestions: A benchmark for conversational question-answering over knowledge graphs from five domains [CIKM 2019]
  • ComQA: A benchmark of real complex questions with interrogative paraphrases [NAACL-HLT 2019]
  • TempQuestions: A benchmark of temporal questions collated from multiple question-answering benchmarks [CIKM 2018]
  • ComplexQuestions: A benchmark of real questions with multiple entities and relations [WWW 2017]