Directly providing crisp responses to fact-based questions has become the de facto standard in Web search engines and digital assistants today. This saves users the trouble of browsing through one or more documents to locate the correct answers, or listening to long and verbose spoken responses. Our research designs mechanisms for enabling such direct answering over the Web by leveraging, but not being limited to, the power of large curated knowledge graphs. Our overarching goal is to provide robust, efficient, and interpretable solutions to the major tasks in this paradigm today: conversational, complex, and heterogeneous question answering.
Please find the details of our projects and their associated publications listed below.
Project explorer: https://qa.mpi-inf.mpg.de/projects/
Book: Rishiraj Saha Roy and Avishek Anand, Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections, Springer, 2022.
Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding them only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of robustness for trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in a 20x increase in size). Our findings show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only.
Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation, Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, WSDM 2024.
[Website] [Preprint] [Code] [Slides] [Video]
In conversational question answering (ConvQA), users express their information needs through a series of utterances with incomplete context and ad hoc style. Existing ConvQA methods typically rely on a single information source, like a curated knowledge base (KB), a text collection, or a set of Web tables, thereby reducing the overall answer recall. Further, none of them provide explanations that support the answer derivation process. We propose EXPLAIGNN: a method that overcomes these limitations by integrating information from a mixture of sources with user-comprehensible explanations for answers. Our technique constructs a heterogeneous graph from entities and evidence snippets retrieved from a KB, a text corpus, infoboxes, and Web tables. This large graph is then iteratively reduced via graph neural networks that incorporate question-level attention, until the best answers and their explanations are distilled. Comprehensive experiments show that EXPLAIGNN improves answering performance over state-of-the-art ConvQA baselines. A crowdsourced user study demonstrates that answers derived by the proposed framework are understandable by end users.
Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This project addresses the novel issue of jointly tapping into all of these together, thereby boosting answer coverage. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidence from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with over 15000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases, Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum, WSDM 2022.
[Preprint] [Website] [Code] [Slides] [Poster] [Video]
CLOCQ: A Toolkit for Fast and Easy Access to Knowledge Bases, Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum, BTW 2023.
[Code] [Poster] [Slides]
Question Entity and Relation Linking to Knowledge Bases via CLOCQ, Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum, SMART@ISWC '22.
[Code] [Slides] [Video]
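The idea of top-k processing over score-ordered lists, as mentioned for CLOCQ above, can be illustrated with a generic threshold-algorithm sketch in the style of Fagin's TA. This is not CLOCQ's implementation: the function name, the plain-sum aggregation, and the assumption that all scores are non-negative (with missing items scoring 0.0) are choices made only for this example.

```python
import heapq


def threshold_topk(score_lists, k):
    """Generic threshold algorithm over score-ordered lists (illustrative only).

    score_lists: list of dicts {item: score}, one dict per signal.
    Assumes non-negative scores; an item missing from a list scores 0.0 there.
    The aggregate score of an item is the sum of its per-signal scores.
    """
    # Sorted access: order each list by descending score.
    ordered = [sorted(s.items(), key=lambda kv: -kv[1]) for s in score_lists]
    seen = {}
    max_depth = max((len(lst) for lst in ordered), default=0)
    for depth in range(max_depth):
        threshold = 0.0  # best aggregate any still-unseen item could reach
        for lst in ordered:
            if depth < len(lst):
                item, score = lst[depth]
                threshold += score
                if item not in seen:
                    # Random access: fetch the item's score from every list.
                    seen[item] = sum(s.get(item, 0.0) for s in score_lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop early once the k-th best seen item cannot be beaten.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early-stopping test is what makes such a processor efficient: items deep in every list are never aggregated once the current top-k is provably final.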
Questions with temporal intent are a special class of practical importance, but have not received much attention in research. This project presents EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages, one geared towards high recall, the other towards precision at top ranks. The first step computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, both using fine-tuned BERT models. The second step constructs relational graph convolutional networks (R-GCN) from the first step's output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations. We evaluate EXAQT on a large dataset of 16k temporal questions compiled from a variety of general purpose KG-QA benchmarks. Results show that it outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.
Conversational question answering (ConvQA) is becoming popular for interaction with personal assistants. State-of-the-art methods for ConvQA over knowledge graphs can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: Web users would rarely mark answers explicitly as correct or wrong. In this project, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up information need could often be indicative of correctness in the previous turn. We present a reinforcement learning model, termed CONQUER (Conversational Question answering with Reformulations), that is naturally suitable for modeling a stream of such reformulations. CONQUER models the answering process as multiple agents walking in parallel on the knowledge graph, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs, and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over the state-of-the-art baseline CONVEX.
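Learning from reformulations can be illustrated with a minimal policy-gradient (REINFORCE-style) sketch. It is not CONQUER's implementation: the policy here is a bare list of logits over candidate actions rather than a network conditioned on the question and conversational context, and the reward values of +1 and -1, the learning rate, and all function names are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Turn action logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reward_from_next_turn(is_reformulation):
    """Noisy reward: a reformulation hints that the last answer was wrong.

    The values -1.0 and +1.0 are assumptions made for this sketch.
    """
    return -1.0 if is_reformulation else 1.0


def reinforce_step(logits, action, reward, lr=0.5):
    """One REINFORCE update for a softmax policy over a fixed action set.

    Uses d log pi(action) / d logit_i = 1[i == action] - pi(i).
    """
    probs = softmax(logits)
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, x in enumerate(logits)
    ]
```

In this sketch, a follow-up question that is a new information need raises the probability of the action just taken, while a reformulation lowers it.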
Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on-the-fly, by retrieving question-relevant triples from the RDF data and/or snippets from a text corpus, using a fine-tuned BERT model. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input by advanced graph algorithms for Group Steiner Trees that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
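To give a feel for the Group Steiner Tree formulation, the sketch below uses a simple shortest-path heuristic: each non-terminal node is tried as a connector, and its cost is the sum of its shortest-path distances to the nearest terminal of every group. This is neither an exact Group Steiner Tree solver nor UNIQORN's algorithm; the graph encoding (a symmetric dict of dicts), the restriction of candidates to non-terminal nodes, and the function names are assumptions for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph (dict of dicts)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def best_connector(graph, groups):
    """Return (node, cost) for the cheapest node connecting one terminal per group.

    groups: list of sets of terminal nodes, e.g. one set per question keyword.
    Candidates are non-terminal nodes only (an assumption of this sketch).
    Returns None if no candidate reaches every group.
    """
    terminals = set().union(*groups)
    best = None
    for root in sorted(graph):
        if root in terminals:
            continue
        dist = dijkstra(graph, root)
        cost = 0.0
        for group in groups:
            reachable = [dist[t] for t in group if t in dist]
            if not reachable:
                break  # this root cannot cover the group
            cost += min(reachable)
        else:
            if best is None or cost < best[1]:
                best = (root, cost)
    return best
```

Nodes with low connection cost sit close to matches for all question parts at once, which is the intuition behind reading answer candidates off low-cost trees.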
The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this really active period of growth for QA to give the audience a grasp of the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic, like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in the expanse of QA, which would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.
Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections, Rishiraj Saha Roy and Avishek Anand, Springer, 2022.
Question Answering over Curated and Open Web Sources, Rishiraj Saha Roy and Avishek Anand, SIGIR 2020.
[Website] [Preprint] [Slides] [Video Part 1] [Video Part 2]
Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces a key research challenge: understanding the context left implicit by the user in follow-up questions. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
Conversational Question Answering over Passages by Leveraging Word Proximity Networks, Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, SIGIR 2020.
[Preprint] [Demo] [Code] [Video]
CROWN: Conversational Passage Ranking by Reasoning over Word Networks, Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, TREC 2019.
[Preprint] [Slides] [Poster] [BibTeX]
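The interplay of a word proximity network and passage ranking can be sketched as follows. This is a strong simplification and not CROWN's implementation: raw co-occurrence counts with a minimum-count cutoff stand in for a statistical significance measure, exact term matching stands in for term similarity, and the window size, the mixing weight alpha, and all function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def build_wpn(corpus, window=3, min_count=2):
    """Count word pairs co-occurring within `window` tokens; keep frequent ones."""
    pair_counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    pair_counts[frozenset((w, v))] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= min_count}


def score_passage(passage, query_terms, wpn, alpha=0.5):
    """Combine query-term coverage (node part) with pairwise coherence (edge part)."""
    tokens = set(passage.lower().split())
    matched = [t for t in query_terms if t in tokens]
    node_score = len(matched) / len(query_terms) if query_terms else 0.0
    pairs = list(combinations(matched, 2))
    edge_score = (
        sum(wpn.get(frozenset(p), 0) for p in pairs) / len(pairs) if pairs else 0.0
    )
    return alpha * node_score + (1 - alpha) * edge_score


def rank_passages(passages, query_terms, wpn):
    """Order passages by descending score."""
    return sorted(
        passages, key=lambda p: score_passage(p, query_terms, wpn), reverse=True
    )
```

For conversational use, the query terms would also include terms propagated from earlier turns; how that context is weighted is left out of this sketch.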
Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user’s inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, in this project, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.
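Frontier expansion over a conversation context can be sketched in a few lines. This is a toy illustration under stated assumptions, not CONVEX itself: the KG is a dict from entity to (predicate, neighbor) pairs, candidates are scored only by word overlap between the question and the connecting predicate, and the function name and all data are hypothetical.

```python
def expand_frontier(kg, context, question, top_n=1):
    """Score neighbors of the context entities against the current question.

    kg: dict mapping entity -> list of (predicate, neighbor) pairs.
    context: set of entities seen so far in the conversation.
    Returns up to top_n neighbors with a positive overlap score.
    """
    q_tokens = set(question.lower().split())
    scores = {}
    for entity in context:
        for predicate, neighbor in kg.get(entity, []):
            if neighbor in context:
                continue  # already part of the conversation context
            overlap = len(q_tokens & set(predicate.lower().split()))
            scores[neighbor] = scores.get(neighbor, 0) + overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity for entity, score in ranked[:top_n] if score > 0]


# Hypothetical two-turn conversation over a tiny KG.
kg = {
    "film_x": [("directed by", "person_a"), ("release year", "1999")],
    "person_a": [("born in", "city_b")],
}
context = {"film_x"}
first = expand_frontier(kg, context, "who directed it")
context.update(first)  # answers join the context for later turns
second = expand_frontier(kg, context, "where born")
```

Because the answer to the first turn is added to the context, the incomplete follow-up "where born" can be resolved against it without naming any entity.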
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This project presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
To bridge the gap between capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real questions that capture the various phenomena of interest, and the associated diversity in formulation patterns. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions are selected from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by current search engines. Through a large crowdsourcing effort, we (i) extract factoid questions from the platform and group them into paraphrase clusters (such interrogative paraphrases have been shown to be very useful in developing robustness to syntactic variations), and (ii) annotate these question clusters with their answers from Wikipedia. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We describe this construction process in detail, highlighting measures taken to ensure high quality of the output. We also present an extensive analysis of our dataset, including performances of state-of-the-art systems, that demonstrates how ComQA can effectively drive future research.
ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters, Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, NAACL-HLT 2019.
Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed in this project, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We propose TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.
TempQuestions: A Benchmark for Temporal Question Answering, Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, and Gerhard Weikum, HQA 2018 (WWW Workshop).
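The final stage of TEQUILA, constraint reasoning on temporal intervals, can be illustrated with a small filter over candidate answers. This is a sketch under assumptions, not the system's code: intervals are given in whole years, only three relations are modeled, and the class, function, and relation names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # first year, inclusive
    end: int    # last year, inclusive


def satisfies(candidate, reference, relation):
    """Check a temporal relation between a candidate's and a reference interval."""
    if relation == "BEFORE":
        return candidate.end < reference.start
    if relation == "AFTER":
        return candidate.start > reference.end
    if relation == "OVERLAP":
        return candidate.start <= reference.end and reference.start <= candidate.end
    raise ValueError(f"unknown relation: {relation}")


def filter_answers(candidates, reference, relation):
    """Keep sub-question answers whose validity interval meets the constraint.

    candidates: list of (answer, Interval) pairs from the non-temporal sub-question.
    reference: Interval obtained from the temporal sub-question.
    """
    return [ans for ans, iv in candidates if satisfies(iv, reference, relation)]
```

For a question of the form "which teams did a player join before event E", the sub-question would yield all teams with their membership intervals, and the filter would keep those ending before the interval of E.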
Translating natural language questions to semantic representations such as SPARQL is a core challenge in open-domain question answering over knowledge bases (KB-QA). Existing methods rely on a clear separation between an offline training phase, where a model is learned, and an online phase where this model is deployed. Two major shortcomings of such methods are that (i) they require access to a large annotated training set that is not always readily available and (ii) they fail on questions from before-unseen domains. To overcome these limitations, this project presents NEQA, a continuous learning paradigm for KB-QA. Offline, NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered on cases where templates are insufficient. Using a semantic similarity function between questions and by judicious invocation of non-expert user feedback, NEQA learns new templates that capture previously-unseen syntactic structures. This way, NEQA gradually extends its template repository. NEQA periodically re-trains its underlying models, allowing it to adapt to the language used after deployment. Our experiments demonstrate NEQA’s viability, with steady improvement in answering quality over time, and the ability to answer questions from new domains.
Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases, Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, WWW 2018.
[Slides] [Templates] [Tech@Bloomberg]
This project investigates the problem of answering compositional factoid questions over knowledge bases (KB) under efficiency constraints. The method, called TIPI, (i) decomposes compositional questions, (ii) predicts answer types for individual sub-questions, (iii) reasons over the compatibility of joint types, and finally, (iv) formulates compositional SPARQL queries respecting type constraints. TIPI's answer type predictor is trained using distant supervision, and exploits lexical, syntactic and embedding-based features to compute context- and hierarchy-aware candidate answer types for an input question. Experiments on a recent benchmark show that TIPI results in state-of-the-art performance under the real-world assumption that only a single SPARQL query can be executed over the KB, and substantial reduction in the number of queries in the more general case.
Efficiency-aware Answering of Compositional Questions using Answer Type Prediction, David Ziegler, Abdalghani Abujabal, Rishiraj Saha Roy, and Gerhard Weikum, IJCNLP 2017.
Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. This project presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality for answering complex questions without having any templates for the entire question. Experiments with different benchmarks demonstrate the high quality of QUINT.
Automated Template Generation for Question Answering over Knowledge Graphs, Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum, WWW 2017.
QUINT: Interpretable Question Answering over Knowledge Bases, Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, EMNLP 2017.
Entity search over text corpora is not geared for relationship queries where answers are tuples of related entities and where a query often requires joining cues from multiple documents. With large knowledge graphs, structured querying on their relational facts is an alternative, but often suffers from poor recall because of mismatches between user queries and the knowledge graph or because of weakly populated relations. This project presents the TriniT search engine for querying and ranking on extended knowledge graphs that combine relational facts with textual web contents. Our query language is designed on the paradigm of SPO triple patterns, but is more expressive, supporting textual phrases for each of the SPO arguments. We present a model for automatic query relaxation to compensate for mismatches between the data and a user’s query. Query answers - tuples of entities - are ranked by a statistical language model. We present experiments with different benchmarks, including complex relationship queries, over a combination of the YAGO knowledge graph and the entity-annotated ClueWeb09 corpus.
Relationship Queries on Extended Knowledge Graphs, Mohamed Yahya, Denilson Barbosa, Klaus Berberich, Qiuyue Wang, and Gerhard Weikum, WSDM 2016.
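Matching a triple pattern with query relaxation can be sketched as below. This is only an illustration of the general idea, not TriniT's query language or ranking model: relaxation is limited to swapping the predicate phrase, the relaxation weights replace the statistical language model, variables are marked with a leading "?", and all names and data are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading question mark, e.g. '?x'."""
    return term.startswith("?")


def match_pattern(pattern, triples, relaxations=None):
    """Match one SPO pattern against triples, allowing weighted predicate relaxation.

    pattern: (subject, predicate, object), where subject/object may be variables.
    relaxations: dict mapping a predicate to a list of (alternative, weight) pairs.
    Returns (binding, score) pairs sorted by descending score; a binding is a
    tuple of (variable, value) pairs.
    """
    subj, pred, obj = pattern
    candidates = [(pred, 1.0)] + list((relaxations or {}).get(pred, []))
    best = {}
    for alt_pred, weight in candidates:
        for s, p, o in triples:
            if p != alt_pred:
                continue
            if not is_var(subj) and subj != s:
                continue
            if not is_var(obj) and obj != o:
                continue
            if is_var(subj) and subj == obj and s != o:
                continue  # same variable must bind to the same value
            pairs = {t: v for t, v in ((subj, s), (obj, o)) if is_var(t)}
            binding = tuple(sorted(pairs.items()))
            best[binding] = max(best.get(binding, 0.0), weight)
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
```

Exact matches keep the full score of 1.0, while answers found only through a relaxed predicate are returned with the lower relaxation weight, so they rank below exact matches.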
Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This project advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.
Robust Question Answering over the Web of Linked Data, Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, and Gerhard Weikum, CIKM 2013.
Natural Language Questions for the Web of Data, Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum, EMNLP 2012.
- Zhen Jia, Southwest Jiaotong University, China
- Soumajit Pramanik, IIT Bhilai, India
- Abdalghani Abujabal, Amazon Alexa, Germany
- Xiaolu Lu, Microsoft, Australia
- Jannik Strötgen, Bosch Center for AI, Germany
- Yafang Wang, Ant Financial Services Group, China
- Mohamed Yahya, Bloomberg, UK
- Mirek Riedewald, Northeastern University, USA
- CompMix: A benchmark of complex QA over heterogeneous sources (KG+Text+Table+Infobox) [arXiv 2023]
- ConvMix: A benchmark of conversational QA over heterogeneous sources (KG+Text+Table+Infobox) [SIGIR 2022]
- TimeQuestions: A benchmark of complex temporal questions collated from 8 general purpose KB-QA datasets [CIKM 2021]
- ConvRef: A benchmark with reformulations by real users for conversational question-answering [SIGIR 2021]
- ConvQuestions: A benchmark for conversational question-answering over knowledge graphs from five domains [CIKM 2019]
- ComQA: A benchmark of real complex questions with interrogative paraphrases [NAACL-HLT 2019]
- TempQuestions: A benchmark of temporal questions collated from multiple question-answering benchmarks [CIKM 2018]
- ComplexQuestions: A benchmark of real questions with multiple entities and relations [WWW 2017]