Coordinator: Rishiraj Saha Roy [Mentor: Gerhard Weikum]
Knowledge bases have become valuable assets for search and analytics. However, they have become so large and heterogeneous that users struggle to formulate queries, even when supported by form-based or faceted user interfaces. This calls for new modes of interactive search and exploration of knowledge bases and associated datasets. We believe the most effective way to relieve users of the need to cope with the complex structure of the data is natural language, for question answering and other interactions. User inputs such as Which Nolan movies won an Oscar? can be translated into structured SPARQL queries. The key difficulty here is to understand the question structure and to bridge the gap between the user's input vocabulary and the terminology in the knowledge base, for example, mapping Nolan to the entity Christopher Nolan and won to an award-received predicate. Starting with our work on the DEANNA system, published at EMNLP 2012 and CIKM 2013, and continuing with recent work published at NAACL 2019, SIGIR 2019, and CIKM 2019, we have been pursuing this objective of translating user questions into structured graph queries.
Major challenges that we address in our ongoing work are complex questions and questions that cannot be answered by the underlying knowledge base alone. For example, the question Which Nolan movies won an Oscar but missed a Golden Globe? involves joining entities of different types across different relations, such as winning an award and directed (implicit in Nolan movies). The corresponding SPARQL query therefore requires multiple variables and triple patterns. The incompleteness of knowledge bases is often an obstacle for such complex questions, as evidence on interesting cases of not winning an award would perhaps be captured only in textual form on the Web, in news, or in online forums. The other emerging direction has been to address conversational utterances, where the user expects the system to automatically understand implicit context in follow-up utterances. In such a conversational setting, a user may only say ... and the music was by? instead of Who composed the soundtrack for Inception?, when the utterance was preceded by the well-formed initial question Who was the lead actor in the movie Inception?.
While pushing the state-of-the-art in QA along multiple dimensions, our key driving criteria have been handling diversity in question formulations, complexity in information needs, and providing unsupervised, interpretable, robust, and efficient solutions that are not constrained to specific settings and benchmarks.
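As a rough illustration of the multi-variable queries mentioned above, the Python sketch below assembles a SPARQL string for the question "Which Nolan movies won an Oscar but missed a Golden Globe?". The `ex:` prefix, the predicate names `ex:directedBy`, `ex:awardReceived`, and `ex:instanceOf`, and the entity identifiers are hypothetical placeholders, not the vocabulary of any particular knowledge base. Reading "missed" as "no such award is recorded" is also an assumption made for the example.

```python
# Illustrative only: the ex: vocabulary below is hypothetical, not a real KB schema.
PREFIX = "PREFIX ex: <http://example.org/>"


def build_query(director: str, won: str, missed: str) -> str:
    """Return a SPARQL query for movies by `director` that received `won` but not `missed`."""
    patterns = [
        f"?movie ex:directedBy ex:{director} .",  # implicit in "Nolan movies"
        "?movie ex:awardReceived ?award .",
        f"?award ex:instanceOf ex:{won} .",
        # "missed" is read here as "no such award recorded" (closed-world assumption)
        f"FILTER NOT EXISTS {{ ?movie ex:awardReceived ?other . ?other ex:instanceOf ex:{missed} . }}",
    ]
    body = "\n  ".join(patterns)
    return f"{PREFIX}\nSELECT DISTINCT ?movie WHERE {{\n  {body}\n}}"


query = build_query("Christopher_Nolan", "Academy_Award", "Golden_Globe_Award")
print(query)
```

The negated pattern is where knowledge-base incompleteness matters most: a missing award fact is indistinguishable from an award that was never won, which is one reason textual evidence becomes necessary for such questions.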
Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user’s inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, in this project, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.
Philipp Christmann, Rishiraj Saha Roy, Abdalghani Abujabal, Jyotsna Singh, and Gerhard Weikum, CIKM 2019.
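The idea of keeping conversation context and expanding a frontier around it can be sketched in a few lines of Python. This is a deliberately simplified toy, not the CONVEX algorithm: the tiny triple list, the one-hop expansion, and the token-overlap score are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict

# Toy KG as (subject, predicate, object) triples, used only for illustration.
TRIPLES = [
    ("Inception", "lead actor", "Leonardo DiCaprio"),
    ("Inception", "music composer", "Hans Zimmer"),
    ("Inception", "director", "Christopher Nolan"),
    ("Leonardo DiCaprio", "place of birth", "Los Angeles"),
]


def neighbours(entity):
    """Yield (predicate, other_entity) pairs for all triples touching `entity`."""
    for s, p, o in TRIPLES:
        if s == entity:
            yield p, o
        elif o == entity:
            yield p, s


def answer_follow_up(question, context):
    """Expand one hop around the context entities and rank the candidates.

    The score is plain token overlap between the question and the predicate
    label, a stand-in for the richer matching a real system would need.
    """
    q_tokens = set(question.lower().replace("?", "").split())
    scores = defaultdict(int)
    for entity in context:
        for predicate, candidate in neighbours(entity):
            if candidate in context:
                continue  # skip entities already part of the conversation
            overlap = len(q_tokens & set(predicate.split()))
            scores[candidate] = max(scores[candidate], overlap)
    best = max(scores, key=scores.get)
    return best, scores[best]


context = {"Inception", "Leonardo DiCaprio"}  # entities from the first turn
answer, score = answer_follow_up("and the music was by?", context)
print(answer, score)
```

Here the incomplete follow-up never mentions the movie; the answer is found only because the context entities from the previous turn define where the search starts.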
Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this project, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, three out of four submitted runs were better than the median performance with respect to AP@5 and nDCG@1000.
Magdalena Kaiser, Rishiraj Saha Roy, and Gerhard Weikum, TREC 2019.
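To make the "similarity plus coherence" scoring idea concrete, here is a minimal Python sketch. It is not the CROWN implementation: exact word match stands in for semantic similarity, a fixed token window stands in for the word-proximity network, and the mixing parameter `weight` and the default `window` size are hypothetical.

```python
from itertools import combinations


def similarity(query_words, passage_words):
    """Fraction of query words found in the passage (exact match as a stand-in
    for the semantic similarity described in the text)."""
    present = set(passage_words)
    return sum(w in present for w in query_words) / len(query_words)


def coherence(query_words, passage_words, window=3):
    """Fraction of query-word pairs that occur within `window` tokens of each other."""
    positions = {}
    for i, w in enumerate(passage_words):
        positions.setdefault(w, []).append(i)
    pairs = list(combinations(sorted(set(query_words)), 2))
    if not pairs:
        return 0.0
    close = 0
    for a, b in pairs:
        if any(abs(i - j) <= window
               for i in positions.get(a, [])
               for j in positions.get(b, [])):
            close += 1
    return close / len(pairs)


def passage_score(query, passage, weight=0.5, window=3):
    """Weighted combination of similarity and coherence; `weight` is a hypothetical knob."""
    q, p = query.lower().split(), passage.lower().split()
    return weight * similarity(q, p) + (1 - weight) * coherence(q, p, window)


query = "inception soundtrack composer"
near = "the inception soundtrack composer is hans zimmer"
far = "inception is a film and much later a composer wrote the soundtrack"
near_score = passage_score(query, near)
far_score = passage_score(query, far)
print(near_score, far_score)
```

Both passages contain every query word, so similarity alone cannot separate them; the coherence term prefers the passage in which the matching words appear close together.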
Quantities appear in search queries in numerous forms: companies with annual revenue of at least 50 Mio USD, athletes who ran 200 meters faster than 19.5 s, electric cars with range above 400 miles, and so on. Processing such queries requires understanding the numbers in the query in order to capture the contextual information about the queried entities. Modern search engines and QA systems can handle queries that involve entities and types, but they often fail to properly interpret quantities in queries and candidate answers when the specifics of the search condition (less than, above, etc.), the units of interest (seconds, miles, meters, etc.), and the context of the quantity (annual or quarterly revenue, etc.) matter. In this paper, we present a search and QA system, called Qsearch, that can effectively answer advanced queries with quantity conditions. Our solution is based on a deep neural network for extracting quantity-centric tuples from text sources, and a novel matching model to retrieve and rank answers from news articles and other web pages. Experiments demonstrate the effectiveness of Qsearch on benchmark queries collected by crowdsourcing.
Entities with Quantities: Extraction, Search, and Ranking,
Vinh Thinh Ho, Koninika Pal, Niko Kleer, Klaus Berberich, and Gerhard Weikum, WSDM 2020.
Qsearch: Answering Quantity Queries from Text,
Vinh Thinh Ho, Yusra Ibrahim, Koninika Pal, Klaus Berberich, and Gerhard Weikum, ISWC 2019.
Try our demo: https://qsearch.mpi-inf.mpg.de
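The role of the comparison, the unit, and the value in a quantity condition can be illustrated with a small Python sketch. It covers only the condition-checking step and is not how Qsearch works internally: the regular expression, the conversion table, and the candidate facts are assumptions made for the example.

```python
import re

# Conversion factors to a base unit per dimension; a tiny hand-made table for illustration.
TO_BASE = {"km": 1.0, "miles": 1.609344, "s": 1.0, "min": 60.0}
COMPARATORS = {
    "above": lambda x, bound: x > bound,
    "at least": lambda x, bound: x >= bound,
    "below": lambda x, bound: x < bound,
    "faster than": lambda x, bound: x < bound,  # for durations, faster means smaller
}
CONDITION = re.compile(
    r"(?P<cmp>above|at least|below|faster than)\s+(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\w+)"
)


def parse_condition(text):
    """Extract (comparator function, bound in base units) from a query fragment."""
    m = CONDITION.search(text)
    if m is None:
        raise ValueError(f"no quantity condition found in {text!r}")
    bound = float(m["value"]) * TO_BASE[m["unit"]]
    return COMPARATORS[m["cmp"]], bound


def matches(candidate_value, candidate_unit, condition_text):
    """Check one candidate quantity against the condition, after unit normalization."""
    compare, bound = parse_condition(condition_text)
    return compare(candidate_value * TO_BASE[candidate_unit], bound)


# Candidate facts as (entity, value, unit); the entities and numbers are invented.
facts = [("car A", 700, "km"), ("car B", 350, "miles"), ("car C", 410, "miles")]
hits = [e for e, v, u in facts if matches(v, u, "electric cars with range above 400 miles")]
print(hits)
```

The sketch does not check that the candidate and the condition share a dimension (length versus time), and it ignores the context of the quantity, such as annual versus quarterly revenue; both matter for real queries.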
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This project presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
Xiaolu Lu, Soumajit Pramanik, Rishiraj Saha Roy, Abdalghani Abujabal, Yafang Wang, and Gerhard Weikum, SIGIR 2019.
To bridge the gap between the capabilities of state-of-the-art factoid question answering (QA) and what users ask, we need large datasets of real questions that capture the various phenomena of interest, and the associated diversity in formulation patterns. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions are selected from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by current search engines. Through a large crowdsourcing effort, we (i) extract factoid questions from the platform and group them into paraphrase clusters (such interrogative paraphrases have been shown to be very useful in developing robustness to syntactic variations), and (ii) annotate these question clusters with their answers from Wikipedia. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We describe this construction process in detail, highlighting measures taken to ensure high quality of the output. We also present an extensive analysis of our dataset, including the performance of state-of-the-art systems, that demonstrates how ComQA can effectively drive future research.
Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, NAACL-HLT 2019.
Translating natural language questions to semantic representations such as SPARQL is a core challenge in open-domain question answering over knowledge bases (KB-QA). Existing methods rely on a clear separation between an offline training phase, where a model is learned, and an online phase where this model is deployed. Two major shortcomings of such methods are that (i) they require access to a large annotated training set that is not always readily available and (ii) they fail on questions from previously unseen domains. To overcome these limitations, this project presents NEQA, a continuous learning paradigm for KB-QA. Offline, NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered on cases where templates are insufficient. Using a semantic similarity function between questions and by judicious invocation of non-expert user feedback, NEQA learns new templates that capture previously unseen syntactic structures. This way, NEQA gradually extends its template repository. NEQA periodically re-trains its underlying models, allowing it to adapt to the language used after deployment. Our experiments demonstrate NEQA’s viability, with steady improvement in answering quality over time, and the ability to answer questions from new domains.
Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, WWW 2018.
Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. This project presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality for answering complex questions without having any templates for the entire question. Experiments with different benchmarks demonstrate the high quality of QUINT.
Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum, WWW 2017.
Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, EMNLP 2017.
Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed in this project, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We propose TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.
Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, and Gerhard Weikum, CIKM 2018.
Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Strötgen, and Gerhard Weikum, HQA 2018 (WWW Workshop).
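The final TEQUILA stage, constraint reasoning on temporal intervals, can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not the published method: the three relations, the year-granularity intervals, and the candidate answers are all invented for the example, and the earlier stages (intent detection, decomposition, sub-question answering) are assumed to have already produced the inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # year, inclusive
    end: int    # year, inclusive


def before(a: Interval, b: Interval) -> bool:
    return a.end < b.start


def after(a: Interval, b: Interval) -> bool:
    return a.start > b.end


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start <= b.end and b.start <= a.end


RELATIONS = {"BEFORE": before, "AFTER": after, "OVERLAP": overlaps}


def apply_constraint(candidates, relation, reference):
    """Keep candidate answers whose validity interval satisfies the temporal relation."""
    check = RELATIONS[relation]
    return [name for name, interval in candidates if check(interval, reference)]


# Hypothetical output of the sub-question "Which clubs did the player play for?",
# each answer paired with an invented validity interval.
candidates = [
    ("Club A", Interval(2001, 2004)),
    ("Club B", Interval(2005, 2009)),
    ("Club C", Interval(2010, 2015)),
]
# Hypothetical answer to the second sub-question "When did the player join Club C?".
reference = Interval(2010, 2010)
earlier_clubs = apply_constraint(candidates, "BEFORE", reference)
print(earlier_clubs)
```

Because the filter only needs answers paired with time intervals, it is independent of how those answers were obtained, which matches the idea of running on top of an arbitrary KB-QA engine.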
This project investigates the problem of answering compositional factoid questions over knowledge bases (KB) under efficiency constraints. The method, called TIPI, (i) decomposes compositional questions, (ii) predicts answer types for individual sub-questions, (iii) reasons over the compatibility of joint types, and finally, (iv) formulates compositional SPARQL queries respecting type constraints. TIPI's answer type predictor is trained using distant supervision, and exploits lexical, syntactic, and embedding-based features to compute context- and hierarchy-aware candidate answer types for an input question. Experiments on a recent benchmark show that TIPI achieves state-of-the-art performance under the real-world assumption that only a single SPARQL query can be executed over the KB, and a substantial reduction in the number of queries in the more general case.
David Ziegler, Abdalghani Abujabal, Rishiraj Saha Roy, and Gerhard Weikum, IJCNLP 2017.
Entity search over text corpora is not geared for relationship queries where answers are tuples of related entities and where a query often requires joining cues from multiple documents. With large knowledge graphs, structured querying on their relational facts is an alternative, but often suffers from poor recall because of mismatches between user queries and the knowledge graph or because of weakly populated relations. This project presents the TriniT search engine for querying and ranking on extended knowledge graphs that combine relational facts with textual web contents. Our query language is designed on the paradigm of SPO triple patterns, but is more expressive, supporting textual phrases for each of the SPO arguments. We present a model for automatic query relaxation to compensate for mismatches between the data and a user’s query. Query answers, which are tuples of entities, are ranked by a statistical language model. We present experiments with different benchmarks, including complex relationship queries, over a combination of the YAGO knowledge graph and the entity-annotated ClueWeb09 corpus.
Mohamed Yahya, Denilson Barbosa, Klaus Berberich, Qiuyue Wang, and Gerhard Weikum, WSDM 2016.
Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This project advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.
Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, and Gerhard Weikum, CIKM 2013.
Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum, EMNLP 2012.
- Zhen Jia, Southwest Jiaotong University, China
- Xiaolu Lu, RMIT University, Australia
- Mirek Riedewald, Northeastern University, USA
- Jannik Strötgen, Bosch Center for AI, Germany
- Yafang Wang, Ant Financial Services Group, China
- ConvQuestions: A benchmark for conversational question-answering over knowledge graphs from five domains [CIKM 2019]
- ComQA: A benchmark of real complex questions with interrogative paraphrases [NAACL-HLT 2019]
- TempQuestions: A benchmark of temporal questions collated from multiple question-answering benchmarks [CIKM 2018]
- ComplexQuestions: A benchmark of real questions with multiple entities and relations [WWW 2017]