Scope and Vision

Knowledge bases have become valuable assets for search and analytics. However, they have grown so large and heterogeneous that users struggle to formulate queries, even when supported by form-based or faceted user interfaces. This calls for new modes of interactive search and exploration of knowledge bases and associated datasets. For example, a life scientist or political scientist may rapidly collect dozens of interesting datasets for a specific study, but would then see her productivity drop drastically when trying to join different data items and search for patterns, trends and insights.

We believe the most effective way to relieve users of the need to cope with the complex structure of the data is natural language, for question answering and other interactions. User inputs such as "Which love songs did Bob Dylan write?" can be translated into structured SPARQL queries. The key difficulty is to understand the question structure and to bridge the gap between the user's input vocabulary and the terminology of the knowledge base, for example, mapping "write" to a composer predicate. Starting with our work on the DEANNA system, published at WWW 2012 and EMNLP 2012, and continuing with more recent work published at EMNLP 2017, WWW 2017 and WWW 2018, we have been pursuing this objective of translating user questions into structured queries.
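To make the translation task concrete, here is a minimal sketch in Python, using the SPARQLWrapper client against the public DBpedia endpoint, of the kind of structured query the example question could map to. The vocabulary is DBpedia-style, and the concrete identifiers (dbo:writer, dbo:genre, dbr:Love_song) are illustrative assumptions rather than the exact terminology of any particular knowledge base.

```python
# A minimal sketch of the kind of SPARQL query that "Which love songs did
# Bob Dylan write?" could be translated into. The predicate and resource
# names below are illustrative assumptions in a DBpedia-style vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?song WHERE {
  ?song dbo:writer dbr:Bob_Dylan .   # "write" bridged to a writer/composer predicate
  ?song dbo:genre  dbr:Love_song .   # "love songs" bridged to a genre constraint
}
"""

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["song"]["value"])
```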

Major challenges that we address in our ongoing work are complex questions and questions that cannot be answered by the underlying knowledge base alone. For example, the question "Which European singers covered Bob Dylan?" involves joining entities of different types across different relations like composer and performed. The corresponding SPARQL query would necessarily require multiple variables. The incompleteness of knowledge bases is an obstacle for questions such as "Which love songs did Bob Dylan write about his wife?", as the lyrics and themes of songs would be captured only in textual form in Web documents and online communities.
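To illustrate why such questions need multi-variable joins, the following sketch shows one possible SPARQL shape for the cover-song question; all predicate names here (dbo:coverArtist, dbo:birthPlace, dbo:continent, and so on) are assumptions for exposition, not actual knowledge-base vocabulary.

```python
# Sketch of a multi-variable SPARQL query for the complex question
# "Which European singers covered Bob Dylan?". The join spans songs,
# singers and places across several relations; every predicate name
# below is an illustrative placeholder.
complex_query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?singer WHERE {
  ?song    dbo:writer      dbr:Bob_Dylan .   # songs written by Bob Dylan
  ?song    dbo:coverArtist ?singer .         # joined with the artists who covered them
  ?singer  dbo:birthPlace  ?place .          # joined again with the singer's origin
  ?place   dbo:country     ?country .
  ?country dbo:continent   dbr:Europe .      # restricted to European singers
}
"""
```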

Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases

Translating natural language questions to semantic representations such as SPARQL is a core challenge in open-domain question answering over knowledge bases (KB-QA). Existing methods rely on a clear separation between an offline training phase, where a model is learned, and an online phase where this model is deployed. Two major shortcomings of such methods are that (i) they require access to a large annotated training set that is not always readily available and (ii) they fail on questions from previously unseen domains. To overcome these limitations, this paper presents NEQA, a continuous learning paradigm for KB-QA. Offline, NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered on cases where templates are insufficient. Using a semantic similarity function between questions and judiciously invoking non-expert user feedback, NEQA learns new templates that capture previously unseen syntactic structures. This way, NEQA gradually extends its template repository. NEQA periodically re-trains its underlying models, allowing it to adapt to the language used after deployment. Our experiments demonstrate NEQA’s viability, with steady improvement in answering quality over time, and the ability to answer questions from new domains.
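The control flow described above (answer with templates, back off to similar questions, learn from user feedback) can be sketched roughly as follows. The regex templates, string-level similarity and boolean feedback callback are deliberate simplifications standing in for NEQA's learned components, not its actual implementation.

```python
# Hedged, runnable sketch of a NEQA-style continuous-learning loop.
import difflib
import re

class ContinualQA:
    def __init__(self, templates):
        # templates: list of (utterance_pattern, sparql_skeleton) pairs
        self.templates = list(templates)
        self.answered = []  # (question, sparql) pairs answered so far

    def parse(self, question):
        """Try the offline-learned templates first."""
        for pattern, skeleton in self.templates:
            m = re.match(pattern, question, flags=re.IGNORECASE)
            if m:
                return skeleton.format(*m.groups())
        return None

    def back_off(self, question):
        """Reuse the query of the most similar previously answered question,
        a crude stand-in for NEQA's learned semantic similarity function."""
        if not self.answered:
            return None
        _, best_sparql = max(
            self.answered,
            key=lambda qa: difflib.SequenceMatcher(None, question, qa[0]).ratio(),
        )
        return best_sparql

    def answer(self, question, user_approves):
        sparql = self.parse(question) or self.back_off(question)
        if sparql is None:
            return None
        # ... execute `sparql` against the KB endpoint here ...
        if user_approves(question, sparql):
            # Positive non-expert feedback: keep the pair so that a new
            # template covering this syntactic structure can be induced,
            # and the underlying models re-trained periodically.
            self.answered.append((question, sparql))
        return sparql

# Usage: one offline-learned template, then a question it covers.
qa = ContinualQA([
    (r"who wrote (.+)\?", "SELECT ?p WHERE {{ <{0}> <writtenBy> ?p }}"),
])
print(qa.answer("Who wrote Blowin' in the Wind?", lambda q, s: True))
```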

Automated Template Generation for Question Answering over Knowledge Graphs

Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. Our system, coined QUINT, automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality to answer complex questions without having any template that covers the entire question.
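As a rough illustration of what an automatically learned utterance-query template might look like, the sketch below pairs an abstracted utterance pattern with an aligned SPARQL backbone. The representation and the placeholder identifiers (the <composer> predicate, the song entity) are assumptions for exposition, not QUINT's actual internals.

```python
# Illustrative template: an abstracted utterance aligned with a query backbone.
from dataclasses import dataclass

@dataclass
class Template:
    utterance_pattern: str   # abstracted question with entity/relation slots
    query_backbone: str      # SPARQL skeleton with aligned slots
    alignment: dict          # utterance slot -> query slot placeholder

t = Template(
    utterance_pattern="who VERB[rel] ENTITY[e] ?",
    query_backbone="SELECT ?x WHERE { ENTITY[e] PRED[rel] ?x }",
    alignment={"rel": "PRED[rel]", "e": "ENTITY[e]"},
)

def instantiate(template, linked_items):
    """Fill the backbone's slots with the KB items linked from the utterance."""
    query = template.query_backbone
    for slot, kb_item in linked_items.items():
        query = query.replace(template.alignment[slot], kb_item)
    return query

# "Who composed Hurricane?": "composed" linked to an assumed <composer>
# predicate, "Hurricane" linked to an assumed song entity.
print(instantiate(t, {"rel": "<composer>", "e": "<Hurricane_(song)>"}))
```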

Demo: Interpretable Question Answering over Knowledge Bases

We present QUINT, a live system for question answering over knowledge bases. QUINT automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the natural language utterance to the final answer. The derivation provides an explanation of how the syntactic structure of the question was used to derive the structure of a SPARQL query, and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights towards reformulating the question.
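One way to picture such a derivation is as an ordered trace from the utterance to the answer set, as in the hedged sketch below; the step names, identifiers and contents are illustrative placeholders, not the demo's actual output.

```python
# Illustrative derivation trace from utterance to answer.
derivation = [
    ("utterance", "Which love songs did Bob Dylan write?"),
    ("detected phrases", {"Bob Dylan": "entity", "write": "relation", "love songs": "class/genre"}),
    ("matched template", "which TYPE did ENTITY VERB  ->  SELECT ?x WHERE { ?x PRED ENTITY . ?x TYPE_PRED TYPE }"),
    ("instantiated query", "SELECT ?x WHERE { ?x <writer> <Bob_Dylan> . ?x <genre> <Love_song> }"),
    ("answer set", "(entities returned by the knowledge base)"),
]

for step, payload in derivation:
    print(f"{step:>18}: {payload}")
```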

Publications