D5: Databases and Information Systems

Commonsense Knowledge

Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources.

Uncommonsense

Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. We present the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. 
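The candidate-generation step can be sketched in a few lines. This is a toy illustration over invented data; the framework's actual similarity notion, pruning, and informativeness ranking are more involved.

```python
# Toy sketch of UnCommonSense's candidate generation: under a local
# closed-world assumption, positive statements of comparable concepts
# that are absent for the target become negative candidates.
# The CSKB below and the ranking heuristic are invented for illustration.
from collections import Counter

# concept -> set of positive (predicate, object) statements
cskb = {
    "elephant": {("has", "trunk"), ("has", "tusks"), ("eats", "grass")},
    "rhino":    {("has", "horn"), ("eats", "grass"), ("has", "thick skin")},
    "hippo":    {("eats", "grass"), ("has", "thick skin")},
}

def negative_candidates(target, comparables, kb):
    """Statements asserted for comparable concepts but absent for the
    target, ranked by how many comparables share them (a crude proxy
    for informativeness)."""
    counts = Counter()
    for concept in comparables:
        for stmt in kb[concept]:
            if stmt not in kb[target]:
                counts[stmt] += 1
    return counts.most_common()

candidates = negative_candidates("elephant", ["rhino", "hippo"], cskb)
```

Here ("has", "thick skin") outranks ("has", "horn") because both comparable concepts assert it; in the actual framework, the candidate set is further scrutinized and pruned before ranking.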

Demo: https://uncommonsense.mpi-inf.mpg.de/

Project page: https://www.mpi-inf.mpg.de/uncommonsense

Publication

  • Hiba Arnaout, Simon Razniewski, Gerhard Weikum, and Jeff Z. Pan, UnCommonSense: Informative Negative Knowledge about Everyday Concepts. CIKM'22 [PDF]

Ascent++

Ascent++, the successor of the Ascent method described below, is a pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from any English text corpus. Ascent++ extracts facet-enriched assertions, overcoming the common limitations of the triple-based knowledge model in traditional knowledge bases (KBs). It also captures composite concepts with subgroups and related aspects, adding further expressiveness to CSK assertions.

The Ascent++ KB is a CSKB extracted from the C4 web crawl using the Ascent++ pipeline. It consists of 2 million CSK assertions about 10K popular concepts and comes in two variants: one with open predicates (e.g., "be", "have", "live in") and one with the established ConceptNet schema of 19 pre-specified predicates (e.g., AtLocation, CapableOf, HasProperty).
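As a rough illustration of what "facet-enriched" means, an assertion can carry semantic facets beyond the core triple. The field and facet names below are assumptions for illustration, not the exact Ascent++ schema.

```python
# Minimal sketch of a facet-enriched CSK assertion: the core
# (subject, predicate, object) triple is extended with semantic facets.
# Field and facet names here are illustrative, not the Ascent++ schema.
from dataclasses import dataclass, field

@dataclass
class Assertion:
    subject: str    # possibly a composite concept, e.g. "lion cub"
    predicate: str  # open phrase ("hunt") or a ConceptNet relation
    obj: str
    facets: dict = field(default_factory=dict)

a = Assertion("lion", "hunt", "large prey",
              facets={"manner": "in packs", "time": "at night"})
```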

Website: https://ascentpp.mpi-inf.mpg.de/

Publications:

  • Tuan-Phong Nguyen, Simon Razniewski, Julien Romero, Gerhard Weikum, "Refined Commonsense Knowledge from Large-Scale Web Contents," in IEEE Transactions on Knowledge and Data Engineering, 2022, doi: 10.1109/TKDE.2022.3206505. [pdf]

Ascent

Ascent (Advanced Semantics for Commonsense Knowledge Extraction) is a pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from the web. Ascent is capable of extracting facet-enriched assertions, overcoming the common limitations of the triple-based knowledge model in traditional knowledge bases (KBs). Ascent also captures composite concepts with subgroups and related aspects, supplying even more expressiveness to CSK assertions.

Publications:

  • Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum. Advanced Semantics for Commonsense Knowledge Extraction. WWW 2021. [pdf]
  • Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum. Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering. ACL 2021 - System Demonstrations. [pdf]

Children's texts for commonsense

For compiling CSK via text extraction, a major concern is reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth, especially for fundamental pieces of knowledge. This paper explores whether children's texts hold the key to commonsense knowledge extraction, based on the hypothesis that such content makes fewer assumptions about the reader's knowledge and therefore spells out commonsense more explicitly. An analysis of several corpora shows that children's texts indeed contain many more, and more typical, commonsense assertions. Moreover, experiments show that this advantage carries over to popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children's texts already yields significant improvements. This provides a refreshing perspective, different from the common trend of deriving progress from ever larger models and corpora.
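The underlying hypothesis can be illustrated with a toy frequency count: the two "corpora" and the crude pattern below are invented, and real extraction relies on far richer patterns and language models.

```python
# Toy illustration of the reporting-bias hypothesis: children's texts
# tend to spell out basic facts ("X is Y") that adult texts take for
# granted. Both corpora below are invented examples.
import re

children = "The sun is hot. Fish live in water. Birds can fly. Snow is cold."
news = "Markets rallied as the central bank signalled a pause in rate hikes."

def explicit_facts(text):
    """Count simple copula/ability patterns that state commonsense directly."""
    pattern = r"\b\w+ (?:is|are|can|live in) \w+"
    return len(re.findall(pattern, text))

children_count = explicit_facts(children)  # every sentence states a fact
news_count = explicit_facts(news)          # no explicit commonsense here
```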

Publication:

  • Julien Romero and Simon Razniewski, Do Children Texts Hold The Key To Commonsense Knowledge? EMNLP 2022 [pdf]

Quasimodo

Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
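The corroboration step can be caricatured as a weighted combination of per-source evidence. The signal names, weights, and data below are invented for illustration; Quasimodo's actual corroboration uses learned statistical models.

```python
# Sketch of Quasimodo's corroboration idea: candidate assertions mined
# from query logs and QA forums are re-scored against independent
# statistical cues (encyclopedias, books, image tags). All signals,
# weights and scores below are invented.
candidates = {
    ("elephant", "has", "trunk"): {"query_log": 0.9, "qa_forum": 0.8, "books": 0.7},
    ("elephant", "is", "pink"):   {"query_log": 0.6, "qa_forum": 0.1, "books": 0.0},
}

weights = {"query_log": 0.3, "qa_forum": 0.3, "books": 0.4}

def corroborate(signals):
    """Weighted combination of per-source evidence for one assertion."""
    return sum(weights[s] * v for s, v in signals.items())

# Assertions frequent in query logs but unsupported elsewhere sink.
ranked = sorted(candidates, key=lambda c: corroborate(candidates[c]), reverse=True)
```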

Dice

Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints that couples the inference over concepts related in a taxonomic hierarchy. The reasoning is cast as an integer linear program (ILP), and we leverage the theory of reduced costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge.
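The coupling idea can be sketched without the ILP machinery: scores propagate along a taxonomy, e.g. plausibility is softly inherited from a parent concept, while remarkability favors statements that siblings do not share. The taxonomy, scores, and formulas below are invented simplifications, not the paper's actual model.

```python
# Toy sketch of Dice-style coupled scoring over a taxonomy.
# All data and weights are invented for illustration.
taxonomy = {"hyena": "carnivore", "lion": "carnivore"}  # child -> parent
plausible = {("carnivore", "eat meat"): 0.9, ("hyena", "laugh"): 0.8}

def plausibility(concept, prop):
    """Own score, or a discounted score inherited from the parent
    (a soft version of taxonomic coupling)."""
    own = plausible.get((concept, prop), 0.0)
    parent = taxonomy.get(concept)
    inherited = plausible.get((parent, prop), 0.0) if parent else 0.0
    return max(own, 0.8 * inherited)

def remarkability(concept, prop):
    """High when the statement is plausible for the concept but not
    for its siblings in the taxonomy."""
    siblings = [c for c, p in taxonomy.items()
                if p == taxonomy.get(concept) and c != concept]
    best_sibling = max((plausibility(s, prop) for s in siblings), default=0.0)
    return plausibility(concept, prop) * (1 - best_sibling)
```

Under these toy scores, "laugh" is more remarkable for hyenas than "eat meat", since the latter is shared with sibling carnivores; the paper obtains such effects jointly via the ILP with soft constraints.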

WebChild

WebChild is a large collection of commonsense knowledge, automatically extracted and disambiguated from Web contents. WebChild contains triples that connect nouns with adjectives via fine-grained relations like hasShape, hasTaste, evokesEmotion, etc. The arguments of these assertions, nouns and adjectives, are disambiguated by mapping them onto their proper WordNet senses.

Large-scale experiments demonstrate WebChild's high accuracy (more than 80 percent) and coverage (more than four million fine-grained, disambiguated assertions).
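A WebChild-style assertion can be pictured as a triple whose arguments are sense identifiers rather than surface words. The sense-naming convention below follows WordNet's usual `lemma.pos.nn` style, but the exact representation is an assumption for illustration.

```python
# Sketch of a WebChild-style triple: the noun and adjective arguments
# are disambiguated to word senses (WordNet-style ids shown here),
# connected by a fine-grained relation such as hasTaste or hasShape.
from typing import NamedTuple

class SenseTriple(NamedTuple):
    noun: str       # a WordNet noun sense, e.g. "lemon.n.01"
    relation: str   # fine-grained relation, e.g. "hasTaste"
    adjective: str  # a WordNet adjective sense, e.g. "sour.a.01"

t = SenseTriple("lemon.n.01", "hasTaste", "sour.a.01")
```

Disambiguating to senses distinguishes, e.g., a "hot" taste from a "hot" temperature, which plain word-level triples cannot.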

Further links:

  • Dedicated WebChild page
  • WebChild: Harvesting and Organizing Commonsense Knowledge from the Web, Niket Tandon, Gerard de Melo, Fabian Suchanek, Gerhard Weikum, WSDM 2014 [PDF]

HowToKB

HowToKB is the first large-scale knowledge base representing how-to (task) knowledge. Each task is represented by a frame with attributes for the parent task, the preceding sub-task, the following sub-task, required tools or other items, and links to visual illustrations.
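The frame representation described above can be sketched as a small record type; the attribute names mirror the description, while the example task is invented.

```python
# Sketch of a HowToKB-style task frame, with the attributes listed
# above: parent task, preceding/following sub-task, required tools,
# and links to visual illustrations. The example data is invented.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class TaskFrame:
    task: str
    parent: Optional[str] = None
    preceding: Optional[str] = None
    following: Optional[str] = None
    tools: List[str] = field(default_factory=list)
    illustrations: List[str] = field(default_factory=list)

step = TaskFrame(
    task="whisk the eggs",
    parent="make an omelette",
    preceding="crack the eggs",
    following="heat the pan",
    tools=["whisk", "bowl"],
)
```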

Further links:

  • Distilling Task Knowledge from How-to Communities, Cuong Xuan Chu, Niket Tandon, Gerhard Weikum, WWW 2017 [pdf]

Other resources