Tutorial at VLDB'21, August 2021
General-purpose knowledge bases (KBs) are a cornerstone of the Semantic Web. Because they are constructed pragmatically from available web sources, these KBs are far from complete, which poses challenges for both their curation and their consumption.
In this tutorial, we show how knowledge about completeness, recall and negation in KBs can be expressed, extracted, and inferred. We proceed in five parts: (i) we introduce the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) we show how information about recall can be identified inside KBs and in text, and (iii) how it can be estimated via statistical patterns; (iv) we show how interesting negative statements can be identified, and (v) how recall can be assessed in relative, comparative terms.
1. Introduction (10 minutes): We outline the gaps in existing web-scale KBs and motivate, through several application use cases, why it is important to capture information about completeness, recall and salient negations in KBs.
2. Logical foundations (20 minutes): We outline the logical framework in which KBs operate, the partial closed-world assumption (PCWA) [4,6], the implications this framework has for query answering, the formal semantics of completeness assertions, and how such assertions can be represented in RDF in practice.
3. Cardinalities from KBs and text as ground truth (45 minutes): We highlight the challenges in obtaining human-provided ground truth and the role that relation cardinality information plays in recall assessment. In particular, we show how existing cardinality information inside KBs can be identified and linked to the predicates whose completeness is to be estimated, and how such information can be identified and extracted from natural-language documents.
4. Predictive recall assessment (45 minutes): We present three lines of approaches: (i) supervised machine learning to identify complete or incomplete regions of KBs, (ii) unsupervised statistical techniques such as species sampling methods from ecology [21,12], density-based estimators, or statistical invariants about number distributions, and (iii) linguistic theories about human conversation, which indicate in which contexts information is likely to be complete and in which it is not.
5. Identifying salient negations (30 minutes): We show why explicit negations are needed in open-world settings and how they can be mined automatically by locally inferring closed-world topics from reference peer entities. We contrast this approach with text extraction based on search engine query logs or Wikipedia text revisions, and outline open issues in ontology modelling.
6. Relative recall (30 minutes): Finally, we relax strict, absolute notions of recall and show how recall can be measured in relative terms: via extrinsic use cases like question answering and entity summarization [16,9], by comparison with open information extraction or external reference resources [7,14], or by comparison with similar entities inside the KB.
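To make the PCWA from part 2 concrete, here is a minimal Python sketch. It assumes a deliberately simplified form of completeness assertions, namely (subject, predicate) pairs for which the KB is declared to hold all true objects; the formal assertions in the literature [4] are more expressive, and the names `KB`, `COMPLETE`, `ask`, and `objects` are hypothetical. A query about a missing fact returns false only where completeness has been asserted, and unknown everywhere else.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical toy KB: a set of (subject, predicate, object) triples.
KB = {
    ("Marie_Curie", "child", "Irene_Joliot-Curie"),
    ("Marie_Curie", "child", "Eve_Curie"),
    ("Albert_Einstein", "child", "Hans_Albert_Einstein"),
}

# Simplified completeness assertions (hypothetical encoding): for each listed
# (subject, predicate) pair, the KB is asserted to contain *all* true objects.
COMPLETE = {("Marie_Curie", "child")}


def ask(kb, complete, s, p, o):
    """Answer a ground triple query under the partial closed-world assumption."""
    if (s, p, o) in kb:
        return Truth.TRUE
    if (s, p) in complete:
        # Closed world locally: a missing fact is false.
        return Truth.FALSE
    # Open world elsewhere: a missing fact may still be true.
    return Truth.UNKNOWN


def objects(kb, complete, s, p):
    """Return the objects of (s, p) and whether this answer is guaranteed complete."""
    answers = {o for (s2, p2, o) in kb if s2 == s and p2 == p}
    return answers, (s, p) in complete
```

For example, asking whether Pierre_Curie is a child of Marie_Curie yields `Truth.FALSE`, while asking about Eduard_Einstein as a child of Albert_Einstein yields `Truth.UNKNOWN`, even though he was in fact Einstein's son: absence of a fact only implies falsity where completeness has been asserted.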
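Once a count statement has been linked to the predicate it describes (part 3), recall can be computed by simple arithmetic: the number of objects listed in the KB divided by the stated cardinality. The sketch below assumes that this linking has already been done and that the count is correct; the predicate names `child` and `numberOfChildren` and the function `cardinality_recall` are hypothetical.

```python
from fractions import Fraction


def cardinality_recall(kb, subject, predicate, count_predicate):
    """Estimate recall of (subject, predicate) from a count statement in the KB.

    Returns None when no cardinality information is available.
    """
    listed = {o for (s, p, o) in kb if s == subject and p == predicate}
    counts = [o for (s, p, o) in kb if s == subject and p == count_predicate]
    if not counts:
        return None  # no ground truth: recall cannot be assessed this way
    expected = int(counts[0])  # assumption: a single, correct count statement
    if len(listed) >= expected:
        return Fraction(1)  # at least as many objects as the stated count
    return Fraction(len(listed), expected)


# Hypothetical toy KB mixing object triples and count statements.
KB = {
    ("Albert_Einstein", "child", "Hans_Albert_Einstein"),
    ("Albert_Einstein", "numberOfChildren", 3),
    ("Marie_Curie", "child", "Irene_Joliot-Curie"),
    ("Marie_Curie", "child", "Eve_Curie"),
    ("Marie_Curie", "numberOfChildren", 2),
}
```

Here the recall of Einstein's children is 1/3, that of Marie Curie's children is 1, and a predicate without any count statement (e.g. `spouse`) cannot be assessed this way. When the count comes from text rather than from the KB, the same arithmetic applies after extraction.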
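The species-sampling idea from part 4 can be illustrated with the classic Chao1 richness estimator from ecology: if many entities of a class have been observed only once, many more probably remain unseen. We use Chao1 here only because it is short; the cited works [21,12] study related non-parametric estimators (such as Chao92), so this sketch shows the principle rather than their exact method. The function names and the notion of an "observation" (one sighting of an entity, e.g. one crowd answer or one mention in a source) are assumptions.

```python
from collections import Counter


def chao1(observations):
    """Classic Chao1 estimate of the total number of distinct entities.

    `observations` holds one entity identifier per sighting.
    """
    freq = Counter(observations)                   # sightings per distinct entity
    s_obs = len(freq)                              # distinct entities observed so far
    f1 = sum(1 for c in freq.values() if c == 1)   # entities seen exactly once
    f2 = sum(1 for c in freq.values() if c == 2)   # entities seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Common fallback when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def estimated_recall(observations):
    """Share of the estimated class size that has already been observed."""
    estimate = chao1(observations)
    return len(set(observations)) / estimate if estimate else 1.0
```

For the sightings `["a", "a", "b", "b", "c", "d", "e", "e", "e"]`, five distinct entities were seen, two of them once and two of them twice, so Chao1 estimates 5 + 2²/(2·2) = 6 entities in total and an estimated recall of 5/6.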
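The peer-based mining of negations in part 5 can be sketched as follows, loosely following [1] but simplified: a candidate negative statement for an entity is a (predicate, object) pair that many of its peers have and the entity lacks, scored by the fraction of peers holding it. Peer selection is assumed to be given, the function `peer_negations` and the toy entities are hypothetical, and candidates still need validation, since in an open-world KB a missing statement is not necessarily false.

```python
from collections import Counter


def peer_negations(kb, entity, peers, top_k=3):
    """Rank candidate negative statements for `entity` by peer frequency."""
    peer_set = set(peers)
    own = {(p, o) for (s, p, o) in kb if s == entity}
    # Number of peers holding each (predicate, object) pair; kb is a set,
    # so each peer contributes at most once per pair.
    held = Counter((p, o) for (s, p, o) in kb if s in peer_set)
    candidates = [
        (pair, count / len(peer_set))
        for pair, count in held.items()
        if pair not in own  # only statements the entity lacks
    ]
    # Highest peer frequency first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_k]


# Hypothetical toy KB: entity "X" and three peers "P1".."P3".
KB = {
    ("X", "occupation", "physicist"),
    ("P1", "occupation", "physicist"),
    ("P1", "award", "Nobel_Prize_in_Physics"),
    ("P1", "member_of", "Royal_Society"),
    ("P2", "occupation", "physicist"),
    ("P2", "award", "Nobel_Prize_in_Physics"),
    ("P3", "occupation", "physicist"),
    ("P3", "award", "Nobel_Prize_in_Physics"),
    ("P3", "member_of", "Royal_Society"),
}
```

With peers `["P1", "P2", "P3"]`, the top candidate is that X did not win the Nobel Prize in Physics (held by all peers, score 1.0), followed by X not being a member of the Royal Society (score 2/3); the shared occupation is not a candidate because X already has it.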
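Relative recall by comparison with similar entities (part 6) can be sketched in the spirit of Recoin [2]: an entity is scored against the properties that are common among entities of the same class. The 0.5 frequency threshold, the score definition, and the function `relative_completeness` are assumptions made for illustration, not Recoin's actual scoring.

```python
from collections import Counter


def relative_completeness(kb, entity, peers, threshold=0.5):
    """Score an entity against the properties that are common among its peers.

    Returns (score, missing): the share of common properties the entity has,
    and the set of common properties it lacks.
    """
    props = {}
    for s, p, _ in kb:
        props.setdefault(s, set()).add(p)
    peer_set = set(peers)
    if not peer_set:
        return 1.0, set()
    # Number of peers using each property at least once.
    usage = Counter(p for s in peer_set for p in props.get(s, set()))
    expected = {p for p, c in usage.items() if c / len(peer_set) >= threshold}
    if not expected:
        return 1.0, set()  # no property is common enough to be expected
    missing = expected - props.get(entity, set())
    return 1 - len(missing) / len(expected), missing


# Hypothetical toy KB: entity "X" and peers "P1".."P3" of the same class.
KB = {
    ("X", "birth_date", "1900-01-01"),
    ("P1", "birth_date", "1901-02-03"),
    ("P1", "spouse", "Q1"),
    ("P1", "child", "C1"),
    ("P2", "birth_date", "1902-03-04"),
    ("P2", "spouse", "Q2"),
    ("P3", "birth_date", "1903-04-05"),
    ("P3", "child", "C3"),
}
```

Against peers P1 to P3, all three properties are used by at least half of the peers, and X has only `birth_date`, so it scores 1/3 and is flagged as missing `spouse` and `child`. The score is relative: it says nothing about whether the peers themselves are complete.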
- Simon Razniewski (primary contact) - Max Planck Institute for Informatics, simonrazniewski.com. Simon Razniewski is a senior researcher at the Max Planck Institute for Informatics in Saarbruecken, Germany, where he heads the Knowledge Base Construction and Quality research area. He has been a driver of recent research on completeness, recall and negation in KBs, and has ample teaching experience from university courses and from conference tutorials on commonsense knowledge (e.g., AAAI’21, WSDM’21).
- Hiba Arnaout - Max Planck Institute for Informatics, people.mpi-inf.mpg.de/~harnaout/. Hiba Arnaout is a PhD student at the Max Planck Institute for Informatics in Saarbruecken, Germany. Her primary academic interests include Knowledge Base quality and negation in Knowledge Bases. Hiba has authored an award-winning paper on interesting negative statements in Knowledge Bases, published at AKBC’20 and presented at ISWC’20 as significant Semantic Web-related work from a sister conference.
- Shrestha Ghosh - Max Planck Institute for Informatics, people.mpi-inf.mpg.de/~ghoshs/. Shrestha Ghosh is a PhD student at the Max Planck Institute for Informatics in Saarbruecken, Germany. Her research focuses on exploring set information in Knowledge Bases and text to improve recall on count queries. She has published her work in JWS’20 and at ESWC’20, and presented it at the doctoral consortium track of ISWC’20.
- Fabian Suchanek - Institut Polytechnique de Paris, suchanek.name. Fabian Suchanek is a professor at Institut Polytechnique de Paris in France and the creator of the YAGO knowledge base. He has authored more than 100 publications in the area of knowledge bases (with 12k citations in total), several of which specifically concern completeness.
1. Arnaout, H., Razniewski, S., Weikum, G.: Enriching knowledge bases with interesting negative statements. In: AKBC (2020)
2. Balaraman, V., Razniewski, S., Nutt, W.: Recoin: relative completeness in Wikidata. In: Wiki workshop at WWW (2018)
3. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E., Mitchell, T.: Toward an architecture for never-ending language learning. In: AAAI (2010)
4. Darari, F., Nutt, W., Pirro, G., Razniewski, S.: Completeness statements about RDF data sources and their use for query answering. In: ISWC (2013)
5. Galarraga, L., Razniewski, S., Amarilli, A., Suchanek, F.M.: Predicting completeness in knowledge bases. In: WSDM (2017)
6. Galarraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.: AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: WWW (2013)
7. Gashteovski, K., Gemulla, R., Kotnis, B., Hertling, S., Meilicke, C.: On aligning OpenIE extractions with knowledge bases: A case study. In: Eval4NLP (2020)
8. Ghosh, S., Razniewski, S., Weikum, G.: Uncovering hidden semantics of set information in knowledge bases. JWS (2020)
9. Hopkinson, A., Gurdasani, A., Palfrey, D., Mittal, A.: Demand-weighted completeness prediction for a knowledge base. In: NAACL (2018)
10. Karagiannis, G., Trummer, I., et al.: Mining an “anti-knowledge base” from Wikipedia updates with applications to fact checking and beyond. VLDB (2019)
11. Lajus, J., Suchanek, F.M.: Are all people married? Determining obligatory attributes in knowledge bases. In: WWW (2018)
12. Luggen, M., Difallah, D., Sarasua, C., Demartini, G., Cudré-Mauroux, P.: Non-parametric class completeness estimators for collaborative knowledge graphs - the case of Wikidata. In: ISWC (2019)
13. Mirza, P., Razniewski, S., Darari, F., Weikum, G.: Enriching knowledge bases with counting quantifiers. In: ISWC (2018)
14. Mishra, B.D., Tandon, N., Clark, P.: Domain-targeted, high precision knowledge extraction. TACL (2017)
15. Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. SWJ (2017)
16. Razniewski, S., Das, P.: Structured knowledge: Have we made progress? An extrinsic study of KB coverage over 19 years. In: CIKM (2020)
17. Razniewski, S., Jain, N., Mirza, P., Weikum, G.: Coverage of information extraction from sentences and paragraphs. In: EMNLP (2019)
18. Reiter, R.: On closed world data bases. In: Readings in artificial intelligence (1981)
19. Ringler, D., Paulheim, H.: One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata & co. In: KI (2017)
20. Soulet, A., Giacometti, A., Markhoff, B., Suchanek, F.M.: Representativeness of knowledge bases with the generalized Benford’s law. In: ISWC (2018)
21. Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Crowdsourced enumeration queries. In: ICDE (2013)
22. Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. SWJ (2016)