@online{Hu_2411.04920,
TITLE = {{GPTKB}: Building Very Large Knowledge Bases from Language Models},
AUTHOR = {Hu, Yujia and Ghosh, Shrestha and Nguyen, Tuan-Phong and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2411.04920},
EPRINT = {2411.04920},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {General-domain knowledge bases (KB), in particular the "big three" --
Wikidata, Yago and DBpedia -- are the backbone of many intelligent
applications. While these three have seen steady development, comprehensive KB
construction at large has seen few fresh attempts. In this work, we propose to
build a large general-domain KB entirely from a large language model (LLM). We
demonstrate the feasibility of large-scale KB construction from LLMs, while
highlighting specific challenges arising around entity recognition, entity and
property canonicalization, and taxonomy construction. As a prototype, we use
GPT-4o-mini to construct GPTKB, which contains 105 million triples for more
than 2.9 million entities, at a cost 100x less than previous KBC projects. Our
work is a landmark for two fields: For NLP, for the first time, it provides
\textit{constructive} insights into the knowledge (or beliefs) of LLMs. For the
Semantic Web, it shows novel ways forward for the long-standing challenge of
general-domain KB construction. GPTKB is accessible at http://gptkb.org.},
}