Publications - Last Year

2024

Conference paper

P. Christmann, S. Vakulenko, I. T. Sorodoc, B. Byrne, and A. de Gispert

“Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision,” in Findings of the EMNLP 2024, Miami, FL, USA, 2024.

@inproceedings{Christmann_EMNLP24,
TITLE = {Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision},
AUTHOR = {Christmann, Philipp and Vakulenko, Svitlana and Sorodoc, Ionut Teodor and Byrne, Bill and de Gispert, Adri{\`a}},
LANGUAGE = {eng},
ISBN = {979-8-89176-168-1},
DOI = {10.18653/v1/2024.findings-emnlp.835},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Findings of the EMNLP 2024},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {14301--14310},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Vakulenko, Svitlana
%A Sorodoc, Ionut Teodor
%A Byrne, Bill
%A de Gispert, Adrià
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5AA-2
%R 10.18653/v1/2024.findings-emnlp.835
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Findings of the EMNLP 2024
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 14301 - 14310
%I Association for Computational Linguistics
%@ 979-8-89176-168-1

Paper

P. Christmann and G. Weikum

“RAG-based Question Answering over Heterogeneous Data and Text,” 2024. [Online]. Available: https://arxiv.org/abs/2412.07420.

Abstract

This article presents the QUASAR system for question answering over
unstructured text, structured tables, and knowledge graphs, with unified
treatment of all sources. The system adopts a RAG-based architecture, with a
pipeline of evidence retrieval followed by answer generation, with the latter
powered by a moderate-sized language model. Additionally and uniquely, QUASAR
has components for question understanding, to derive crisper input for evidence
retrieval, and for re-ranking and filtering the retrieved evidence before
feeding the most informative pieces into the answer generation. Experiments
with three different benchmarks demonstrate the high answering quality of our
approach, being on par with or better than large GPT models, while keeping the
computational cost and energy consumption orders of magnitude lower.

BibTeX

@online{Christmann_2412.07420,
TITLE = {{RAG}-based Question Answering over Heterogeneous Data and Text},
AUTHOR = {Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2412.07420},
EPRINT = {2412.07420},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture, with a pipeline of evidence retrieval followed by answer generation, with the latter powered by a moderate-sized language model. Additionally and uniquely, QUASAR has components for question understanding, to derive crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence before feeding the most informative pieces into the answer generation. Experiments with three different benchmarks demonstrate the high answering quality of our approach, being on par with or better than large GPT models, while keeping the computational cost and energy consumption orders of magnitude lower.},
}

Endnote

%0 Report
%A Christmann, Philipp
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T RAG-based Question Answering over Heterogeneous Data and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-546F-4
%U https://arxiv.org/abs/2412.07420
%D 2024
%X This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture, with a pipeline of evidence retrieval followed by answer generation, with the latter powered by a moderate-sized language model. Additionally and uniquely, QUASAR has components for question understanding, to derive crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence before feeding the most informative pieces into the answer generation. Experiments with three different benchmarks demonstrate the high answering quality of our approach, being on par with or better than large GPT models, while keeping the computational cost and energy consumption orders of magnitude lower.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“CompMix: A Benchmark for Heterogeneous Question Answering,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

@inproceedings{ChristmannWWW24,
TITLE = {{CompMix}: A Benchmark for Heterogeneous Question Answering},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651444},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady},
PAGES = {1091--1094},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CompMix: A Benchmark for Heterogeneous Question Answering
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-579D-1
%R 10.1145/3589335.3651444
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady
%P 1091 - 1094
%I ACM
%@ 979-8-4007-0172-6

Thesis

S. Ghosh

“Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web,” Universität des Saarlandes, Saarbrücken, 2024.

@phdthesis{ThesisPhDGhosh24,
TITLE = {Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web},
AUTHOR = {Ghosh, Shrestha},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-430580},
DOI = {10.22028/D291-43058},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
}

Endnote

%0 Thesis
%A Ghosh, Shrestha
%Y Razniewski, Simon
%A referee: Weikum, Gerhard
%A referee: Hose, Katja
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6153-3
%R 10.22028/D291-43058
%U urn:nbn:de:bsz:291--ds-430580
%F OTHER: hdl:20.500.11880/38841
%I Universität des Saarlandes
%C Saarbrücken
%D 2024
%P XI, 128 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/38841

Conference paper

S. Ghosh, S. Razniewski, D. Graux, and G. Weikum

“CardiO: Predicting Cardinality from Online Sources,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

@inproceedings{Ghosh_WWW2024,
TITLE = {{CardiO}: Predicting Cardinality from Online Sources},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Graux, Damien and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.365147},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady},
PAGES = {573--576},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Graux, Damien
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CardiO: Predicting Cardinality from Online Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-13B6-E
%R 10.1145/3589335.365147
%D 2024
%8 13.05.2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady
%P 573 - 576
%I ACM
%@ 979-8-4007-0172-6

Paper

Y. Hu, S. Ghosh, T.-P. Nguyen, and S. Razniewski

“GPTKB: Building Very Large Knowledge Bases from Language Models,” 2024. [Online]. Available: https://arxiv.org/abs/2411.04920.

Abstract

General-domain knowledge bases (KB), in particular the "big three" --
Wikidata, Yago and DBpedia -- are the backbone of many intelligent
applications. While these three have seen steady development, comprehensive KB
construction at large has seen few fresh attempts. In this work, we propose to
build a large general-domain KB entirely from a large language model (LLM). We
demonstrate the feasibility of large-scale KB construction from LLMs, while
highlighting specific challenges arising around entity recognition, entity and
property canonicalization, and taxonomy construction. As a prototype, we use
GPT-4o-mini to construct GPTKB, which contains 105 million triples for more
than 2.9 million entities, at a cost 100x less than previous KBC projects. Our
work is a landmark for two fields: For NLP, for the first time, it provides
constructive insights into the knowledge (or beliefs) of LLMs. For the
Semantic Web, it shows novel ways forward for the long-standing challenge of
general-domain KB construction. GPTKB is accessible at gptkb.org.

BibTeX

@online{Hu_2411.04920,
TITLE = {{GPTKB}: Building Very Large Knowledge Bases from Language Models},
AUTHOR = {Hu, Yujia and Ghosh, Shrestha and Nguyen, Tuan-Phong and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2411.04920},
EPRINT = {2411.04920},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {General-domain knowledge bases (KB), in particular the "big three" -- Wikidata, Yago and DBpedia -- are the backbone of many intelligent applications. While these three have seen steady development, comprehensive KB construction at large has seen few fresh attempts. In this work, we propose to build a large general-domain KB entirely from a large language model (LLM). We demonstrate the feasibility of large-scale KB construction from LLMs, while highlighting specific challenges arising around entity recognition, entity and property canonicalization, and taxonomy construction. As a prototype, we use GPT-4o-mini to construct GPTKB, which contains 105 million triples for more than 2.9 million entities, at a cost 100x less than previous KBC projects. Our work is a landmark for two fields: For NLP, for the first time, it provides \textit{constructive} insights into the knowledge (or beliefs) of LLMs. For the Semantic Web, it shows novel ways forward for the long-standing challenge of general-domain KB construction. GPTKB is accessible at http://gptkb.org.},
}

Endnote

%0 Report
%A Hu, Yujia
%A Ghosh, Shrestha
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T GPTKB: Building Very Large Knowledge Bases from Language Models
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-133A-8
%U https://arxiv.org/abs/2411.04920
%D 2024
%X General-domain knowledge bases (KB), in particular the "big three" -- Wikidata, Yago and DBpedia -- are the backbone of many intelligent applications. While these three have seen steady development, comprehensive KB construction at large has seen few fresh attempts. In this work, we propose to build a large general-domain KB entirely from a large language model (LLM). We demonstrate the feasibility of large-scale KB construction from LLMs, while highlighting specific challenges arising around entity recognition, entity and property canonicalization, and taxonomy construction. As a prototype, we use GPT-4o-mini to construct GPTKB, which contains 105 million triples for more than 2.9 million entities, at a cost 100x less than previous KBC projects. Our work is a landmark for two fields: For NLP, for the first time, it provides constructive insights into the knowledge (or beliefs) of LLMs. For the Semantic Web, it shows novel ways forward for the long-standing challenge of general-domain KB construction. GPTKB is accessible at http://gptkb.org.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Conference paper

Z. Jia, P. Christmann, and G. Weikum

“Faithful Temporal Question Answering over Heterogeneous Sources,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

@inproceedings{Jia_WWW2024,
TITLE = {Faithful Temporal Question Answering over Heterogeneous Sources},
AUTHOR = {Jia, Zhen and Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589334.3645547},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {2052--2063},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Christmann, Philipp
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Faithful Temporal Question Answering over Heterogeneous Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-7D1D-7
%R 10.1145/3589334.3645547
%D 2024
%8 13.05.2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 2052 - 2063
%I ACM
%@ 979-8-4007-0172-6

Conference paper

Z. Jia, P. Christmann, and G. Weikum

“TIQ: A Benchmark for Temporal Question Answering with Implicit Time Constraints,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

@inproceedings{Jia_WWW24,
TITLE = {{TIQ}: {A} Benchmark for Temporal Question Answering with Implicit Time Constraints},
AUTHOR = {Jia, Zhen and Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651895},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {1394--1399},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Christmann, Philipp
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TIQ: A Benchmark for Temporal Question Answering with Implicit Time Constraints
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-4EDB-4
%R 10.1145/3589335.3651895
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 1394 - 1399
%I ACM
%@ 979-8-4007-0172-6

Conference paper

M. Kaiser, P. Ernst, and G. Szarvas

“Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems,” in Findings of EMNLP 2024, Miami, FL, USA, 2024.

@inproceedings{Kaiser_EMNLP24,
TITLE = {Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems},
AUTHOR = {Kaiser, Magdalena and Ernst, Patrick and Szarvas, Gy{\"o}rgy},
LANGUAGE = {eng},
ISBN = {979-8-89176-168-1},
DOI = {10.18653/v1/2024.findings-emnlp.362},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Findings of EMNLP 2024},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {6236--6246},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Ernst, Patrick
%A Szarvas, György
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5B9-1
%R 10.18653/v1/2024.findings-emnlp.362
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Findings of EMNLP 2024
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 6236 - 6246
%I Association for Computational Linguistics
%@ 979-8-89176-168-1

Conference paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation,” in WSDM ’24, Merida, Mexico, 2024.

@inproceedings{KaiserWSDM24,
TITLE = {Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1145/3616855.3635822},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {WSDM '24},
EDITOR = {Ang{\'e}lica, Luz and Lattanzi, Silvio and Mu{\~n}oz Medina, Andr{\'e}s and Akoglu, Leman and Gionis, Aristides and Vassilvitskii, Sergei},
PAGES = {322--331},
ADDRESS = {Merida, Mexico},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-E9D1-0
%R 10.1145/3616855.3635822
%D 2024
%8 04.03.2024
%B 17th ACM International Conference on Web Search and Data Mining
%Z date of event: 2024-03-04 - 2024-03-08
%C Merida, Mexico
%B WSDM '24
%E Angélica, Luz; Lattanzi, Silvio; Muñoz Medina, Andrés; Akoglu, Leman; Gionis, Aristides; Vassilvitskii, Sergei
%P 322 - 331
%I ACM

Conference paper

J.-C. Kalo, T.-P. Nguyen, S. Razniewski, and B. Zhang

“Preface: LM-KBC Challenge 2024,” in Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024, Baltimore, MD, USA, 2024, vol. 3853.

@inproceedings{Kalo_Preface24,
TITLE = {Preface: {LM}-{KBC} Challenge 2024},
AUTHOR = {Kalo, Jan-Christoph and Nguyen, Tuan-Phong and Razniewski, Simon and Zhang, Bohui},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {https://ceur-ws.org/Vol-3853/paper0.pdf},
PUBLISHER = {CEUR-WS},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024},
EDITOR = {Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Pan, Jeff Z. and Nguyen, Tuan-Phong and Zhang, Bohui},
VOLUME = {3853},
SERIES = {CEUR Workshop Proceedings},
ADDRESS = {Baltimore, MD, USA},
}

Endnote

%0 Conference Proceedings
%A Kalo, Jan-Christoph
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Zhang, Bohui
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Preface: LM-KBC Challenge 2024
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6E9D-3
%U https://ceur-ws.org/Vol-3853/paper0.pdf
%D 2024
%B 2nd Workshop on Knowledge Base Construction from Pre-Trained Language Models
%Z date of event: 2024-11-12 - 2024-11-12
%C Baltimore, MD, USA
%B Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024
%E Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Pan, Jeff Z.; Nguyen, Tuan-Phong; Zhang, Bohui
%V 3853
%I CEUR-WS
%@ 1613-0073
%B CEUR Workshop Proceedings
%N 3853

Conference paper

L. Lange, M. Müller, G. H. Torbati, D. Milchevski, P. Grau, S. C. Pujari, and A. Friedrich

“AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports,” in The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024.

@inproceedings{Lange_LREC24,
TITLE = {{AnnoCTR}: {A} Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports},
AUTHOR = {Lange, Lukas and M{\"u}ller, Marc and Torbati, Ghazaleh Haratinezhad and Milchevski, Dragan and Grau, Patrick and Pujari, Subhash Chandra and Friedrich, Annemarie},
LANGUAGE = {eng},
ISBN = {978-2-493814-10-4},
URL = {https://aclanthology.org/2024.lrec-main.103/},
PUBLISHER = {ELRA Language Resources Association},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
EDITOR = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
PAGES = {1147--1160},
ADDRESS = {Torino, Italy},
}

Endnote

%0 Conference Proceedings
%A Lange, Lukas
%A Müller, Marc
%A Torbati, Ghazaleh Haratinezhad
%A Milchevski, Dragan
%A Grau, Patrick
%A Pujari, Subhash Chandra
%A Friedrich, Annemarie
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5B3-7
%U https://aclanthology.org/2024.lrec-main.103/
%D 2024
%B Joint International Conference on Computational Linguistics, Language Resources and Evaluation
%Z date of event: 2024-05-20 - 2024-05-25
%C Torino, Italy
%B The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
%E Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen
%P 1147 - 1160
%I ELRA Language Resources Association
%@ 978-2-493814-10-4

Conference paper

T.-P. Nguyen, S. Razniewski, and G. Weikum

“Cultural Commonsense Knowledge for Intercultural Dialogues,” in CIKM ’24, 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 2024.

@inproceedings{Nguyen_CIKM24,
TITLE = {Cultural Commonsense Knowledge for Intercultural Dialogues},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0436-9},
DOI = {10.1145/3627673.3679768},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {CIKM '24, 33rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Serra, Edoardo and Spezzano, Francesca},
PAGES = {1774--1784},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cultural Commonsense Knowledge for Intercultural Dialogues
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-7348-0
%R 10.1145/3627673.3679768
%D 2024
%B 33rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2024-10-21 - 2024-10-25
%C Boise, ID, USA
%B CIKM '24
%E Serra, Edoardo; Spezzano, Francesca
%P 1774 - 1784
%I ACM
%@ 979-8-4007-0436-9

Conference paper

K. Pal, H. Arnaout, S. Razniewski, and G. Weikum

“FASETS: Discovering Faceted Sets of Entities,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

@inproceedings{Pal_WWW24,
TITLE = {{FASETS}: {D}iscovering Faceted Sets of Entities},
AUTHOR = {Pal, Koninika and Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651924},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {1521--1529},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Pal, Koninika
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FASETS: Discovering Faceted Sets of Entities
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-4ED6-9
%R 10.1145/3589335.3651924
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 1521 - 1529
%I ACM
%@ 979-8-4007-0172-6

Article

S. Pramanik, J. Alabi, R. Saha Roy, and G. Weikum

“UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text,” Journal of Web Semantics, vol. 83, 2024.

Abstract

Question answering over knowledge graphs and other RDF data has been greatly
advanced, with a number of good systems providing crisp answers for natural
language questions or telegraphic queries. Some of these systems incorporate
textual sources as additional evidence for the answering process, but cannot
compute answers that are present in text alone. Conversely, systems from the IR
and NLP communities have addressed QA over text, but barely utilize semantic
data and knowledge. This paper presents the first QA system that can seamlessly
operate over RDF datasets and text corpora, or both together, in a unified
framework. Our method, called UNIQORN, builds a context graph on the fly, by
retrieving question-relevant triples from the RDF data and/or the text corpus,
where the latter case is handled by automatic information extraction. The
resulting graph is typically rich but highly noisy. UNIQORN copes with this
input by advanced graph algorithms for Group Steiner Trees, that identify the
best answer candidates in the context graph. Experimental results on several
benchmarks of complex questions with multiple entities and relations, show that
UNIQORN, an unsupervised method with only five parameters, produces results
comparable to the state-of-the-art on KGs, text corpora, and heterogeneous
sources. The graph-based methodology provides user-interpretable evidence for
the complete answering process.

BibTeX

@article{Pramanik24c,
TITLE = {{UNIQORN}: {U}nified Question Answering over {RDF} Knowledge Graphs and Natural Language Text},
AUTHOR = {Pramanik, Soumajit and Alabi, Jesujoba and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1873-7749},
DOI = {10.1016/j.websem.2024.100833},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
ABSTRACT = {Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly, by retrieving question-relevant triples from the RDF data and/or the text corpus, where the latter case is handled by automatic information extraction. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input by advanced graph algorithms for Group Steiner Trees, that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations, show that UNIQORN, an unsupervised method with only five parameters, produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.},
JOURNAL = {Journal of Web Semantics},
VOLUME = {83},
EID = {100833},
}

Endnote

%0 Journal Article
%A Pramanik, Soumajit
%A Alabi, Jesujoba
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6365-6
%R 10.1016/j.websem.2024.100833
%7 2024-09-10
%D 2024
%X Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly, by retrieving question-relevant triples from the RDF data and/or the text corpus, where the latter case is handled by automatic information extraction. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input by advanced graph algorithms for Group Steiner Trees, that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations, show that UNIQORN, an unsupervised method with only five parameters, produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
%J Journal of Web Semantics
%V 83
%Z sequence number: 100833
%I Elsevier
%C Amsterdam

Article

S. Razniewski, H. Arnaout, S. Ghosh, and F. Suchanek

“Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey,” ACM Computing Surveys, vol. 56, no. 6, 2024.

@article{Razniewski24,
TITLE = {Completeness, Recall, and Negation in Open-world Knowledge Bases: {A} Survey},
AUTHOR = {Razniewski, Simon and Arnaout, Hiba and Ghosh, Shrestha and Suchanek, Fabian},
LANGUAGE = {eng},
ISSN = {0360-0300; 1557-7341},
DOI = {10.1145/3639563},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
JOURNAL = {ACM Computing Surveys},
VOLUME = {56},
NUMBER = {6},
PAGES = {1--42},
EID = {150},
}

Endnote

%0 Journal Article
%A Razniewski, Simon
%A Arnaout, Hiba
%A Ghosh, Shrestha
%A Suchanek, Fabian
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-76FD-1
%R 10.1145/3639563
%D 2024
%J ACM Computing Surveys
%O ACM Comput. Surv. Computing surveys CSUR
%V 56
%N 6
%& 1
%P 1 - 42
%Z sequence number: 150
%I ACM
%C New York, NY
%@ 0360-0300; 1557-7341

Proceedings

S. Razniewski, J.-C. Kalo, S. Singhania, J. Z. Pan, T.-P. Nguyen, and B. Zhang

Eds., Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024. CEUR-WS, 2024.

@proceedings{RazniewskiKBC24,
TITLE = {Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024 (KBC-LM-LM-KBC 2024)},
EDITOR = {Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Pan, Jeff Z. and Nguyen, Tuan-Phong and Zhang, Bohui},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-3853-0},
PUBLISHER = {CEUR-WS},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3853},
ADDRESS = {Baltimore, MD, USA},
}

Endnote

%0 Conference Proceedings
%E Razniewski, Simon
%E Kalo, Jan-Christoph
%E Singhania, Sneha
%E Pan, Jeff Z.
%E Nguyen, Tuan-Phong
%E Zhang, Bohui
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024 : Joint proceedings of the 2nd workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM 2024) and the 3rd challenge on Language Models for Knowledge Base Construction (LM-KBC 2024) co-located with the 23rd International Semantic Web Conference (ISWC 2024)
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6E90-0
%U urn:nbn:de:0074-3853-0
%I CEUR-WS
%D 2024
%B 2nd Workshop on Knowledge Base Construction from Pre-Trained Language Models
%Z date of event: 2024-11-12 - 2024-11-12
%C Baltimore, MD, USA
%S CEUR Workshop Proceedings
%V 3853
%@ 1613-0073

Conference paper

T. P. Schrader, L. Lange, S. Razniewski, and A. Friedrich

“QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, FL, USA, 2024.

@inproceedings{Schrader_EMNLP24,
TITLE = {{QUITE}: {Q}uantifying Uncertainty in Natural Language Text in {B}ayesian Reasoning Scenarios},
AUTHOR = {Schrader, Timo Pierre and Lange, Lukas and Razniewski, Simon and Friedrich, Annemarie},
LANGUAGE = {eng},
ISBN = {979-8-89176-164-3},
URL = {https://aclanthology.org/2024.emnlp-main.153},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {2634--2652},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Schrader, Timo Pierre
%A Lange, Lukas
%A Razniewski, Simon
%A Friedrich, Annemarie
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-2E26-1
%U https://aclanthology.org/2024.emnlp-main.153
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 2634 - 2652
%I Association for Computational Linguistics
%@ 979-8-89176-164-3
%U https://aclanthology.org/2024.emnlp-main.153.pdf

Paper

S. Singhania, S. Cucerzan, A. Herring, and S. K. Jauhar

“Neon: News Entity-Interaction Extraction for Enhanced Question Answering,” 2024. [Online]. Available: https://arxiv.org/abs/2411.12449.

Abstract

Capturing fresh information in near real-time and using it to augment
existing large language models (LLMs) is essential to generate up-to-date,
grounded, and reliable output. This problem becomes particularly challenging
when LLMs are used for informational tasks in rapidly evolving fields, such as
Web search related to recent or unfolding events involving entities, where
generating temporally relevant responses requires access to up-to-the-hour news
sources. However, the information modeled by the parametric memory of LLMs is
often outdated, and Web results from prototypical retrieval systems may fail to
capture the latest relevant information and struggle to handle conflicting
reports in evolving news. To address this challenge, we present the NEON
framework, designed to extract emerging entity interactions -- such as events
or activities -- as described in news articles. NEON constructs an
entity-centric timestamped knowledge graph that captures such interactions,
thereby facilitating enhanced QA capabilities related to news events. Our
framework innovates by integrating open Information Extraction (openIE) style
tuples into LLMs to enable in-context retrieval-augmented generation. This
integration demonstrates substantial improvements in QA performance when
tackling temporal, entity-centric search queries. Through NEON, LLMs can
deliver more accurate, reliable, and up-to-date responses.

BibTeX

@online{Singhania2411.12449,
TITLE = {Neon: News Entity-Interaction Extraction for Enhanced Question Answering},
AUTHOR = {Singhania, Sneha and Cucerzan, Silviu and Herring, Allen and Jauhar, Sujay Kumar},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2411.12449},
EPRINT = {2411.12449},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Capturing fresh information in near real-time and using it to augment existing large language models (LLMs) is essential to generate up-to-date, grounded, and reliable output. This problem becomes particularly challenging when LLMs are used for informational tasks in rapidly evolving fields, such as Web search related to recent or unfolding events involving entities, where generating temporally relevant responses requires access to up-to-the-hour news sources. However, the information modeled by the parametric memory of LLMs is often outdated, and Web results from prototypical retrieval systems may fail to capture the latest relevant information and struggle to handle conflicting reports in evolving news. To address this challenge, we present the NEON framework, designed to extract emerging entity interactions -- such as events or activities -- as described in news articles. NEON constructs an entity-centric timestamped knowledge graph that captures such interactions, thereby facilitating enhanced QA capabilities related to news events. Our framework innovates by integrating open Information Extraction (openIE) style tuples into LLMs to enable in-context retrieval-augmented generation. This integration demonstrates substantial improvements in QA performance when tackling temporal, entity-centric search queries. Through NEON, LLMs can deliver more accurate, reliable, and up-to-date responses.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Cucerzan, Silviu
%A Herring, Allen
%A Jauhar, Sujay Kumar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Neon: News Entity-Interaction Extraction for Enhanced Question Answering
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5F6-C
%U https://arxiv.org/abs/2411.12449
%D 2024
%X Capturing fresh information in near real-time and using it to augment existing large language models (LLMs) is essential to generate up-to-date, grounded, and reliable output. This problem becomes particularly challenging when LLMs are used for informational tasks in rapidly evolving fields, such as Web search related to recent or unfolding events involving entities, where generating temporally relevant responses requires access to up-to-the-hour news sources. However, the information modeled by the parametric memory of LLMs is often outdated, and Web results from prototypical retrieval systems may fail to capture the latest relevant information and struggle to handle conflicting reports in evolving news. To address this challenge, we present the NEON framework, designed to extract emerging entity interactions -- such as events or activities -- as described in news articles. NEON constructs an entity-centric timestamped knowledge graph that captures such interactions, thereby facilitating enhanced QA capabilities related to news events. Our framework innovates by integrating open Information Extraction (openIE) style tuples into LLMs to enable in-context retrieval-augmented generation. This integration demonstrates substantial improvements in QA performance when tackling temporal, entity-centric search queries. Through NEON, LLMs can deliver more accurate, reliable, and up-to-date responses.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Information Retrieval, cs.IR

Paper

S. Singhania, S. Razniewski, and G. Weikum

“Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents,” 2024.

Abstract

Methods for relation extraction from text mostly focus on high precision, at
the cost of limited recall. High recall is crucial, though, to populate long
lists of object entities that stand in a specific relation with a given
subject. Cues for relevant objects can be spread across many passages in long
texts. This poses the challenge of extracting long lists from long texts. We
present the L3X method which tackles the problem in two stages: (1)
recall-oriented generation using a large language model (LLM) with judicious
techniques for retrieval augmentation, and (2) precision-oriented
scrutinization to validate or prune candidates. Our L3X method outperforms
LLM-only generations by a substantial margin.

BibTeX

@online{Singhania_2405.02732,
TITLE = {Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents},
AUTHOR = {Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
EPRINT = {2405.02732},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-75A0-8
%D 2024
%X Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Information Retrieval, cs.IR

Conference paper

A. Tigunova, G. H. Torbati, A. Yates, and G. Weikum

“STAR: Sparse Text Approach for Recommendation,” in CIKM ’24, 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 2024.

@inproceedings{Tigunova_CIKM24,
TITLE = {{STAR}: {S}parse Text Approach for Recommendation},
AUTHOR = {Tigunova, Anna and Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0436-9},
DOI = {10.1145/3627673.3679999},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {CIKM '24, 33rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Serra, Edoardo and Spezzano, Francesca},
PAGES = {4086--4090},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T STAR: Sparse Text Approach for Recommendation
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-FD24-C
%R 10.1145/3627673.3679999
%D 2024
%B 33rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2024-10-21 - 2024-10-25
%C Boise, ID, USA
%B CIKM '24
%E Serra, Edoardo; Spezzano, Francesca
%P 4086 - 4090
%I ACM
%@ 979-8-4007-0436-9

Conference paper

G. H. Torbati, A. Tigunova, G. Weikum, and A. Yates

“Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text,” in 3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024) (IRS 2024), Boise, ID, USA, 2024.

@inproceedings{Torbati_IRS24,
TITLE = {Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard and Yates, Andrew},
LANGUAGE = {eng},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024) (IRS 2024)},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-0BD1-6
%D 2024
%8 21.10.2024
%B 3rd International Workshop on Industrial Recommendation Systems
%Z date of event: 2024-10-25 - 2024-10-25
%C Boise, ID, USA
%B 3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024)
%I ACM

Conference paper

G. H. Torbati, A. Tigunova, and G. Weikum

“SIRUP: Search-based Book Recommendation Playground,” in WSDM ’24, 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 2024.

@inproceedings{TorbatiWSDM24,
TITLE = {{SIRUP}: {S}earch-based Book Recommendation Playground},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0371-3},
DOI = {10.1145/3616855.3635692},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {WSDM '24, 17th ACM International Conference on Web Search and Data Mining},
EDITOR = {Ang{\'e}lica Caudillo Mata, Luz and Lattanzi, Silvio and Mu{\~n}oz Medina, Andr{\'e}s and Akoglu, Leman and Gionis, Aristides and Vassilvitskii, Sergei},
PAGES = {1062--1065},
ADDRESS = {Merida, Mexico},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SIRUP: Search-based Book Recommendation Playground
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-A663-7
%R 10.1145/3616855.3635692
%D 2024
%B 17th ACM International Conference on Web Search and Data Mining
%Z date of event: 2024-03-04 - 2024-03-08
%C Merida, Mexico
%B WSDM '24
%E Angélica Caudillo Mata, Luz; Lattanzi, Silvio; Muñoz Medina, Andrés; Akoglu, Leman; Gionis, Aristides; Vassilvitskii, Sergei
%P 1062 - 1065
%I ACM
%@ 979-8-4007-0371-3

Conference paper

H. D. Tran, A. Yates, and G. Weikum

“Conversational Search with Tail Entities,” in Advances in Information Retrieval (ECIR 2024), Glasgow, UK, 2024.

@inproceedings{Tran_ECIR24,
TITLE = {Conversational Search with Tail Entities},
AUTHOR = {Tran, Hai Dang and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-031-56059-0},
DOI = {10.1007/978-3-031-56060-6_20},
PUBLISHER = {Springer},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2024)},
EDITOR = {Goharian, Nazli and Tonellotto, Nicola and He, Yulan and Lipani, Aldo and McDonald, Graham and Macdonald, Craig and Ounis, Iadh},
PAGES = {303--317},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {14609},
ADDRESS = {Glasgow, UK},
}

Endnote

%0 Conference Proceedings
%A Tran, Hai Dang
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Search with Tail Entities
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-042C-C
%R 10.1007/978-3-031-56060-6_20
%D 2024
%B 46th European Conference on Information Retrieval
%Z date of event: 2024-03-24 - 2024-03-28
%C Glasgow, UK
%B Advances in Information Retrieval
%E Goharian, Nazli; Tonellotto, Nicola; He, Yulan; Lipani, Aldo; McDonald, Graham; Macdonald, Craig; Ounis, Iadh
%P 303 - 317
%I Springer
%@ 978-3-031-56059-0
%B Lecture Notes in Computer Science
%N 14609

Article

A. Varde, D. Karthikeyan, and W. Wang

“Facilitating COVID Recognition from X-Rays with Computer Vision Models and Transfer Learning,” Multimedia Tools and Applications, vol. 83, 2024.

@article{Varde23,
TITLE = {Facilitating {COVID} Recognition from {X}-Rays with Computer Vision Models and Transfer Learning},
AUTHOR = {Varde, Aparna and Karthikeyan, Divydharshini and Wang, Weitian},
LANGUAGE = {eng},
ISSN = {1380-7501},
DOI = {10.1007/s11042-023-15744-9},
PUBLISHER = {Springer Nature},
ADDRESS = {New York, NY},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
JOURNAL = {Multimedia Tools and Applications},
VOLUME = {83},
PAGES = {807--838},
}

Endnote

%0 Journal Article
%A Varde, Aparna
%A Karthikeyan, Divydharshini
%A Wang, Weitian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Facilitating COVID Recognition from X-Rays with Computer Vision Models and Transfer Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-578B-5
%R 10.1007/s11042-023-15744-9
%7 2023-05-26
%D 2024
%J Multimedia Tools and Applications
%V 83
%& 807
%P 807 - 838
%I Springer Nature
%C New York, NY
%@ 1380-7501
%U https://rdcu.be/d7bKu