Publications

2025

Paper

A. Hogan, X. L. Dong, D. Vrandečić, and G. Weikum

“Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users’ Questions,” 2025. [Online]. Available: https://arxiv.org/abs/2501.06699.

Abstract

Much has been discussed about how Large Language Models, Knowledge Graphs and
Search Engines can be combined in a synergistic manner. A dimension largely
absent from current academic discourse is the user perspective. In particular,
there remain many open questions regarding how best to address the diverse
information needs of users, incorporating varying facets and levels of
difficulty. This paper introduces a taxonomy of user information needs, which
guides us to study the pros, cons and possible synergies of Large Language
Models, Knowledge Graphs and Search Engines. From this study, we derive a
roadmap for future research.

BibTeX

@online{Hogan_2501.06699,
TITLE = {Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions},
AUTHOR = {Hogan, Aidan and Dong, Xin Luna and Vrande{\v c}i{\'c}, Denny and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2501.06699},
EPRINT = {2501.06699},
EPRINTTYPE = {arXiv},
YEAR = {2025},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Much has been discussed about how Large Language Models, Knowledge Graphs and Search Engines can be combined in a synergistic manner. A dimension largely absent from current academic discourse is the user perspective. In particular, there remain many open questions regarding how best to address the diverse information needs of users, incorporating varying facets and levels of difficulty. This paper introduces a taxonomy of user information needs, which guides us to study the pros, cons and possible synergies of Large Language Models, Knowledge Graphs and Search Engines. From this study, we derive a roadmap for future research.},
}

Endnote

%0 Report
%A Hogan, Aidan
%A Dong, Xin Luna
%A Vrandečić, Denny
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-7837-A
%U https://arxiv.org/abs/2501.06699
%D 2025
%X Much has been discussed about how Large Language Models, Knowledge Graphs and Search Engines can be combined in a synergistic manner. A dimension largely absent from current academic discourse is the user perspective. In particular, there remain many open questions regarding how best to address the diverse information needs of users, incorporating varying facets and levels of difficulty. This paper introduces a taxonomy of user information needs, which guides us to study the pros, cons and possible synergies of Large Language Models, Knowledge Graphs and Search Engines. From this study, we derive a roadmap for future research.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR,Computer Science, Symbolic Computation, cs.SC

Conference paper

M. Kaiser and G. Weikum

“Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering,” in The ACM Web Conference 2025 (WWW 2025), Sydney, Australia, 2025.

BibTeX

@inproceedings{Kaiser_WWW25,
TITLE = {Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering},
AUTHOR = {Kaiser, Magdalena and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1145/3701716.3715544},
PUBLISHER = {ACM},
YEAR = {2025},
PUBLREMARK = {Accepted},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2025 (WWW 2025)},
ADDRESS = {Sydney, Australia},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-FBD2-6
%R 10.1145/3701716.3715544
%D 2025
%B ACM Web Conference
%Z date of event: 2025-04-28 - 2025-05-02
%C Sydney, Australia
%B The ACM Web Conference 2025
%I ACM

Conference paper

G. H. Torbati, A. Tigunova, G. Weikum, and A. Yates

“CUP: A Framework for Resource-Efficient Review-Based Recommenders,” in Advances in Information Retrieval (ECIR 2025), Lucca, Italy, 2025.

BibTeX

@inproceedings{Torbati_ECIR25,
TITLE = {{CUP}: {A} Framework for Resource-Efficient Review-Based Recommenders},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-3-031-88710-9},
DOI = {10.1007/978-3-031-88711-6_23},
PUBLISHER = {Springer},
YEAR = {2025},
MARGINALMARK = {$\bullet$},
DATE = {2025},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2025)},
EDITOR = {Hauff, Claudia and Macdonald, Craig and Jannach, Dietmar and Kazai, Gabriella and Nardini, Franco Maria and Pinelli, Fabio and Silvestri, Fabrizio and Tonellotto, Nicola},
PAGES = {360--375},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {15573},
ADDRESS = {Lucca, Italy},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CUP: A Framework for Resource-Efficient Review-Based Recommenders
%G eng
%U http://hdl.handle.net/21.11116/0000-0011-0BBF-B
%R 10.1007/978-3-031-88711-6_23
%D 2025
%B 47th European Conference on Information Retrieval
%Z date of event: 2025-04-06 - 2025-04-10
%C Lucca, Italy
%B Advances in Information Retrieval
%E Hauff, Claudia; Macdonald, Craig; Jannach, Dietmar; Kazai, Gabriella; Nardini, Franco Maria; Pinelli, Fabio; Silvestri, Fabrizio; Tonellotto, Nicola
%P 360 - 375
%I Springer
%@ 978-3-031-88710-9
%B Lecture Notes in Computer Science
%N 15573
%U https://rdcu.be/eh6FF

Conference paper

H. D. Tran, G. Weikum, and A. Yates

“Efficient and Effective Conversational Search with Tail Entity Selection,” in Advances in Information Retrieval (ECIR 2025), Lucca, Italy, 2025.

BibTeX

@inproceedings{Tran_ECIR25,
TITLE = {Efficient and Effective Conversational Search with Tail Entity Selection},
AUTHOR = {Tran, Hai Dang and Weikum, Gerhard and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-3-031-88713-0},
DOI = {10.1007/978-3-031-88714-7_26},
PUBLISHER = {Springer},
YEAR = {2025},
MARGINALMARK = {$\bullet$},
DATE = {2025},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2025)},
EDITOR = {Hauff, Claudia and Macdonald, Craig and Jannach, Dietmar and Kazai, Gabriella and Nardini, Franco Maria and Pinelli, Fabio and Silvestri, Fabrizio and Tonellotto, Nicola},
PAGES = {275--283},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {15574},
ADDRESS = {Lucca, Italy},
}

Endnote

%0 Conference Proceedings
%A Tran, Hai Dang
%A Weikum, Gerhard
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient and Effective Conversational Search with Tail Entity Selection
%G eng
%U http://hdl.handle.net/21.11116/0000-0011-0BC4-4
%R 10.1007/978-3-031-88714-7_26
%D 2025
%B 47th European Conference on Information Retrieval
%Z date of event: 2025-04-06 - 2025-04-10
%C Lucca, Italy
%B Advances in Information Retrieval
%E Hauff, Claudia; Macdonald, Craig; Jannach, Dietmar; Kazai, Gabriella; Nardini, Franco Maria; Pinelli, Fabio; Silvestri, Fabrizio; Tonellotto, Nicola
%P 275 - 283
%I Springer
%@ 978-3-031-88713-0
%B Lecture Notes in Computer Science
%N 15574
%U https://rdcu.be/eh6Lu

2024

Conference paper

P. Christmann, S. Vakulenko, I. T. Sorodoc, B. Byrne, and A. de Gispert

“Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision,” in Findings of EMNLP 2024, Miami, FL, USA, 2024.

BibTeX

@inproceedings{Christmann_EMNLP24,
TITLE = {Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision},
AUTHOR = {Christmann, Philipp and Vakulenko, Svitlana and Sorodoc, Ionut Teodor and Byrne, Bill and de Gispert, Adri{\`a}},
LANGUAGE = {eng},
ISBN = {979-8-89176-168-1},
DOI = {10.18653/v1/2024.findings-emnlp.835},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Findings of EMNLP 2024},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {14301--14310},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Vakulenko, Svitlana
%A Sorodoc, Ionut Teodor
%A Byrne, Bill
%A de Gispert, Adrià
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5AA-2
%R 10.18653/v1/2024.findings-emnlp.835
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Findings of EMNLP 2024
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 14301 - 14310
%I Association for Computational Linguistics
%@ 979-8-89176-168-1

Paper

P. Christmann and G. Weikum

“RAG-based Question Answering over Heterogeneous Data and Text,” 2024. [Online]. Available: https://arxiv.org/abs/2412.07420.

Abstract

This article presents the QUASAR system for question answering over
unstructured text, structured tables, and knowledge graphs, with unified
treatment of all sources. The system adopts a RAG-based architecture, with a
pipeline of evidence retrieval followed by answer generation, with the latter
powered by a moderate-sized language model. Additionally and uniquely, QUASAR
has components for question understanding, to derive crisper input for evidence
retrieval, and for re-ranking and filtering the retrieved evidence before
feeding the most informative pieces into the answer generation. Experiments
with three different benchmarks demonstrate the high answering quality of our
approach, being on par with or better than large GPT models, while keeping the
computational cost and energy consumption orders of magnitude lower.

BibTeX

@online{Christmann_2412.07420,
TITLE = {{RAG}-based Question Answering over Heterogeneous Data and Text},
AUTHOR = {Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2412.07420},
EPRINT = {2412.07420},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture, with a pipeline of evidence retrieval followed by answer generation, with the latter powered by a moderate-sized language model. Additionally and uniquely, QUASAR has components for question understanding, to derive crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence before feeding the most informative pieces into the answer generation. Experiments with three different benchmarks demonstrate the high answering quality of our approach, being on par with or better than large GPT models, while keeping the computational cost and energy consumption orders of magnitude lower.},
}

Endnote

%0 Report
%A Christmann, Philipp
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T RAG-based Question Answering over Heterogeneous Data and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-546F-4
%U https://arxiv.org/abs/2412.07420
%D 2024
%X This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture, with a pipeline of evidence retrieval followed by answer generation, with the latter powered by a moderate-sized language model. Additionally and uniquely, QUASAR has components for question understanding, to derive crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence before feeding the most informative pieces into the answer generation. Experiments with three different benchmarks demonstrate the high answering quality of our approach, being on par with or better than large GPT models, while keeping the computational cost and energy consumption orders of magnitude lower.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“CompMix: A Benchmark for Heterogeneous Question Answering,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

BibTeX

@inproceedings{ChristmannWWW24,
TITLE = {{CompMix}: A Benchmark for Heterogeneous Question Answering},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651444},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady},
PAGES = {1091--1094},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CompMix: A Benchmark for Heterogeneous Question Answering
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-579D-1
%R 10.1145/3589335.3651444
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady
%P 1091 - 1094
%I ACM
%@ 979-8-4007-0172-6

Thesis

S. Ghosh

“Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web,” Universität des Saarlandes, Saarbrücken, 2024.

BibTeX

@phdthesis{ThesisPhDGhosh24,
TITLE = {Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web},
AUTHOR = {Ghosh, Shrestha},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-430580},
DOI = {10.22028/D291-43058},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
}

Endnote

%0 Thesis
%A Ghosh, Shrestha
%Y Razniewski, Simon
%A referee: Weikum, Gerhard
%A referee: Hose, Katja
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Count Information: Retrieving and Estimating Cardinality of Entity Sets from the Web
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6153-3
%R 10.22028/D291-43058
%U urn:nbn:de:bsz:291--ds-430580
%F OTHER: hdl:20.500.11880/38841
%I Universität des Saarlandes
%C Saarbrücken
%D 2024
%P XI, 128 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/38841

Conference paper

S. Ghosh, S. Razniewski, D. Graux, and G. Weikum

“CardiO: Predicting Cardinality from Online Sources,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

BibTeX

@inproceedings{Ghosh_WWW2024,
TITLE = {{CardiO}: Predicting Cardinality from Online Sources},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Graux, Damien and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.365147},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady},
PAGES = {573--576},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Graux, Damien
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CardiO: Predicting Cardinality from Online Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-13B6-E
%R 10.1145/3589335.365147
%D 2024
%8 13.05.2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady
%P 573 - 576
%I ACM
%@ 979-8-4007-0172-6

Paper

Y. Hu, S. Ghosh, T.-P. Nguyen, and S. Razniewski

“GPTKB: Building Very Large Knowledge Bases from Language Models,” 2024. [Online]. Available: https://arxiv.org/abs/2411.04920.

Abstract

General-domain knowledge bases (KB), in particular the "big three" --
Wikidata, Yago and DBpedia -- are the backbone of many intelligent
applications. While these three have seen steady development, comprehensive KB
construction at large has seen few fresh attempts. In this work, we propose to
build a large general-domain KB entirely from a large language model (LLM). We
demonstrate the feasibility of large-scale KB construction from LLMs, while
highlighting specific challenges arising around entity recognition, entity and
property canonicalization, and taxonomy construction. As a prototype, we use
GPT-4o-mini to construct GPTKB, which contains 105 million triples for more
than 2.9 million entities, at a cost 100x less than previous KBC projects. Our
work is a landmark for two fields: For NLP, for the first time, it provides
constructive insights into the knowledge (or beliefs) of LLMs. For the
Semantic Web, it shows novel ways forward for the long-standing challenge of
general-domain KB construction. GPTKB is accessible at gptkb.org.

BibTeX

@online{Hu_2411.04920,
TITLE = {{GPTKB}: Building Very Large Knowledge Bases from Language Models},
AUTHOR = {Hu, Yujia and Ghosh, Shrestha and Nguyen, Tuan-Phong and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2411.04920},
EPRINT = {2411.04920},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {General-domain knowledge bases (KB), in particular the "big three" -- Wikidata, Yago and DBpedia -- are the backbone of many intelligent applications. While these three have seen steady development, comprehensive KB construction at large has seen few fresh attempts. In this work, we propose to build a large general-domain KB entirely from a large language model (LLM). We demonstrate the feasibility of large-scale KB construction from LLMs, while highlighting specific challenges arising around entity recognition, entity and property canonicalization, and taxonomy construction. As a prototype, we use GPT-4o-mini to construct GPTKB, which contains 105 million triples for more than 2.9 million entities, at a cost 100x less than previous KBC projects. Our work is a landmark for two fields: For NLP, for the first time, it provides \textit{constructive} insights into the knowledge (or beliefs) of LLMs. For the Semantic Web, it shows novel ways forward for the long-standing challenge of general-domain KB construction. GPTKB is accessible at http://gptkb.org.},
}

Endnote

%0 Report
%A Hu, Yujia
%A Ghosh, Shrestha
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T GPTKB: Building Very Large Knowledge Bases from Language Models
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-133A-8
%U https://arxiv.org/abs/2411.04920
%D 2024
%X General-domain knowledge bases (KB), in particular the "big three" -- Wikidata, Yago and DBpedia -- are the backbone of many intelligent applications. While these three have seen steady development, comprehensive KB construction at large has seen few fresh attempts. In this work, we propose to build a large general-domain KB entirely from a large language model (LLM). We demonstrate the feasibility of large-scale KB construction from LLMs, while highlighting specific challenges arising around entity recognition, entity and property canonicalization, and taxonomy construction. As a prototype, we use GPT-4o-mini to construct GPTKB, which contains 105 million triples for more than 2.9 million entities, at a cost 100x less than previous KBC projects. Our work is a landmark for two fields: For NLP, for the first time, it provides constructive insights into the knowledge (or beliefs) of LLMs. For the Semantic Web, it shows novel ways forward for the long-standing challenge of general-domain KB construction. GPTKB is accessible at http://gptkb.org.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Conference paper

Z. Jia, P. Christmann, and G. Weikum

“Faithful Temporal Question Answering over Heterogeneous Sources,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

BibTeX

@inproceedings{Jia_WWW2024,
TITLE = {Faithful Temporal Question Answering over Heterogeneous Sources},
AUTHOR = {Jia, Zhen and Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589334.3645547},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {2052--2063},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Christmann, Philipp
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Faithful Temporal Question Answering over Heterogeneous Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-7D1D-7
%R 10.1145/3589334.3645547
%D 2024
%8 13.05.2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 2052 - 2063
%I ACM
%@ 979-8-4007-0172-6

Conference paper

Z. Jia, P. Christmann, and G. Weikum

“TIQ: A Benchmark for Temporal Question Answering with Implicit Time Constraints,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

BibTeX

@inproceedings{Jia_WWW24,
TITLE = {{TIQ}: {A} Benchmark for Temporal Question Answering with Implicit Time Constraints},
AUTHOR = {Jia, Zhen and Christmann, Philipp and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651895},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {1394--1399},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Christmann, Philipp
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TIQ: A Benchmark for Temporal Question Answering with Implicit Time Constraints
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-4EDB-4
%R 10.1145/3589335.3651895
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 1394 - 1399
%I ACM
%@ 979-8-4007-0172-6

Conference paper

M. Kaiser, P. Ernst, and G. Szarvas

“Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems,” in Findings of EMNLP 2024, Miami, FL, USA, 2024.

BibTeX

@inproceedings{Kaiser_EMNLP24,
TITLE = {Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems},
AUTHOR = {Kaiser, Magdalena and Ernst, Patrick and Szarvas, Gy{\"o}rgy},
LANGUAGE = {eng},
ISBN = {979-8-89176-168-1},
DOI = {10.18653/v1/2024.findings-emnlp.362},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Findings of EMNLP 2024},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {6236--6246},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Ernst, Patrick
%A Szarvas, György
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5B9-1
%R 10.18653/v1/2024.findings-emnlp.362
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Findings of EMNLP 2024
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 6236 - 6246
%I Association for Computational Linguistics
%@ 979-8-89176-168-1

Conference paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation,” in WSDM ’24, Merida, Mexico, 2024.

BibTeX

@inproceedings{KaiserWSDM24,
TITLE = {Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1145/3616855.3635822},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {WSDM '24},
EDITOR = {Ang{\'e}lica, Luz and Lattanzi, Silvio and Mu{\~n}oz Medina, Andr{\'e}s and Akoglu, Leman and Gionis, Aristides and Vassilvitskii, Sergei},
PAGES = {322--331},
ADDRESS = {Merida, Mexico},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-E9D1-0
%R 10.1145/3616855.3635822
%D 2024
%8 04.03.2024
%B 17th ACM International Conference on Web Search and Data Mining
%Z date of event: 2024-03-04 - 2024-03-08
%C Merida, Mexico
%B WSDM '24
%E Angélica, Luz; Lattanzi, Silvio; Muñoz Medina, Andrés; Akoglu, Leman; Gionis, Aristides; Vassilvitskii, Sergei
%P 322 - 331
%I ACM

Conference paper

J.-C. Kalo, T.-P. Nguyen, S. Razniewski, and B. Zhang

“Preface: LM-KBC Challenge 2024,” in Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024, Baltimore, MD, USA, 2024, vol. 3853.

BibTeX

@inproceedings{Kalo_Preface24,
TITLE = {Preface: {LM}-{KBC} Challenge 2024},
AUTHOR = {Kalo, Jan-Christoph and Nguyen, Tuan-Phong and Razniewski, Simon and Zhang, Bohui},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {https://ceur-ws.org/Vol-3853/paper0.pdf},
PUBLISHER = {CEUR.ws},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024},
EDITOR = {Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Pan, Jeff Z. and Nguyen, Tuan-Phong and Zhang, Bohui},
VOLUME = {3853},
SERIES = {CEUR Workshop Proceedings},
ADDRESS = {Baltimore, MD, USA},
}

Endnote

%0 Conference Proceedings
%A Kalo, Jan-Christoph
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Zhang, Bohui
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Preface: LM-KBC Challenge 2024
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6E9D-3
%U https://ceur-ws.org/Vol-3853/paper0.pdf
%D 2024
%B 2nd Workshop on Knowledge Base Construction from Pre-Trained Language Models
%Z date of event: 2024-11-12 - 2024-11-12
%C Baltimore, MD, USA
%B Joint proceedings of the KBC-LM workshop and the LM-KBC challenge 2024
%E Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Pan, Jeff Z.; Nguyen, Tuan-Phong; Zhang, Bohui
%V 3853
%I CEUR.ws
%@ 1613-0073
%B CEUR Workshop Proceedings
%N 3853

Conference paper

L. Lange, M. Müller, G. H. Torbati, D. Milchevski, P. Grau, S. C. Pujari, and A. Friedrich

“AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports,” in The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024.

BibTeX

@inproceedings{Lange_LREC24,
TITLE = {{AnnoCTR}: {A} Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports},
AUTHOR = {Lange, Lukas and M{\"u}ller, Marc and Torbati, Ghazaleh Haratinezhad and Milchevski, Dragan and Grau, Patrick and Pujari, Subhash Chandra and Friedrich, Annemarie},
LANGUAGE = {eng},
ISBN = {978-2-493814-10-4},
URL = {https://aclanthology.org/2024.lrec-main.103/},
PUBLISHER = {ELRA Language Resources Association},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
EDITOR = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
PAGES = {1147--1160},
ADDRESS = {Torino, Italy},
}

Endnote

%0 Conference Proceedings
%A Lange, Lukas
%A Müller, Marc
%A Torbati, Ghazaleh Haratinezhad
%A Milchevski, Dragan
%A Grau, Patrick
%A Pujari, Subhash Chandra
%A Friedrich, Annemarie
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5B3-7
%U https://aclanthology.org/2024.lrec-main.103/
%D 2024
%B Joint International Conference on Computational Linguistics, Language Resources and Evaluation
%Z date of event: 2024-05-20 - 2024-05-25
%C Torino, Italy
%B The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
%E Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen
%P 1147 - 1160
%I ELRA Language Resources Association
%@ 978-2-493814-10-4

Conference paper

T.-P. Nguyen, S. Razniewski, and G. Weikum

“Cultural Commonsense Knowledge for Intercultural Dialogues,” in CIKM ’24, 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 2024.

BibTeX

@inproceedings{Nguyen_CIKM24,
TITLE = {Cultural Commonsense Knowledge for Intercultural Dialogues},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0436-9},
DOI = {10.1145/3627673.3679768},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {CIKM '24, 33rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Serra, Edoardo and Spezzano, Francesca},
PAGES = {1774--1784},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cultural Commonsense Knowledge for Intercultural Dialogues
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-7348-0
%R 10.1145/3627673.3679768
%D 2024
%B 33rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2024-10-21 - 2024-10-25
%C Boise, ID, USA
%B CIKM '24
%E Serra, Edoardo; Spezzano, Francesca
%P 1774 - 1784
%I ACM
%@ 979-8-4007-0436-9

Conference paper

K. Pal, H. Arnaout, S. Razniewski, and G. Weikum

“FASETS: Discovering Faceted Sets of Entities,” in The ACM Web Conference 2024 (WWW 2024), Singapore, 2024.

BibTeX

@inproceedings{Pal_WWW24,
TITLE = {{FASETS}: {D}iscovering Faceted Sets of Entities},
AUTHOR = {Pal, Koninika and Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0172-6},
DOI = {10.1145/3589335.3651924},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2024 (WWW 2024)},
EDITOR = {Chua, Tat-Seng and Ngo, Chong-Wah and Lee, Roy Ka-Wei and Kumar, Ravi and Lauw, Hady W.},
PAGES = {1521--1529},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Pal, Koninika
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FASETS: Discovering Faceted Sets of Entities
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-4ED6-9
%R 10.1145/3589335.3651924
%D 2024
%B ACM Web Conference
%Z date of event: 2024-05-13 - 2024-05-17
%C Singapore
%B The ACM Web Conference 2024
%E Chua, Tat-Seng; Ngo, Chong-Wah; Lee, Roy Ka-Wei; Kumar, Ravi; Lauw, Hady W.
%P 1521 - 1529
%I ACM
%@ 979-8-4007-0172-6

Article

S. Pramanik, J. Alabi, R. Saha Roy, and G. Weikum

“UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text,” Journal of Web Semantics, vol. 83, 2024.

Abstract

Question answering over knowledge graphs and other RDF data has been greatly
advanced, with a number of good systems providing crisp answers for natural
language questions or telegraphic queries. Some of these systems incorporate
textual sources as additional evidence for the answering process, but cannot
compute answers that are present in text alone. Conversely, systems from the IR
and NLP communities have addressed QA over text, but barely utilize semantic
data and knowledge. This paper presents the first QA system that can seamlessly
operate over RDF datasets and text corpora, or both together, in a unified
framework. Our method, called UNIQORN, builds a context graph on the fly, by
retrieving question-relevant triples from the RDF data and/or the text corpus,
where the latter case is handled by automatic information extraction. The
resulting graph is typically rich but highly noisy. UNIQORN copes with this
input by advanced graph algorithms for Group Steiner Trees, that identify the
best answer candidates in the context graph. Experimental results on several
benchmarks of complex questions with multiple entities and relations, show that
UNIQORN, an unsupervised method with only five parameters, produces results
comparable to the state-of-the-art on KGs, text corpora, and heterogeneous
sources. The graph-based methodology provides user-interpretable evidence for
the complete answering process.

BibTeX

@article{Pramanik24c,
TITLE = {{UNIQORN}: {U}nified Question Answering over {RDF} Knowledge Graphs and Natural Language Text},
AUTHOR = {Pramanik, Soumajit and Alabi, Jesujoba and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1873-7749},
DOI = {10.1016/j.websem.2024.100833},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
ABSTRACT = {Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly, by retrieving question-relevant triples from the RDF data and/or the text corpus, where the latter case is handled by automatic information extraction. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input by advanced graph algorithms for Group Steiner Trees, that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations, show that UNIQORN, an unsupervised method with only five parameters, produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.},
JOURNAL = {Journal of Web Semantics},
VOLUME = {83},
EID = {100833},
}

Endnote

%0 Journal Article
%A Pramanik, Soumajit
%A Alabi, Jesujoba
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6365-6
%R 10.1016/j.websem.2024.100833
%7 2024-09-10
%D 2024
%X Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly, by retrieving question-relevant triples from the RDF data and/or the text corpus, where the latter case is handled by automatic information extraction. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input by advanced graph algorithms for Group Steiner Trees, that identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations, show that UNIQORN, an unsupervised method with only five parameters, produces results comparable to the state-of-the-art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
%J Journal of Web Semantics
%V 83
%Z sequence number: 100833
%I Elsevier
%C Amsterdam
%@ 1873-7749

Article

S. Razniewski, H. Arnaout, S. Ghosh, and F. Suchanek

“Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey,” ACM Computing Surveys, vol. 56, no. 6, 2024.

BibTeX

@article{Razniewski24,
TITLE = {Completeness, Recall, and Negation in Open-world Knowledge Bases: {A} Survey},
AUTHOR = {Razniewski, Simon and Arnaout, Hiba and Ghosh, Shrestha and Suchanek, Fabian},
LANGUAGE = {eng},
ISSN = {0360-0300; 1557-7341},
DOI = {10.1145/3639563},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
JOURNAL = {ACM Computing Surveys},
VOLUME = {56},
NUMBER = {6},
PAGES = {1--42},
EID = {150},
}

Endnote

%0 Journal Article
%A Razniewski, Simon
%A Arnaout, Hiba
%A Ghosh, Shrestha
%A Suchanek, Fabian
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-76FD-1
%R 10.1145/3639563
%D 2024
%J ACM Computing Surveys
%O ACM Comput. Surv. Computing surveys CSUR
%V 56
%N 6
%& 1
%P 1 - 42
%Z sequence number: 150
%I ACM
%C New York, NY
%@ 0360-0300; 1557-7341

Proceedings

S. Razniewski, J.-C. Kalo, S. Singhania, J. Z. Pan, T.-P. Nguyen, and B. Zhang

Eds., Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024. CEUR-WS, 2024.

BibTeX

@proceedings{RazniewskiKBC24,
TITLE = {Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024 (KBC-LM-LM-KBC 2024)},
EDITOR = {Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Pan, Jeff Z. and Nguyen, Tuan-Phong and Zhang, Bohui},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-3853-0},
PUBLISHER = {CEUR-WS},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3853},
ADDRESS = {Baltimore, MD, USA},
}

Endnote

%0 Conference Proceedings
%E Razniewski, Simon
%E Kalo, Jan-Christoph
%E Singhania, Sneha
%E Pan, Jeff Z.
%E Nguyen, Tuan-Phong
%E Zhang, Bohui
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge 2024 : Joint proceedings of the 2nd workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM 2024) and the 3rd challenge on Language Models for Knowledge Base Construction (LM-KBC 2024)
co-located with the 23rd International Semantic Web Conference (ISWC 2024)
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-6E90-0
%U urn:nbn:de:0074-3853-0
%I CEUR-WS
%D 2024
%B 2nd Workshop on Knowledge Base Construction from Pre-Trained Language Models
%Z date of event: 2024-11-12 - 2024-11-12
%C Baltimore, MD, USA
%S CEUR Workshop Proceedings
%V 3853
%@ 1613-0073

Conference paper

T. P. Schrader, L. Lange, S. Razniewski, and A. Friedrich

“QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, FL, USA, 2024.

BibTeX

@inproceedings{Schrader_EMNLP24,
TITLE = {{QUITE}: {Q}uantifying Uncertainty in Natural Language Text in {B}ayesian Reasoning Scenarios},
AUTHOR = {Schrader, Timo Pierre and Lange, Lukas and Razniewski, Simon and Friedrich, Annemarie},
LANGUAGE = {eng},
ISBN = {979-8-89176-164-3},
URL = {https://aclanthology.org/2024.emnlp-main.153},
PUBLISHER = {Association for Computational Linguistics},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)},
EDITOR = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
PAGES = {2634--2652},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Schrader, Timo Pierre
%A Lange, Lukas
%A Razniewski, Simon
%A Friedrich, Annemarie
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-2E26-1
%U https://aclanthology.org/2024.emnlp-main.153
%D 2024
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2024-11-12 - 2024-11-16
%C Miami, FL, USA
%B Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
%E Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung
%P 2634 - 2652
%I Association for Computational Linguistics
%@ 979-8-89176-164-3
%U https://aclanthology.org/2024.emnlp-main.153.pdf

Paper

S. Singhania, S. Cucerzan, A. Herring, and S. K. Jauhar

“Neon: News Entity-Interaction Extraction for Enhanced Question Answering,” 2024. [Online]. Available: https://arxiv.org/abs/2411.12449.

Abstract

Capturing fresh information in near real-time and using it to augment
existing large language models (LLMs) is essential to generate up-to-date,
grounded, and reliable output. This problem becomes particularly challenging
when LLMs are used for informational tasks in rapidly evolving fields, such as
Web search related to recent or unfolding events involving entities, where
generating temporally relevant responses requires access to up-to-the-hour news
sources. However, the information modeled by the parametric memory of LLMs is
often outdated, and Web results from prototypical retrieval systems may fail to
capture the latest relevant information and struggle to handle conflicting
reports in evolving news. To address this challenge, we present the NEON
framework, designed to extract emerging entity interactions -- such as events
or activities -- as described in news articles. NEON constructs an
entity-centric timestamped knowledge graph that captures such interactions,
thereby facilitating enhanced QA capabilities related to news events. Our
framework innovates by integrating open Information Extraction (openIE) style
tuples into LLMs to enable in-context retrieval-augmented generation. This
integration demonstrates substantial improvements in QA performance when
tackling temporal, entity-centric search queries. Through NEON, LLMs can
deliver more accurate, reliable, and up-to-date responses.

BibTeX

@online{Singhania2411.12449,
TITLE = {Neon: News Entity-Interaction Extraction for Enhanced Question Answering},
AUTHOR = {Singhania, Sneha and Cucerzan, Silviu and Herring, Allen and Jauhar, Sujay Kumar},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2411.12449},
EPRINT = {2411.12449},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Capturing fresh information in near real-time and using it to augment existing large language models (LLMs) is essential to generate up-to-date, grounded, and reliable output. This problem becomes particularly challenging when LLMs are used for informational tasks in rapidly evolving fields, such as Web search related to recent or unfolding events involving entities, where generating temporally relevant responses requires access to up-to-the-hour news sources. However, the information modeled by the parametric memory of LLMs is often outdated, and Web results from prototypical retrieval systems may fail to capture the latest relevant information and struggle to handle conflicting reports in evolving news. To address this challenge, we present the NEON framework, designed to extract emerging entity interactions -- such as events or activities -- as described in news articles. NEON constructs an entity-centric timestamped knowledge graph that captures such interactions, thereby facilitating enhanced QA capabilities related to news events. Our framework innovates by integrating open Information Extraction (openIE) style tuples into LLMs to enable in-context retrieval-augmented generation. This integration demonstrates substantial improvements in QA performance when tackling temporal, entity-centric search queries. Through NEON, LLMs can deliver more accurate, reliable, and up-to-date responses.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Cucerzan, Silviu
%A Herring, Allen
%A Jauhar, Sujay Kumar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Neon: News Entity-Interaction Extraction for Enhanced Question Answering
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5F6-C
%U https://arxiv.org/abs/2411.12449
%D 2024
%X Capturing fresh information in near real-time and using it to augment existing large language models (LLMs) is essential to generate up-to-date, grounded, and reliable output. This problem becomes particularly challenging when LLMs are used for informational tasks in rapidly evolving fields, such as Web search related to recent or unfolding events involving entities, where generating temporally relevant responses requires access to up-to-the-hour news sources. However, the information modeled by the parametric memory of LLMs is often outdated, and Web results from prototypical retrieval systems may fail to capture the latest relevant information and struggle to handle conflicting reports in evolving news. To address this challenge, we present the NEON framework, designed to extract emerging entity interactions -- such as events or activities -- as described in news articles. NEON constructs an entity-centric timestamped knowledge graph that captures such interactions, thereby facilitating enhanced QA capabilities related to news events. Our framework innovates by integrating open Information Extraction (openIE) style tuples into LLMs to enable in-context retrieval-augmented generation. This integration demonstrates substantial improvements in QA performance when tackling temporal, entity-centric search queries. Through NEON, LLMs can deliver more accurate, reliable, and up-to-date responses.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Information Retrieval, cs.IR

Paper

S. Singhania, S. Razniewski, and G. Weikum

“Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents,” 2024.

Abstract

Methods for relation extraction from text mostly focus on high precision, at
the cost of limited recall. High recall is crucial, though, to populate long
lists of object entities that stand in a specific relation with a given
subject. Cues for relevant objects can be spread across many passages in long
texts. This poses the challenge of extracting long lists from long texts. We
present the L3X method which tackles the problem in two stages: (1)
recall-oriented generation using a large language model (LLM) with judicious
techniques for retrieval augmentation, and (2) precision-oriented
scrutinization to validate or prune candidates. Our L3X method outperforms
LLM-only generations by a substantial margin.

BibTeX

@online{Singhania_2405.02732,
TITLE = {Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents},
AUTHOR = {Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
EPRINT = {2405.02732},
EPRINTTYPE = {arXiv},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-75A0-8
%D 2024
%X Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Information Retrieval, cs.IR

Conference paper

A. Tigunova, G. H. Torbati, A. Yates, and G. Weikum

“STAR: Sparse Text Approach for Recommendation,” in CIKM ’24, 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 2024.

BibTeX

@inproceedings{Tigunova_CIKM24,
TITLE = {{STAR}: {S}parse Text Approach for Recommendation},
AUTHOR = {Tigunova, Anna and Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0436-9},
DOI = {10.1145/3627673.3679999},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {CIKM '24, 33rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Serra, Edoardo and Spezzano, Francesca},
PAGES = {4086--4090},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T STAR: Sparse Text Approach for Recommendation
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-FD24-C
%R 10.1145/3627673.3679999
%D 2024
%B 33rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2024-10-21 - 2024-10-25
%C Boise, ID, USA
%B CIKM '24
%E Serra, Edoardo; Spezzano, Francesca
%P 4086 - 4090
%I ACM
%@ 979-8-4007-0436-9

Conference paper

G. H. Torbati, A. Tigunova, G. Weikum, and A. Yates

“Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text,” in 3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024) (IRS 2024), Boise, ID, USA, 2024.

mehr

BibTeX

@inproceedings{Torbati_IRS24,
TITLE = {Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard and Yates, Andrew},
LANGUAGE = {eng},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024) (IRS 2024)},
ADDRESS = {Boise, ID, USA},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Recommendations in Sparse-Data Low-Resource Settings by Constructing Concise User Profiles from Review Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-0BD1-6
%D 2024
%8 21.10.2024
%B 3rd International Workshop on Industrial Recommendation Systems
%Z date of event: 2024-10-25 - 2024-10-25
%C Boise, ID, USA
%B 3rd International Workshop on Industrial Recommendation Systems (at CIKM 2024)
%I ACM

Conference paper

G. H. Torbati, A. Tigunova, and G. Weikum

“SIRUP: Search-based Book Recommendation Playground,” in WSDM ’24, 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 2024.

BibTeX

@inproceedings{TorbatiWSDM24,
TITLE = {{SIRUP}: {S}earch-based Book Recommendation Playground},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0371-3},
DOI = {10.1145/3616855.3635692},
PUBLISHER = {ACM},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {WSDM '24, 17th ACM International Conference on Web Search and Data Mining},
EDITOR = {Ang{\'e}lica Caudillo Mata, Luz and Lattanzi, Silvio and Mu{\~n}oz Medina, Andr{\'e}s and Akoglu, Leman and Gionis, Aristides and Vassilvitskii, Sergei},
PAGES = {1062--1065},
ADDRESS = {Merida, Mexico},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SIRUP: Search-based Book Recommendation Playground
%G eng
%U http://hdl.handle.net/21.11116/0000-000E-A663-7
%R 10.1145/3616855.3635692
%D 2024
%B 17th ACM International Conference on Web Search and Data Mining
%Z date of event: 2024-03-04 - 2024-03-08
%C Merida, Mexico
%B WSDM '24
%E Angélica Caudillo Mata, Luz; Lattanzi, Silvio; Muñoz Medina, Andrés; Akoglu, Leman; Gionis, Aristides; Vassilvitskii, Sergei
%P 1062 - 1065
%I ACM
%@ 979-8-4007-0371-3

Conference paper

H. D. Tran, A. Yates, and G. Weikum

“Conversational Search with Tail Entities,” in Advances in Information Retrieval (ECIR 2024), Glasgow, UK, 2024.

BibTeX

@inproceedings{Tran_ECIR24,
TITLE = {Conversational Search with Tail Entities},
AUTHOR = {Tran, Hai Dang and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-031-56059-0},
DOI = {10.1007/978-3-031-56060-6_20},
PUBLISHER = {Springer},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2024)},
EDITOR = {Goharian, Nazli and Tonellotto, Nicola and He, Yulan and Lipani, Aldo and McDonald, Graham and Macdonald, Craig and Ounis, Iadh},
PAGES = {303--317},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {14609},
ADDRESS = {Glasgow, UK},
}

Endnote

%0 Conference Proceedings
%A Tran, Hai Dang
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Search with Tail Entities
%G eng
%U http://hdl.handle.net/21.11116/0000-000F-042C-C
%R 10.1007/978-3-031-56060-6_20
%D 2024
%B 46th European Conference on Information Retrieval
%Z date of event: 2024-03-24 - 2024-03-28
%C Glasgow, UK
%B Advances in Information Retrieval
%E Goharian, Nazli; Tonellotto, Nicola; He, Yulan; Lipani, Aldo; McDonald, Graham; Macdonald, Craig; Ounis, Iadh
%P 303 - 317
%I Springer
%@ 978-3-031-56059-0
%B Lecture Notes in Computer Science
%N 14609

Article

A. Varde, D. Karthikeyan, and W. Wang

“Facilitating COVID Recognition from X-Rays with Computer Vision Models and Transfer Learning,” Multimedia Tools and Applications, vol. 83, 2024.

BibTeX

@article{Varde23,
TITLE = {Facilitating {COVID} Recognition from {X}-Rays with Computer Vision Models and Transfer Learning},
AUTHOR = {Varde, Aparna and Karthikeyan, Divydharshini and Wang, Weitian},
LANGUAGE = {eng},
ISSN = {1380-7501},
DOI = {10.1007/s11042-023-15744-9},
PUBLISHER = {Springer Nature},
ADDRESS = {New York, NY},
YEAR = {2024},
MARGINALMARK = {$\bullet$},
DATE = {2024},
JOURNAL = {Multimedia Tools and Applications},
VOLUME = {83},
PAGES = {807--838},
}

Endnote

%0 Journal Article
%A Varde, Aparna
%A Karthikeyan, Divydharshini
%A Wang, Weitian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Facilitating COVID Recognition from X-Rays with Computer Vision Models and Transfer Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-578B-5
%R 10.1007/s11042-023-15744-9
%7 2023-05-26
%D 2024
%J Multimedia Tools and Applications
%V 83
%& 807
%P 807 - 838
%I Springer Nature
%C New York, NY
%@ 1380-7501
%U https://rdcu.be/d7bKu

2023

Thesis

A. S. Anwari

“Learning Filters to Improve Social Media Search,” Universität des Saarlandes, Saarbrücken, 2023.

BibTeX

@mastersthesis{AnwariMSc23,
TITLE = {Learning Filters to Improve Social Media Search},
AUTHOR = {Anwari, Ahmed Sohail},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
}

Endnote

%0 Thesis
%A Anwari, Ahmed Sohail
%Y Yates, Andrew
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Learning Filters to Improve Social Media Search
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-1C08-C
%I Universität des Saarlandes
%C Saarbrücken
%D 2023
%P XI, 69 p.
%V master
%9 master

Conference paper

H. Arnaout, T.-P. Nguyen, S. Razniewski, and G. Weikum

“UnCommonSense in Action! Informative Negations for Commonsense Knowledge Bases,” in WSDM ’23, 16th ACM International Conference on Web Search and Data Mining, Singapore, 2023.

BibTeX

@inproceedings{Arnaout_WSDM23,
TITLE = {{UnCommonSense} in Action! {I}nformative Negations for Commonsense Knowledge Bases},
AUTHOR = {Arnaout, Hiba and Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9407-9},
DOI = {10.1145/3539597.3573027},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {WSDM '23, 16th ACM International Conference on Web Search and Data Mining},
EDITOR = {Chua, Tat-Seng and Lauw, Hady and Si, Luo and Terzi, Evimaria and Tsaparas, Panayiotis},
PAGES = {1120--1123},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Arnaout, Hiba
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T UnCommonSense in Action! Informative Negations for Commonsense Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-18BC-6
%R 10.1145/3539597.3573027
%D 2023
%B 16th ACM International Conference on Web Search and Data Mining
%Z date of event: 2023-02-27 - 2023-03-03
%C Singapore
%B WSDM '23
%E Chua, Tat-Seng; Lauw, Hady; Si, Luo; Terzi, Evimaria; Tsaparas, Panayiotis
%P 1120 - 1123
%I ACM
%@ 978-1-4503-9407-9

Paper

H. Arnaout and S. Razniewski

“Can Large Language Models Generate Salient Negative Statements?,” 2023. [Online]. Available: https://arxiv.org/abs/2305.16755.

Abstract

We examine the ability of large language models (LLMs) to generate salient
(interesting) negative statements about real-world entities; an emerging
research topic of the last few years. We probe the LLMs using zero- and k-shot
unconstrained probes, and compare with traditional methods for negation
generation, i.e., pattern-based textual extractions and knowledge-graph-based
inferences, as well as crowdsourced gold statements. We measure the correctness
and salience of the generated lists about subjects from different domains. Our
evaluation shows that guided probes do in fact improve the quality of generated
negatives, compared to the zero-shot variant. Nevertheless, using both prompts,
LLMs still struggle with the notion of factuality of negatives, frequently
generating many ambiguous statements, or statements with negative keywords but
a positive meaning.

BibTeX

@online{Arnaout2305.16755,
TITLE = {Can Large Language Models Generate Salient Negative Statements?},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2305.16755},
EPRINT = {2305.16755},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as crowdsourced gold statements. We measure the correctness and salience of the generated lists about subjects from different domains. Our evaluation shows that guided probes do in fact improve the quality of generated negatives, compared to the zero-shot variant. Nevertheless, using both prompts, LLMs still struggle with the notion of factuality of negatives, frequently generating many ambiguous statements, or statements with negative keywords but a positive meaning.},
}

Endnote

%0 Report
%A Arnaout, Hiba
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Can Large Language Models Generate Salient Negative Statements?
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-3D73-E
%U https://arxiv.org/abs/2305.16755
%D 2023
%X We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as crowdsourced gold statements. We measure the correctness and salience of the generated lists about subjects from different domains. Our evaluation shows that guided probes do in fact improve the quality of generated negatives, compared to the zero-shot variant. Nevertheless, using both prompts, LLMs still struggle with the notion of factuality of negatives, frequently generating many ambiguous statements, or statements with negative keywords but a positive meaning.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI

Thesis

D5IMPR-CS

H. Arnaout

“Enriching Open-world Knowledge Graphs with Expressive Negative Statements,” Universität des Saarlandes, Saarbrücken, 2023.

BibTeX

@phdthesis{Arnaout_PhD2023,
TITLE = {Enriching Open-world Knowledge Graphs with Expressive Negative Statements},
AUTHOR = {Arnaout, Hiba},
URL = {urn:nbn:de:bsz:291--ds-409235},
DOI = {10.22028/D291-40923},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
}

Endnote

%0 Thesis
%A Arnaout, Hiba
%Y Weikum, Gerhard
%A referee: Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enriching Open-world Knowledge Graphs with Expressive Negative Statements
%U http://hdl.handle.net/21.11116/0000-000E-5B06-6
%R 10.22028/D291-40923
%U urn:nbn:de:bsz:291--ds-409235
%F OTHER: hdl:20.500.11880/36992
%I Universität des Saarlandes
%C Saarbrücken
%D 2023
%P xiii, 98 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/36992

Thesis

A. Bashir

“Leveraging Self-Supervised Learning in Domain-Specific Language Models,” Universität des Saarlandes, Saarbrücken, 2023.

BibTeX

@mastersthesis{BashirMSc23,
TITLE = {Leveraging Self-Supervised Learning in Domain-Specific Language Models},
AUTHOR = {Bashir, Abdallah},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
}

Endnote

%0 Thesis
%A Bashir, Abdallah
%Y Terolli, Erisa
%Y Ernst, Patrick
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Leveraging Self-Supervised Learning in Domain-Specific Language Models
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-2D82-E
%I Universität des Saarlandes
%C Saarbrücken
%D 2023
%P XI, 54 p.
%V master
%9 master

Conference paper

L. Boualili and A. Yates

“A Study of Term-Topic Embeddings for Ranking,” in Advances in Information Retrieval (ECIR 2023), Dublin, Ireland, 2023.

BibTeX

@inproceedings{Boualili_ECIR23,
TITLE = {A Study of Term-Topic Embeddings for Ranking},
AUTHOR = {Boualili, Lila and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-3-031-28237-9},
DOI = {10.1007/978-3-031-28238-6_25},
PUBLISHER = {Springer},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2023)},
EDITOR = {Kamps, Jaap and Goeuriot, Lorraine and Crestani, Fabio and Maistro, Maria and Joho, Hideo and Davis, Brian and Gurrin, Cathal and Kruschwitz, Udo and Caputo, Annalina},
PAGES = {359--366},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {13981},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Boualili, Lila
%A Yates, Andrew
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Study of Term-Topic Embeddings for Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC34-2
%R 10.1007/978-3-031-28238-6_25
%D 2023
%B 45th European Conference on IR Research
%Z date of event: 2023-04-02 - 2023-04-06
%C Dublin, Ireland
%B Advances in Information Retrieval
%E Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio; Maistro, Maria; Joho, Hideo; Davis, Brian; Gurrin, Cathal; Kruschwitz, Udo; Caputo, Annalina
%P 359 - 366
%I Springer
%@ 978-3-031-28237-9
%B Lecture Notes in Computer Science
%N 13981

Paper

L. Chen, S. Razniewski, and G. Weikum

“Knowledge Base Completion for Long-Tail Entities,” 2023. [Online]. Available: https://arxiv.org/abs/2306.17472.

Abstract

Despite their impressive scale, knowledge bases (KBs), such as Wikidata,
still contain significant gaps. Language models (LMs) have been proposed as a
source for filling these gaps. However, prior works have focused on prominent
entities with rich coverage by LMs, neglecting the crucial case of long-tail
entities. In this paper, we present a novel method for LM-based-KB completion
that is specifically geared for facts about long-tail entities. The method
leverages two different LMs in two stages: for candidate retrieval and for
candidate verification and disambiguation. To evaluate our method and various
baselines, we introduce a novel dataset, called MALT, rooted in Wikidata. Our
method outperforms all baselines in F1, with major gains especially in recall.

BibTeX

@online{Chen2306.17472,
TITLE = {Knowledge Base Completion for Long-Tail Entities},
AUTHOR = {Chen, Lihu and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2306.17472},
EPRINT = {2306.17472},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities with rich coverage by LMs, neglecting the crucial case of long-tail entities. In this paper, we present a novel method for LM-based-KB completion that is specifically geared for facts about long-tail entities. The method leverages two different LMs in two stages: for candidate retrieval and for candidate verification and disambiguation. To evaluate our method and various baselines, we introduce a novel dataset, called MALT, rooted in Wikidata. Our method outperforms all baselines in F1, with major gains especially in recall.},
}

Endnote

%0 Report
%A Chen, Lihu
%A Razniewski, Simon
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Base Completion for Long-Tail Entities
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-6C7F-D
%U https://arxiv.org/abs/2306.17472
%D 2023
%X Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities with rich coverage by LMs, neglecting the crucial case of long-tail entities. In this paper, we present a novel method for LM-based-KB completion that is specifically geared for facts about long-tail entities. The method leverages two different LMs in two stages: for candidate retrieval and for candidate verification and disambiguation. To evaluate our method and various baselines, we introduce a novel dataset, called MALT, rooted in Wikidata. Our method outperforms all baselines in F1, with major gains especially in recall.
%K Computer Science, Computation and Language, cs.CL

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks,” in SIGIR ’23, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 2023.

BibTeX

@inproceedings{Christmann:SIGIR2023,
TITLE = {Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9408-6},
DOI = {10.1145/3539618.3591682},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
BOOKTITLE = {SIGIR '23, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Chen, Hsin-Hsi and Duh, Wei-Jou and Huang, Hen-Hsen and Kato, Makoto P. and Mothe, Josiane and Poblete, Barbara},
PAGES = {643--653},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-FE28-A
%R 10.1145/3539618.3591682
%D 2023
%B 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2023-07-23 - 2023-07-27
%C Taipei, Taiwan
%B SIGIR '23
%E Chen, Hsin-Hsi; Duh, Wei-Jou; Huang, Hen-Hsen; Kato, Makoto P.; Mothe, Josiane; Poblete, Barbara
%P 643 - 653
%I ACM
%@ 978-1-4503-9408-6

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“CLOCQ: A Toolkit for Fast and Easy Access to Knowledge Bases,” in BTW 2023, Dresden, Germany, 2023.

BibTeX

@inproceedings{Christmann_BTW2023,
TITLE = {{CLOCQ}: {A} Toolkit for Fast and Easy Access to Knowledge Bases},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-88579-725-8},
DOI = {10.18420/BTW2023-28},
PUBLISHER = {GI},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {BTW 2023},
EDITOR = {K{\"o}nig-Ries, Birgitta and Scherzinger, Stefanie and Lehner, Wolfgang and Vossen, Gottfried},
PAGES = {579--591},
SERIES = {Lecture Notes in Informatics},
VOLUME = {P-331},
ADDRESS = {Dresden, Germany},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CLOCQ: A Toolkit for Fast and Easy Access to Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-BF14-7
%R 10.18420/BTW2023-28
%D 2023
%B 20th Conference on Database Systems for Business, Technology and Web
%Z date of event: 2023-03-06 - 2023-03-10
%C Dresden, Germany
%B BTW 2023
%E König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
%P 579 - 591
%I GI
%@ 978-3-88579-725-8
%B Lecture Notes in Informatics
%N P-331

Conference paper

X. L. Dong, B. Li, J. Stoyanovich, A. K. H. Tung, G. Weikum, A. Halevy, and W.-C. Tan

“Personal Data for Personal Use: Vision or Reality?,” in SIGMOD ’23 Companion, Seattle, WA, USA, 2023.

BibTeX

@inproceedings{DongPODS23,
TITLE = {Personal Data for Personal Use: Vision or Reality?},
AUTHOR = {Dong, Xin Luna and Li, Bo and Stoyanovich, Julia and Tung, Anthony Kum Hoe and Weikum, Gerhard and Halevy, Alon and Tan, Wang-Chiew},
LANGUAGE = {eng},
ISBN = {978-1-4503-9507-6},
DOI = {10.1145/3555041.3589378},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {SIGMOD '23 Companion},
EDITOR = {Das, Sudipto and Pandis, Ippokratis and Candan, K. Sel{\c c}uk and Amer-Yahia, Sihem},
PAGES = {263--264},
ADDRESS = {Seattle, WA, USA},
}

Endnote

%0 Conference Proceedings
%A Dong, Xin Luna
%A Li, Bo
%A Stoyanovich, Julia
%A Tung, Anthony Kum Hoe
%A Weikum, Gerhard
%A Halevy, Alon
%A Tan, Wang-Chiew
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Personal Data for Personal Use: Vision or Reality?
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-5775-E
%R 10.1145/3555041.3589378
%D 2023
%B ACM/SIGMOD International Conference on Management of Data
%Z date of event: 2023-06-18 - 2023-06-23
%C Seattle, WA, USA
%B SIGMOD '23 Companion
%E Das, Sudipto; Pandis, Ippokratis; Candan, K. Selçuk; Amer-Yahia, Sihem
%P 263 - 264
%I ACM
%@ 978-1-4503-9507-6

Conference paper

A. Ghazimatin

“Enhancing Explainability and Scrutability of Recommender Systems,” in BTW 2023, Dresden, Germany, 2023.

BibTeX

@inproceedings{DBLP:conf/btw/Ghazimatin23,
TITLE = {Enhancing Explainability and Scrutability of Recommender Systems},
AUTHOR = {Ghazimatin, Azin},
LANGUAGE = {eng},
ISBN = {978-3-88579-725-8},
DOI = {10.18420/BTW2023-32},
PUBLISHER = {GI},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {BTW 2023},
EDITOR = {K{\"o}nig-Ries, Birgitta and Scherzinger, Stefanie and Lehner, Wolfgang and Vossen, Gottfried},
PAGES = {633--640},
SERIES = {Lecture Notes in Informatics},
VOLUME = {P-331},
ADDRESS = {Dresden, Germany},
}

Endnote

%0 Conference Proceedings
%A Ghazimatin, Azin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enhancing Explainability and Scrutability of Recommender Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC4D-7
%R 10.18420/BTW2023-32
%D 2023
%B 20th Conference on Database Systems for Business, Technology and Web
%Z date of event: 2023-03-06 - 2023-03-10
%C Dresden, Germany
%B BTW 2023
%E König-Ries, Birgitta; Scherzinger, Stefanie; Lehner, Wolfgang; Vossen, Gottfried
%P 633 - 640
%I GI
%@ 978-3-88579-725-8
%B Lecture Notes in Informatics
%N P-331

Article

S. Ghosh, S. Razniewski, and G. Weikum

“Answering Count Questions with Structured Answers from Text,” Journal of Web Semantics, vol. 76, 2023.

BibTeX

@article{Ghosh23,
TITLE = {Answering Count Questions with Structured Answers from Text},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1016/j.websem.2022.100769},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
JOURNAL = {Journal of Web Semantics},
VOLUME = {76},
EID = {100769},
}

Endnote

%0 Journal Article
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answering Count Questions with Structured Answers from Text
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-47CB-0
%R 10.1016/j.websem.2022.100769
%7 2022
%D 2023
%J Journal of Web Semantics
%V 76
%Z sequence number: 100769
%I Elsevier
%C Amsterdam

Conference paper

S. Ghosh, S. Razniewski, and G. Weikum

“CoQEx: Entity Counts Explained,” in WSDM ’23, 16th ACM International Conference on Web Search and Data Mining, Singapore, 2023.

BibTeX

@inproceedings{Christmann_WSDM23,
TITLE = {{CoQEx}: {E}ntity Counts Explained},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9407-9},
DOI = {10.1145/3539597.3573021},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {WSDM '23, 16th ACM International Conference on Web Search and Data Mining},
EDITOR = {Chua, Tat-Seng and Lauw, Hady and Si, Luo and Terzi, Evimaria and Tsaparas, Panayiotis},
PAGES = {1168--1171},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CoQEx: Entity Counts Explained
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-F41F-0
%R 10.1145/3539597.3573021
%D 2023
%B 16th ACM International Conference on Web Search and Data Mining
%Z date of event: 2023-02-27 - 2023-03-03
%C Singapore
%B WSDM '23
%E Chua, Tat-Seng; Lauw, Hady; Si, Luo; Terzi, Evimaria; Tsaparas, Panayiotis
%P 1168 - 1171
%I ACM
%@ 978-1-4503-9407-9

Conference paper

S. Ghosh, S. Razniewski, and G. Weikum

“Class Cardinality Comparison as a Fermi Problem,” in The ACM Web Conference 2023 (WWW 2023), Austin, TX, USA, 2023.

Abstract

Questions on class cardinality comparisons are quite tricky to answer and
come with its own challenges. They require some kind of reasoning since web
documents and knowledge bases, indispensable sources of information, rarely
store direct answers to questions, such as, ``Are there more astronauts or
Physics Nobel Laureates?'' We tackle questions on class cardinality comparison
by tapping into three sources for absolute cardinalities as well as the
cardinalities of orthogonal subgroups of the classes. We propose novel
techniques for aggregating signals with partial coverage for more reliable
estimates and evaluate them on a dataset of 4005 class pairs, achieving an
accuracy of 83.7%.

BibTeX

@inproceedings{Ghosh2303.04532,
TITLE = {Class Cardinality Comparison as a {F}ermi Problem},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9419-2},
DOI = {10.1145/3543873.3587334},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Questions on class cardinality comparisons are quite tricky to answer and come with its own challenges. They require some kind of reasoning since web documents and knowledge bases, indispensable sources of information, rarely store direct answers to questions, such as, ``Are there more astronauts or Physics Nobel Laureates?'' We tackle questions on class cardinality comparison by tapping into three sources for absolute cardinalities as well as the cardinalities of orthogonal subgroups of the classes. We propose novel techniques for aggregating signals with partial coverage for more reliable estimates and evaluate them on a dataset of 4005 class pairs, achieving an accuracy of 83.7%.},
BOOKTITLE = {The ACM Web Conference 2023 (WWW 2023)},
EDITOR = {Ding, Ying and Tang, Jie and Sequeda, Juan and Aroyo, Lora and Castillo, Carlos and Houben, Geert-Jan},
PAGES = {148--151},
ADDRESS = {Austin, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Class Cardinality Comparison as a Fermi Problem
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-BF05-8
%R 10.1145/3543873.3587334
%D 2023
%B ACM Web Conference
%Z date of event: 2023-04-30 - 2023-05-04
%C Austin, TX, USA
%X Questions on class cardinality comparisons are quite tricky to answer and come with its own challenges. They require some kind of reasoning since web documents and knowledge bases, indispensable sources of information, rarely store direct answers to questions, such as, ``Are there more astronauts or Physics Nobel Laureates?'' We tackle questions on class cardinality comparison by tapping into three sources for absolute cardinalities as well as the cardinalities of orthogonal subgroups of the classes. We propose novel techniques for aggregating signals with partial coverage for more reliable estimates and evaluate them on a dataset of 4005 class pairs, achieving an accuracy of 83.7%.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Artificial Intelligence, cs.AI
%B The ACM Web Conference 2023
%E Ding, Ying; Tang, Jie; Sequeda, Juan; Aroyo, Lora; Castillo, Carlos; Houben, Geert-Jan
%P 148 - 151
%I ACM
%@ 978-1-4503-9419-2

Thesis

D5IMPR-CS

J. Kalofolias

“Subgroup Discovery for Structured Target Concepts,” Universität des Saarlandes, Saarbrücken, 2023.

BibTeX

@phdthesis{Kalofolias_PhD2023,
TITLE = {Subgroup Discovery for Structured Target Concepts},
AUTHOR = {Kalofolias, Janis},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-393710},
DOI = {10.22028/D291-39371},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
}

Endnote

%0 Thesis
%A Kalofolias, Janis
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Subgroup Discovery for Structured Target Concepts
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-FE96-D
%R 10.22028/D291-39371
%U urn:nbn:de:bsz:291--ds-393710
%F OTHER: hdl:20.500.11880/35569
%I Universität des Saarlandes
%C Saarbrücken
%D 2023
%P xi, 215 p.
%V phd
%9 phd
%U https://scidok.sulb.uni-saarland.de/handle/20.500.11880/35569

Conference paper

T.-P. Nguyen, S. Razniewski, A. Varde, and G. Weikum

“Extracting Cultural Commonsense Knowledge at Scale,” in The ACM Web Conference 2023 (WWW 2023), Austin, TX, USA, 2023.

BibTeX

@inproceedings{Nguyen_WWW23,
TITLE = {Extracting Cultural Commonsense Knowledge at Scale},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Varde, Aparna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9416-1},
DOI = {10.1145/3543507.3583535},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {The ACM Web Conference 2023 (WWW 2023)},
EDITOR = {Ding, Ying and Tang, Jie and Sequeda, Juan and Aroyo, Lora and Castillo, Carlos and Houben, Geert-Jan},
PAGES = {1907--1917},
ADDRESS = {Austin, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Varde, Aparna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extracting Cultural Commonsense Knowledge at Scale
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-9FF7-B
%R 10.1145/3543507.3583535
%D 2023
%B ACM Web Conference
%Z date of event: 2023-04-30 - 2023-05-04
%C Austin, TX, USA
%B The ACM Web Conference 2023
%E Ding, Ying; Tang, Jie; Sequeda, Juan; Aroyo, Lora; Castillo, Carlos; Houben, Geert-Jan
%P 1907 - 1917
%I ACM
%@ 978-1-4503-9416-1

Article

T.-P. Nguyen, S. Razniewski, J. Romero, and G. Weikum

“Refined Commonsense Knowledge from Large-Scale Web Contents,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 8, 2023.

BibTeX

@article{Nguyen_TKDE_2022,
TITLE = {Refined Commonsense Knowledge from Large-Scale Web Contents},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Romero, Julien and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1558-2191},
DOI = {10.1109/TKDE.2022.3206505},
PUBLISHER = {IEEE},
ADDRESS = {Piscataway, NJ},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
JOURNAL = {IEEE Transactions on Knowledge and Data Engineering},
VOLUME = {35},
NUMBER = {8},
PAGES = {8431--8447},
}

Endnote

%0 Journal Article
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Romero, Julien
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Refined Commonsense Knowledge from Large-Scale Web Contents
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-9FEE-6
%R 10.1109/TKDE.2022.3206505
%7 2022
%D 2023
%J IEEE Transactions on Knowledge and Data Engineering
%V 35
%N 8
%& 8431
%P 8431 - 8447
%I IEEE
%C Piscataway, NJ
%@ 1558-2191

Article

J. Z. Pan, S. Razniewski, J.-C. Kalo, S. Singhania, J. Chen, S. Dietze, H. Jabeen, J. Omeliyanenko, W. Zhang, M. Lissandrini, R. Biswas, G. de Melo, A. Bonifati, E. Vakaj, M. Dragoni, and D. Graux

“Large Language Models and Knowledge Graphs: Opportunities and Challenges,” Transactions on Graph Data and Knowledge, vol. 1, no. 1, 2023.

BibTeX

@article{Pan23TGDK,
TITLE = {Large Language Models and Knowledge Graphs: Opportunities and Challenges},
AUTHOR = {Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
LANGUAGE = {eng},
ISSN = {2942-7517},
URL = {urn:nbn:de:0030-drops-194766},
DOI = {10.4230/TGDK.1.1.2},
PUBLISHER = {Schloss Dagstuhl},
ADDRESS = {Wadern},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
JOURNAL = {Transactions on Graph Data and Knowledge},
VOLUME = {1},
NUMBER = {1},
PAGES = {1--38},
EID = {2},
}

Endnote

%0 Journal Article
%A Pan, Jeff Z.
%A Razniewski, Simon
%A Kalo, Jan-Christoph
%A Singhania, Sneha
%A Chen, Jiaoyan
%A Dietze, Stefan
%A Jabeen, Hajira
%A Omeliyanenko, Janna
%A Zhang, Wen
%A Lissandrini, Matteo
%A Biswas, Russa
%A de Melo, Gerard
%A Bonifati, Angela
%A Vakaj, Edlira
%A Dragoni, Mauro
%A Graux, Damien
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Large Language Models and Knowledge Graphs: Opportunities and Challenges
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5F9-9
%R 10.4230/TGDK.1.1.2
%U urn:nbn:de:0030-drops-194766
%7 2023-12-19
%D 2023
%J Transactions on Graph Data and Knowledge
%O TGDK
%V 1
%N 1
%& 1
%P 1 - 38
%Z sequence number: 2
%I Schloss Dagstuhl
%C Wadern
%@ 2942-7517

Paper

J. Z. Pan, S. Razniewski, J.-C. Kalo, S. Singhania, J. Chen, S. Dietze, H. Jabeen, J. Omeliyanenko, W. Zhang, M. Lissandrini, R. Biswas, G. de Melo, A. Bonifati, E. Vakaj, M. Dragoni, and D. Graux

“Large Language Models and Knowledge Graphs: Opportunities and Challenges,” 2023. [Online]. Available: https://arxiv.org/abs/2308.06374.

Abstract

Large Language Models (LLMs) have taken Knowledge Representation -- and the
world -- by storm. This inflection point marks a shift from explicit knowledge
representation to a renewed focus on the hybrid representation of both explicit
knowledge and parametric knowledge. In this position paper, we will discuss
some of the common debate points within the community on LLMs (parametric
knowledge) and Knowledge Graphs (explicit knowledge) and speculate on
opportunities and visions that the renewed focus brings, as well as related
research topics and challenges.

BibTeX

@online{Pan2308.06374,
TITLE = {Large Language Models and Knowledge Graphs: Opportunities and Challenges},
AUTHOR = {Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2308.06374},
EPRINT = {2308.06374},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.},
}

Endnote

%0 Report
%A Pan, Jeff Z.
%A Razniewski, Simon
%A Kalo, Jan-Christoph
%A Singhania, Sneha
%A Chen, Jiaoyan
%A Dietze, Stefan
%A Jabeen, Hajira
%A Omeliyanenko, Janna
%A Zhang, Wen
%A Lissandrini, Matteo
%A Biswas, Russa
%A de Melo, Gerard
%A Bonifati, Angela
%A Vakaj, Edlira
%A Dragoni, Mauro
%A Graux, Damien
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Large Language Models and Knowledge Graphs: Opportunities and Challenges
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-A223-4
%U https://arxiv.org/abs/2308.06374
%D 2023
%X Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.
%K Computer Science, Artificial Intelligence, cs.AI; Computer Science, Computation and Language, cs.CL

Article

M. Puri, A. S. Varde, and G. de Melo

“Commonsense Based Text Mining on Urban Policy,” Language Resources and Evaluation, vol. 57, 2023.

BibTeX

@article{Puri2022,
TITLE = {Commonsense Based Text Mining on Urban Policy},
AUTHOR = {Puri, Manish and Varde, Aparna S. and de Melo, Gerard},
LANGUAGE = {eng},
ISSN = {1574-020X; 1572-0218; 1572-8412; 1574-0218; 0010-4817},
DOI = {10.1007/s10579-022-09584-6},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
JOURNAL = {Language Resources and Evaluation},
VOLUME = {57},
PAGES = {733--763},
}

Endnote

%0 Journal Article
%A Puri, Manish
%A Varde, Aparna S.
%A de Melo, Gerard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Commonsense Based Text Mining on Urban Policy
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-20AC-0
%R 10.1007/s10579-022-09584-6
%7 2022-02-25
%D 2023
%J Language Resources and Evaluation
%O Computers and the Humanities Lang Resources & Evaluation
%V 57
%& 733
%P 733 - 763
%I Springer
%C New York, NY
%@ 1574-020X
%U https://rdcu.be/cJwGl

Paper

S. Razniewski, H. Arnaout, S. Ghosh, and F. Suchanek

“Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey,” 2023. [Online]. Available: https://arxiv.org/abs/2305.05403.

Abstract

General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric
AI. Many of them are constructed pragmatically from Web sources, and are thus
far from complete. This poses challenges for the consumption as well as the
curation of their content. While several surveys target the problem of
completing incomplete KBs, the first problem is arguably to know whether and
where the KB is incomplete in the first place, and to which degree.
In this survey we discuss how knowledge about completeness, recall, and
negation in KBs can be expressed, extracted, and inferred. We cover (i) the
logical foundations of knowledge representation and querying under partial
closed-world semantics; (ii) the estimation of this information via statistical
patterns; (iii) the extraction of information about recall from KBs and text;
(iv) the identification of interesting negative statements; and (v) relaxed
notions of relative recall.
This survey is targeted at two types of audiences: (1) practitioners who are
interested in tracking KB quality, focusing extraction efforts, and building
quality-aware downstream applications; and (2) data management, knowledge base
and semantic web researchers who wish to understand the state of the art of
knowledge bases beyond the open-world assumption. Consequently, our survey
presents both fundamental methodologies and their working, and gives
practice-oriented recommendations on how to choose between different approaches
for a problem at hand.

BibTeX

@online{Razniewski_2305.05403,
TITLE = {Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey},
AUTHOR = {Razniewski, Simon and Arnaout, Hiba and Ghosh, Shrestha and Suchanek, Fabian},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2305.05403},
EPRINT = {2305.05403},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from Web sources, and are thus far from complete. This poses challenges for the consumption as well as the curation of their content. While several surveys target the problem of completing incomplete KBs, the first problem is arguably to know whether and where the KB is incomplete in the first place, and to which degree. In this survey we discuss how knowledge about completeness, recall, and negation in KBs can be expressed, extracted, and inferred. We cover (i) the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) the estimation of this information via statistical patterns; (iii) the extraction of information about recall from KBs and text; (iv) the identification of interesting negative statements; and (v) relaxed notions of relative recall. This survey is targeted at two types of audiences: (1) practitioners who are interested in tracking KB quality, focusing extraction efforts, and building quality-aware downstream applications; and (2) data management, knowledge base and semantic web researchers who wish to understand the state of the art of knowledge bases beyond the open-world assumption. Consequently, our survey presents both fundamental methodologies and their working, and gives practice-oriented recommendations on how to choose between different approaches for a problem at hand.},
}

Endnote

%0 Report
%A Razniewski, Simon
%A Arnaout, Hiba
%A Ghosh, Shrestha
%A Suchanek, Fabian
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Completeness, Recall, and Negation in Open-World Knowledge Bases: A Survey
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-1C00-4
%U https://arxiv.org/abs/2305.05403
%D 2023
%X General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from Web sources, and are thus far from complete. This poses challenges for the consumption as well as the curation of their content. While several surveys target the problem of completing incomplete KBs, the first problem is arguably to know whether and where the KB is incomplete in the first place, and to which degree. In this survey we discuss how knowledge about completeness, recall, and negation in KBs can be expressed, extracted, and inferred. We cover (i) the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) the estimation of this information via statistical patterns; (iii) the extraction of information about recall from KBs and text; (iv) the identification of interesting negative statements; and (v) relaxed notions of relative recall. This survey is targeted at two types of audiences: (1) practitioners who are interested in tracking KB quality, focusing extraction efforts, and building quality-aware downstream applications; and (2) data management, knowledge base and semantic web researchers who wish to understand the state of the art of knowledge bases beyond the open-world assumption. Consequently, our survey presents both fundamental methodologies and their working, and gives practice-oriented recommendations on how to choose between different approaches for a problem at hand.
%K Computer Science, Artificial Intelligence, cs.AI; Computer Science, Computation and Language, cs.CL; Computer Science, Databases, cs.DB; Computer Science, Digital Libraries, cs.DL

Proceedings

S. Razniewski, J.-C. Kalo, S. Singhania, and J. Z. Pan, Eds.

Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge. CEUR-WS, 2023.

BibTeX

@proceedings{RazniewskiKBC23,
TITLE = {Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge (KBC-LM-LM-KBC 2023)},
EDITOR = {Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Pan, Jeff Z.},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {https://ceur-ws.org/Vol-3577/; urn:nbn:de:0074-3577-1},
PUBLISHER = {CEUR-WS},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3577},
ADDRESS = {Athens, Greece},
}

Endnote

%0 Conference Proceedings
%E Razniewski, Simon
%E Kalo, Jan-Christoph
%E Singhania, Sneha
%E Pan, Jeff Z.
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Joint Proceedings of the KBC-LM Workshop and the LM-KBC Challenge : Joint proceedings of the 1st workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) and the 2nd challenge on Language Models for Knowledge Base Construction (LM-KBC) co-located with the 22nd International Semantic Web Conference (ISWC 2023)
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-B5FE-4
%U urn:nbn:de:0074-3577-1
%I CEUR-WS
%D 2023
%B 1st Workshop on Knowledge Base Construction from Pre-Trained Language Models
%Z date of event: 2023-11-06 - 2023-11-06
%C Athens, Greece
%S CEUR Workshop Proceedings
%V 3577
%@ 1613-0073
%U https://ceur-ws.org/Vol-3577/

Paper

S. Singhania, S. Razniewski, and G. Weikum

“Extracting Multi-valued Relations from Language Models,” 2023. [Online]. Available: https://arxiv.org/abs/2307.03122v2.

Abstract

The widespread usage of latent language representations via pre-trained
language models (LMs) suggests that they are a promising source of structured
knowledge. However, existing methods focus only on a single object per
subject-relation pair, even though often multiple objects are correct. To
overcome this limitation, we analyze these representations for their potential
to yield materialized multi-object relational knowledge. We formulate the
problem as a rank-then-select task. For ranking candidate objects, we evaluate
existing prompting techniques and propose new ones incorporating domain
knowledge. Among the selection methods, we find that choosing objects with a
likelihood above a learned relation-specific threshold gives a 49.5% F1 score.
Our results highlight the difficulty of employing LMs for the multi-valued
slot-filling task and pave the way for further research on extracting
relational knowledge from latent language representations.

BibTeX

@online{Singhania2307.03122,
TITLE = {Extracting Multi-valued Relations from Language Models},
AUTHOR = {Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2307.03122v2},
EPRINT = {2307.03122},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation pair, even though often multiple objects are correct. To overcome this limitation, we analyze these representations for their potential to yield materialized multi-object relational knowledge. We formulate the problem as a rank-then-select task. For ranking candidate objects, we evaluate existing prompting techniques and propose new ones incorporating domain knowledge. Among the selection methods, we find that choosing objects with a likelihood above a learned relation-specific threshold gives a 49.5% F1 score. Our results highlight the difficulty of employing LMs for the multi-valued slot-filling task and pave the way for further research on extracting relational knowledge from latent language representations.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extracting Multi-valued Relations from Language Models
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-938B-0
%U https://arxiv.org/abs/2307.03122v2
%D 2023
%X The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation pair, even though often multiple objects are correct. To overcome this limitation, we analyze these representations for their potential to yield materialized multi-object relational knowledge. We formulate the problem as a rank-then-select task. For ranking candidate objects, we evaluate existing prompting techniques and propose new ones incorporating domain knowledge. Among the selection methods, we find that choosing objects with a likelihood above a learned relation-specific threshold gives a 49.5% F1 score. Our results highlight the difficulty of employing LMs for the multi-valued slot-filling task and pave the way for further research on extracting relational knowledge from latent language representations.
%K Computer Science, Computation and Language, cs.CL

Conference paper

G. H. Torbati, G. Weikum, and A. Yates

“Search-based Recommendation: The Case for Difficult Predictions,” in The ACM Web Conference 2023 (WWW 2023), Austin, TX, USA, 2023.

BibTeX

@inproceedings{Torbati_WWW23,
TITLE = {Search-based Recommendation: {T}he Case for Difficult Predictions},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Weikum, Gerhard and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-1-4503-9419-2},
DOI = {10.1145/3543873.3587374},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Questions on class cardinality comparisons are quite tricky to answer and come with its own challenges. They require some kind of reasoning since web documents and knowledge bases, indispensable sources of information, rarely store direct answers to questions, such as, ``Are there more astronauts or Physics Nobel Laureates?'' We tackle questions on class cardinality comparison by tapping into three sources for absolute cardinalities as well as the cardinalities of orthogonal subgroups of the classes. We propose novel techniques for aggregating signals with partial coverage for more reliable estimates and evaluate them on a dataset of 4005 class pairs, achieving an accuracy of 83.7%.},
BOOKTITLE = {The ACM Web Conference 2023 (WWW 2023)},
EDITOR = {Ding, Ying and Tang, Jie and Sequeda, Juan and Aroyo, Lora and Castillo, Carlos and Houben, Geert-Jan},
PAGES = {318--321},
ADDRESS = {Austin, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Weikum, Gerhard
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Search-based Recommendation: The Case for Difficult Predictions
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC45-F
%R 10.1145/3543873.3587374
%D 2023
%B ACM Web Conference
%Z date of event: 2023-04-30 - 2023-05-04
%C Austin, TX, USA
%X Questions on class cardinality comparisons are quite tricky to answer and come with its own challenges. They require some kind of reasoning since web documents and knowledge bases, indispensable sources of information, rarely store direct answers to questions, such as, ``Are there more astronauts or Physics Nobel Laureates?'' We tackle questions on class cardinality comparison by tapping into three sources for absolute cardinalities as well as the cardinalities of orthogonal subgroups of the classes. We propose novel techniques for aggregating signals with partial coverage for more reliable estimates and evaluate them on a dataset of 4005 class pairs, achieving an accuracy of 83.7%.
%K Computer Science, Information Retrieval, cs.IR; Computer Science, Artificial Intelligence, cs.AI
%B The ACM Web Conference 2023
%E Ding, Ying; Tang, Jie; Sequeda, Juan; Aroyo, Lora; Castillo, Carlos; Houben, Geert-Jan
%P 318 - 321
%I ACM
%@ 978-1-4503-9419-2

Conference paper

G. H. Torbati, A. Tigunova, and G. Weikum

“Unveiling Challenging Cases in Text-based Recommender Systems,” in Perspectives on the Evaluation of Recommender Systems 2023, Singapore, 2023.

BibTeX

@inproceedings{Torbati_PERSPECTIVES23,
TITLE = {Unveiling Challenging Cases in Text-based Recommender Systems},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {https://ceur-ws.org/Vol-3476/paper5.pdf; urn:nbn:de:0074-3476-4},
PUBLISHER = {CEUR-WS.org},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {Perspectives on the Evaluation of Recommender Systems 2023},
EDITOR = {Said, Alan and Zangerle, Eva and Bauer, Christine},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3476},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Unveiling Challenging Cases in Text-based Recommender Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-AEBA-E
%U https://ceur-ws.org/Vol-3476/paper5.pdf
%D 2023
%B 3rd Workshop Perspectives on the Evaluation of Recommender Systems
%Z date of event: 2023-09-19 - 2023-09-19
%C Singapore
%B Perspectives on the Evaluation of Recommender Systems 2023
%E Said, Alan; Zangerle, Eva; Bauer, Christine
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 3476
%@ 1613-0073

Paper

G. H. Torbati, A. Tigunova, A. Yates, and G. Weikum

“Recommendations by Concise User Profiles from Review Text,” 2023. [Online]. Available: https://arxiv.org/abs/2311.01314.

Abstract

Recommender systems are most successful for popular items and users with
ample interactions (likes, ratings etc.). This work addresses the difficult and
underexplored case of supporting users who have very sparse interactions but
post informative review texts. Our experimental studies address two book
communities with these characteristics. We design a framework with
Transformer-based representation learning, covering user-item interactions,
item content, and user-provided reviews. To overcome interaction sparseness, we
devise techniques for selecting the most informative cues to construct concise
user profiles. Comprehensive experiments, with datasets from Amazon and
Goodreads, show that judicious selection of text snippets achieves the best
performance, even in comparison to ChatGPT-generated user profiles.

BibTeX

@online{Torbati2311.01314,
TITLE = {Recommendations by Concise User Profiles from Review Text},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Tigunova, Anna and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2311.01314},
EPRINT = {2311.01314},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Recommender systems are most successful for popular items and users with ample interactions (likes, ratings etc.). This work addresses the difficult and underexplored case of supporting users who have very sparse interactions but post informative review texts. Our experimental studies address two book communities with these characteristics. We design a framework with Transformer-based representation learning, covering user-item interactions, item content, and user-provided reviews. To overcome interaction sparseness, we devise techniques for selecting the most informative cues to construct concise user profiles. Comprehensive experiments, with datasets from Amazon and Goodreads, show that judicious selection of text snippets achieves the best performance, even in comparison to ChatGPT-generated user profiles.},
}

Endnote

%0 Report
%A Torbati, Ghazaleh Haratinezhad
%A Tigunova, Anna
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Recommendations by Concise User Profiles from Review Text
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-E9AE-9
%U https://arxiv.org/abs/2311.01314
%D 2023
%X Recommender systems are most successful for popular items and users with ample interactions (likes, ratings etc.). This work addresses the difficult and underexplored case of supporting users who have very sparse interactions but post informative review texts. Our experimental studies address two book communities with these characteristics. We design a framework with Transformer-based representation learning, covering user-item interactions, item content, and user-provided reviews. To overcome interaction sparseness, we devise techniques for selecting the most informative cues to construct concise user profiles. Comprehensive experiments, with datasets from Amazon and Goodreads, show that judicious selection of text snippets achieves the best performance, even in comparison to ChatGPT-generated user profiles.
%K Computer Science, Information Retrieval, cs.IR

Article

B. Veseli, S. Razniewski, J.-C. Kalo, and G. Weikum

“Evaluating the Knowledge Base Completion Potential of GPT,” Findings of EMNLP 2023, 2023.

Abstract

Structured knowledge bases (KBs) are an asset for search engines and other
applications, but are inevitably incomplete. Language models (LMs) have been
proposed for unsupervised knowledge base completion (KBC), yet, their ability
to do this at scale and with high accuracy remains an open question. Prior
experimental studies mostly fall short because they only evaluate on popular
subjects, or sample already existing facts from KBs. In this work, we perform a
careful evaluation of GPT's potential to complete the largest public KB:
Wikidata. We find that, despite their size and capabilities, models like GPT-3,
ChatGPT and GPT-4 do not achieve fully convincing results on this task.
Nonetheless, they provide solid improvements over earlier approaches with
smaller LMs. In particular, we show that, with proper thresholding, GPT-3
enables to extend Wikidata by 27M facts at 90% precision.

BibTeX

@article{Veseli2310.14771,
TITLE = {Evaluating the Knowledge Base Completion Potential of {GPT}},
AUTHOR = {Veseli, Blerta and Razniewski, Simon and Kalo, Jan-Christoph and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2310.14771},
EPRINT = {2310.14771},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet, their ability to do this at scale and with high accuracy remains an open question. Prior experimental studies mostly fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we perform a careful evaluation of GPT's potential to complete the largest public KB: Wikidata. We find that, despite their size and capabilities, models like GPT-3, ChatGPT and GPT-4 do not achieve fully convincing results on this task. Nonetheless, they provide solid improvements over earlier approaches with smaller LMs. In particular, we show that, with proper thresholding, GPT-3 enables to extend Wikidata by 27M facts at 90% precision.},
JOURNAL = {Findings of EMNLP 2023},
}

Endnote

%0 Journal Article
%A Veseli, Blerta
%A Razniewski, Simon
%A Kalo, Jan-Christoph
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Evaluating the Knowledge Base Completion Potential of GPT
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-E9BA-B
%U https://arxiv.org/abs/2310.14771
%7 2023
%D 2023
%X Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet, their ability to do this at scale and with high accuracy remains an open question. Prior experimental studies mostly fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we perform a careful evaluation of GPT's potential to complete the largest public KB: Wikidata. We find that, despite their size and capabilities, models like GPT-3, ChatGPT and GPT-4 do not achieve fully convincing results on this task. Nonetheless, they provide solid improvements over earlier approaches with smaller LMs. In particular, we show that, with proper thresholding, GPT-3 enables to extend Wikidata by 27M facts at 90% precision.
%K Computer Science, Computation and Language, cs.CL; Computer Science, Artificial Intelligence, cs.AI
%J Findings of EMNLP 2023

Conference paper

B. Veseli, S. Singhania, S. Razniewski, and G. Weikum

“Evaluating Language Models for Knowledge Base Completion,” in The Semantic Web (ESWC 2023), Hersonissos, Greece, 2023.

BibTeX

@inproceedings{Veseli_ESWC23,
TITLE = {Evaluating Language Models for Knowledge Base Completion},
AUTHOR = {Veseli, Blerta and Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-031-33454-2},
DOI = {10.1007/978-3-031-33455-9_14},
PUBLISHER = {Springer},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
DATE = {2023},
BOOKTITLE = {The Semantic Web (ESWC 2023)},
EDITOR = {Pesquita, Catia and Jimenez-Ruiz, Ernesto and McCusker, Jamie and Faria, Daniel and Dimou, Anastasia and Troncy, Raphael and Hertling, Sven},
PAGES = {227--243},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {13870},
ADDRESS = {Hersonissos, Greece},
}

Endnote

%0 Conference Proceedings
%A Veseli, Blerta
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Evaluating Language Models for Knowledge Base Completion
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC39-D
%R 10.1007/978-3-031-33455-9_14
%D 2023
%B 20th Extended Semantic Web Conference
%Z date of event: 2023-05-28 - 2023-06-01
%C Hersonissos, Greece
%B The Semantic Web
%E Pesquita, Catia; Jimenez-Ruiz, Ernesto; McCusker, Jamie; Faria, Daniel; Dimou, Anastasia; Troncy, Raphael; Hertling, Sven
%P 227 - 243
%I Springer
%@ 978-3-031-33454-2
%B Lecture Notes in Computer Science
%N 13870

Paper

B. Veseli, S. Singhania, S. Razniewski, and G. Weikum

“Evaluating Language Models for Knowledge Base Completion,” 2023. [Online]. Available: https://arxiv.org/abs/2303.11082.

Abstract

Structured knowledge bases (KBs) are a foundation of many intelligent
applications, yet are notoriously incomplete. Language models (LMs) have
recently been proposed for unsupervised knowledge base completion (KBC), yet,
despite encouraging initial results, questions regarding their suitability
remain open. Existing evaluations often fall short because they only evaluate
on popular subjects, or sample already existing facts from KBs. In this work,
we introduce a novel, more challenging benchmark dataset, and a methodology
tailored for a realistic assessment of the KBC potential of LMs. For automated
assessment, we curate a dataset called WD-KNOWN, which provides an unbiased
random sample of Wikidata, containing over 3.9 million facts. In a second step,
we perform a human evaluation on predictions that are not yet in the KB, as
only this provides real insights into the added value over existing KBs. Our
key finding is that biases in dataset conception of previous benchmarks lead to
a systematic overestimate of LM performance for KBC. However, our results also
reveal strong areas of LMs. We could, for example, perform a significant
completion of Wikidata on the relations nativeLanguage, by a factor of ~21
(from 260k to 5.8M) at 82% precision, usedLanguage, by a factor of ~2.1 (from
2.1M to 6.6M) at 82% precision, and citizenOf by a factor of ~0.3 (from 4.2M to
5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong
generalization capabilities: even on relations where most facts were not
directly observed in LM training, prediction quality can be high.

BibTeX

@online{Veseli2303.11082,
TITLE = {Evaluating Language Models for Knowledge Base Completion},
AUTHOR = {Veseli, Blerta and Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2303.11082},
EPRINT = {2303.11082},
EPRINTTYPE = {arXiv},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
ABSTRACT = {Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), yet, despite encouraging initial results, questions regarding their suitability remain open. Existing evaluations often fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we introduce a novel, more challenging benchmark dataset, and a methodology tailored for a realistic assessment of the KBC potential of LMs. For automated assessment, we curate a dataset called WD-KNOWN, which provides an unbiased random sample of Wikidata, containing over 3.9 million facts. In a second step, we perform a human evaluation on predictions that are not yet in the KB, as only this provides real insights into the added value over existing KBs. Our key finding is that biases in dataset conception of previous benchmarks lead to a systematic overestimate of LM performance for KBC. However, our results also reveal strong areas of LMs. We could, for example, perform a significant completion of Wikidata on the relations nativeLanguage, by a factor of ~21 (from 260k to 5.8M) at 82% precision, usedLanguage, by a factor of ~2.1 (from 2.1M to 6.6M) at 82% precision, and citizenOf by a factor of ~0.3 (from 4.2M to 5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong generalization capabilities: even on relations where most facts were not directly observed in LM training, prediction quality can be high.},
}

Endnote

%0 Report
%A Veseli, Blerta
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Evaluating Language Models for Knowledge Base Completion
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-D3CD-F
%U https://arxiv.org/abs/2303.11082
%D 2023
%X Structured knowledge bases (KBs) are a foundation of many intelligent applications, yet are notoriously incomplete. Language models (LMs) have recently been proposed for unsupervised knowledge base completion (KBC), yet, despite encouraging initial results, questions regarding their suitability remain open. Existing evaluations often fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we introduce a novel, more challenging benchmark dataset, and a methodology tailored for a realistic assessment of the KBC potential of LMs. For automated assessment, we curate a dataset called WD-KNOWN, which provides an unbiased random sample of Wikidata, containing over 3.9 million facts. In a second step, we perform a human evaluation on predictions that are not yet in the KB, as only this provides real insights into the added value over existing KBs. Our key finding is that biases in dataset conception of previous benchmarks lead to a systematic overestimate of LM performance for KBC. However, our results also reveal strong areas of LMs. We could, for example, perform a significant completion of Wikidata on the relations nativeLanguage, by a factor of ~21 (from 260k to 5.8M) at 82% precision, usedLanguage, by a factor of ~2.1 (from 2.1M to 6.6M) at 82% precision, and citizenOf by a factor of ~0.3 (from 4.2M to 5.3M) at 90% precision. Moreover, we find that LMs possess surprisingly strong generalization capabilities: even on relations where most facts were not directly observed in LM training, prediction quality can be high.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI
%U https://github.com/bveseli/LMsForKBC

Conference paper

M. Zhang, P. Mundra, C. Chikweze, F. Nargesian, and G. Weikum

“Approximate Query Answering over Open Data,” in HILDA 2023, Workshop on Human-In-the-Loop Data Analytics, Seattle, WA, USA, 2023.

BibTeX

@inproceedings{Zhang_HILDA23,
TITLE = {Approximate Query Answering over Open Data},
AUTHOR = {Zhang, Mengqi and Mundra, Pranay and Chikweze, Chukwubuikem and Nargesian, Fatemeh and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-8-4007-0216-7},
DOI = {10.1145/3597465.3605227},
PUBLISHER = {ACM},
YEAR = {2023},
MARGINALMARK = {$\bullet$},
BOOKTITLE = {HILDA 2023, Workshop on Human-In-the-Loop Data Analytics},
PAGES = {1--3},
EID = {11},
ADDRESS = {Seattle, WA, USA},
}

Endnote

%0 Conference Proceedings
%A Zhang, Mengqi
%A Mundra, Pranay
%A Chikweze, Chukwubuikem
%A Nargesian, Fatemeh
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Approximate Query Answering over Open Data
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-941D-C
%R 10.1145/3597465.3605227
%D 2023
%B Workshop on Human-In-the-Loop Data Analytics
%Z date of event: 2023-06-18 - 2023-06-18
%C Seattle, WA, USA 
%B HILDA 2023
%P 1 - 3
%Z sequence number: 11
%I ACM
%@ 979-8-4007-0216-7

2022

Conference paper

H. Arnaout, T.-K. Tran, D. Stepanova, M. H. Gad-Elrab, S. Razniewski, and G. Weikum

“Utilizing Language Model Probes for Knowledge Graph Repair,” in Wiki Workshop 2022, Virtual Event, 2022.

BibTeX

@inproceedings{Arnaout_Wiki2022,
TITLE = {Utilizing Language Model Probes for Knowledge Graph Repair},
AUTHOR = {Arnaout, Hiba and Tran, Trung-Kien and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://wikiworkshop.org/2022/},
YEAR = {2022},
BOOKTITLE = {Wiki Workshop 2022},
ADDRESS = {Virtual Event},
}

Endnote

%0 Conference Proceedings
%A Arnaout, Hiba
%A Tran, Trung-Kien
%A Stepanova, Daria
%A Gad-Elrab, Mohamed Hassan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Utilizing Language Model Probes for Knowledge Graph Repair
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-63F4-3
%U https://wikiworkshop.org/2022/
%D 2022
%B Wiki Workshop 2022
%Z date of event: 2022-04-25 - 2022-04-25
%C Virtual Event
%B Wiki Workshop 2022

Conference paper

H. Arnaout, S. Razniewski, G. Weikum, and J. Z. Pan

“UnCommonSense: Informative Negative Knowledge about Everyday Concepts,” in CIKM ’22, 31st ACM International Conference on Information and Knowledge Management, Atlanta, GA, USA, 2022.

Abstract

Commonsense knowledge about everyday concepts is an important asset for AI
applications, such as question answering and chatbots. Recently, we have seen
an increasing interest in the construction of structured commonsense knowledge
bases (CSKBs). An important part of human commonsense is about properties that
do not apply to concepts, yet existing CSKBs only store positive statements.
Moreover, since CSKBs operate under the open-world assumption, absent
statements are considered to have unknown truth rather than being invalid. This
paper presents the UNCOMMONSENSE framework for materializing informative
negative commonsense statements. Given a target concept, comparable concepts
are identified in the CSKB, for which a local closed-world assumption is
postulated. This way, positive statements about comparable concepts that are
absent for the target concept become seeds for negative statement candidates.
The large set of candidates is then scrutinized, pruned and ranked by
informativeness. Intrinsic and extrinsic evaluations show that our method
significantly outperforms the state-of-the-art. A large dataset of informative
negations is released as a resource for future research.

BibTeX

@inproceedings{ArnaoutCIKM2022,
TITLE = {{UnCommonSense}: Informative Negative Knowledge about Everyday Concepts},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard and Pan, Jeff Z.},
LANGUAGE = {eng},
ISBN = {978-1-4503-9236-5},
DOI = {10.1145/3511808.3557484},
PUBLISHER = {ACM},
YEAR = {2022},
ABSTRACT = {Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. This paper presents the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. Intrinsic and extrinsic evaluations show that our method significantly outperforms the state-of-the-art. A large dataset of informative negations is released as a resource for future research.},
BOOKTITLE = {CIKM '22, 31st ACM International Conference on Information and Knowledge Management},
EDITOR = {Al Hasan, Mohammad and Xiong, Li},
PAGES = {37--46},
ADDRESS = {Atlanta, GA, USA},
}

Endnote

%0 Conference Proceedings
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%A Pan, Jeff Z.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T UnCommonSense: Informative Negative Knowledge about Everyday Concepts
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-F224-C
%R 10.1145/3511808.3557484
%D 2022
%B 31st ACM International Conference on Information and Knowledge Management
%Z date of event: 2022-10-17 - 2022-10-21
%C Atlanta, GA, USA
%X Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. This paper presents the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. Intrinsic and extrinsic evaluations show that our method significantly outperforms the state-of-the-art. A large dataset of informative negations is released as a resource for future research.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB,Computer Science, Information Retrieval, cs.IR
%B CIKM '22
%E Al Hasan, Mohammad; Xiong, Li
%P 37 - 46
%I ACM
%@ 978-1-4503-9236-5

Paper

H. Arnaout, S. Razniewski, G. Weikum, and J. Z. Pan

“UnCommonSense: Informative Negative Knowledge about Everyday Concepts,” 2022. [Online]. Available: https://arxiv.org/abs/2208.09292.

BibTeX

@online{Arnaout2208.09292,
TITLE = {{UnCommonSense}: Informative Negative Knowledge about Everyday Concepts},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard and Pan, Jeff Z.},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2208.09292},
EPRINT = {2208.09292},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. This paper presents the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. Intrinsic and extrinsic evaluations show that our method significantly outperforms the state-of-the-art. A large dataset of informative negations is released as a resource for future research.},
}

Endnote

%0 Report
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%A Pan, Jeff Z.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T UnCommonSense: Informative Negative Knowledge about Everyday Concepts
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-1651-0
%U https://arxiv.org/abs/2208.09292
%D 2022
%X Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. This paper presents the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. Intrinsic and extrinsic evaluations show that our method significantly outperforms the state-of-the-art. A large dataset of informative negations is released as a resource for future research.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR

Conference paper

I. Chernyavsky, A. S. Varde, and S. Razniewski

“CSK-Detector: Commonsense in Object Detection,” in IEEE International Conference on Big Data, Osaka, Japan, 2022.

BibTeX

@inproceedings{ChernyavskyBIGDATA22,
TITLE = {{CSK-Detector}: {C}ommonsense in Object Detection},
AUTHOR = {Chernyavsky, Irina and Varde, Aparna S. and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-6654-8045-1},
DOI = {10.1109/BigData55660.2022.10020915},
PUBLISHER = {IEEE},
YEAR = {2022},
BOOKTITLE = {IEEE International Conference on Big Data},
PAGES = {6609--6612},
ADDRESS = {Osaka, Japan},
}

Endnote

%0 Conference Proceedings
%A Chernyavsky, Irina
%A Varde, Aparna S.
%A Razniewski, Simon
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CSK-Detector: Commonsense in Object Detection
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-B77F-8
%R 10.1109/BigData55660.2022.10020915
%D 2022
%B IEEE International Conference on Big Data
%Z date of event: 2022-12-17 - 2022-12-20
%C Osaka, Japan
%B IEEE International Conference on Big Data
%P 6609 - 6612
%I IEEE
%@ 978-1-6654-8045-1

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“Conversational Question Answering on Heterogeneous Sources,” in SIGIR ’22, 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 2022.

BibTeX

@inproceedings{Christmann_SIGIR2022,
TITLE = {Conversational Question Answering on Heterogeneous Sources},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8732-3},
DOI = {10.1145/3477495.3531815},
PUBLISHER = {ACM},
YEAR = {2022},
BOOKTITLE = {SIGIR '22, 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Amigo, Enrique and Castells, Pablo and Gonzalo, Julio and Carterette, Ben and Culpepper, J. Shane and Kazai, Gabriella},
PAGES = {144--154},
ADDRESS = {Madrid, Spain},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Question Answering on Heterogeneous Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-6148-8
%R 10.1145/3477495.3531815
%D 2022
%B 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2022-07-11 - 2022-07-15
%C Madrid, Spain
%B SIGIR '22
%E Amigo, Enrique; Castells, Pablo; Gonzalo, Julio; Carterette, Ben; Culpepper, J. Shane; Kazai, Gabriella
%P 144 - 154
%I ACM
%@ 978-1-4503-8732-3

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases,” in WSDM ’22, Fifteenth ACM International Conference on Web Search and Data Mining, Tempe, AZ, USA (Virtual Event), 2022.

BibTeX

@inproceedings{Christmann_WSDM22,
TITLE = {Beyond {NED}: {F}ast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9132-0},
DOI = {10.1145/3488560.3498488},
PUBLISHER = {ACM},
YEAR = {2022},
BOOKTITLE = {WSDM '22, Fifteenth ACM International Conference on Web Search and Data Mining},
PAGES = {172--180},
ADDRESS = {Tempe, AZ, USA (Virtual Event)},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-27C6-B
%R 10.1145/3488560.3498488
%D 2022
%B Fifteenth ACM International Conference on Web Search and Data Mining
%Z date of event: 2022-02-21 - 2022-02-25
%C Tempe, AZ, USA (Virtual Event)
%B WSDM '22
%P 172 - 180
%I ACM
%@ 978-1-4503-9132-0

Conference paper

P. Christmann, R. Saha Roy, and G. Weikum

“Question Entity and Relation Linking to Knowledge Bases via CLOCQ,” in Joint Proceedings of SemREC 2022 and SMART 2022 co-located with 21st International Semantic Web Conference (ISWC 2022), Hybrid Event, Hangzhou, China, 2022.

BibTeX

@inproceedings{Christmann_SMART22,
TITLE = {Question Entity and Relation Linking to Knowledge Bases via {CLOCQ}},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://ceur-ws.org/Vol-3337/smart-paper1.pdf; urn:nbn:de:0074-3337-1},
PUBLISHER = {CEUR-WS.org},
YEAR = {2022},
BOOKTITLE = {Joint Proceedings of SemREC 2022 and SMART 2022 co-located with 21st International Semantic Web Conference (ISWC 2022)},
EDITOR = {Singh, Gunjan and Mutharaju, Raghava and Kapanipathi, Pavan and Mihindukulasooriya, Nandana and Dubey, Mohnish and Usbeck, Ricardo and Banerjee, Debayan},
PAGES = {33--47},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3337},
ADDRESS = {Hybrid Event, Hangzhou, China},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Question Entity and Relation Linking to Knowledge Bases via CLOCQ
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-9595-3
%U https://ceur-ws.org/Vol-3337/smart-paper1.pdf
%D 2022
%B 2nd Semantic Reasoning Evaluation Challenge and 3rd SeMantic Answer Type, Relation and Entity Prediction Tasks Challenge
%Z date of event: 2022-10-24 - 2022-10-27
%C Hybrid Event, Hangzhou, China
%B Joint Proceedings of SemREC 2022 and SMART 2022 co-located with 21st International Semantic Web Conference (ISWC 2022)
%E Singh, Gunjan; Mutharaju, Raghava; Kapanipathi, Pavan; Mihindukulasooriya, Nandana; Dubey, Mohnish; Usbeck, Ricardo; Banerjee, Debayan
%P 33 - 47
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 3337

Paper

P. Christmann, R. Saha Roy, and G. Weikum

“Conversational Question Answering on Heterogeneous Sources,” 2022. [Online]. Available: https://arxiv.org/abs/2204.11677.

Abstract

Conversational question answering (ConvQA) tackles sequential information
needs where contexts in follow-up questions are left implicit. Current ConvQA
systems operate over homogeneous sources of information: either a knowledge
base (KB), or a text corpus, or a collection of tables. This paper addresses
the novel issue of jointly tapping into all of these together, this way
boosting answer coverage and confidence. We present CONVINSE, an end-to-end
pipeline for ConvQA over heterogeneous sources, operating in three stages: i)
learning an explicit structured representation of an incoming question and its
conversational context, ii) harnessing this frame-like representation to
uniformly capture relevant evidences from KB, text, and tables, and iii)
running a fusion-in-decoder model to generate the answer. We construct and
release the first benchmark, ConvMix, for ConvQA over heterogeneous sources,
comprising 3000 real-user conversations with 16000 questions, along with entity
annotations, completed question utterances, and question paraphrases.
Experiments demonstrate the viability and advantages of our method, compared to
state-of-the-art baselines.

BibTeX

@online{Christmann2204.11677,
TITLE = {Conversational Question Answering on Heterogeneous Sources},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2204.11677},
EPRINT = {2204.11677},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.},
}

Endnote

%0 Report
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Question Answering on Heterogeneous Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-164E-5
%U https://arxiv.org/abs/2204.11677
%D 2022
%X Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Thesis

D5IMPR-CS

C. X. Chu

“Knowledge Extraction from Fictional Texts,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

Knowledge extraction from text is a key task in natural language processing, which involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, knowledge canonicalization and so on. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities with Wikipedia and mainstream news articles as sources. The constructed knowledge bases, therefore, lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of our human culture, spanning from literature to movies, TV series, comics and video games. With thousands of fictional universes which have been created, knowledge from fictional domains are subject of search-engine queries - by fans as well as cultural analysts. Unlike the real-world domain, knowledge extraction on such specific domains like fiction and fantasy has to tackle several key challenges:

- Training data: Sources for fictional domains mostly come from books and fan-built content, which is sparse and noisy, and contains difficult structures of texts, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction are also not available.
- Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions on entity-class, subclass and entity-entity relations that are often invalid for fictional domains. With different genres of fictional domains, another requirement is to transfer models across domains.
- Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that are able to deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues.

This dissertation addresses the above challenges, by developing new methodologies that advance the state of the art on knowledge extraction in fictional domains.

- The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision.
- The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall.
- The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.

BibTeX

@phdthesis{Chuphd2022,
TITLE = {Knowledge Extraction from Fictional Texts},
AUTHOR = {Chu, Cuong Xuan},
LANGUAGE = {eng},
URL = {nbn:de:bsz:291--ds-361070},
DOI = {10.22028/D291-36107},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {Knowledge extraction from text is a key task in natural language processing, which involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, knowledge canonicalization and so on. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities with Wikipedia and mainstream news articles as sources. The constructed knowledge bases, therefore, lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of our human culture, spanning from literature to movies, TV series, comics and video games. With thousands of fictional universes which have been created, knowledge from fictional domains are subject of search-engine queries -- by fans as well as cultural analysts. Unlike the real-world domain, knowledge extraction on such specific domains like fiction and fantasy has to tackle several key challenges: -- Training data: Sources for fictional domains mostly come from books and fan-built content, which is sparse and noisy, and contains difficult structures of texts, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction are also not available. -- Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions on entity-class, subclass and entity-entity relations that are often invalid for fictional domains. With different genres of fictional domains, another requirement is to transfer models across domains. 
-- Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that are able to deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues. This dissertation addresses the above challenges, by developing new methodologies that advance the state of the art on knowledge extraction in fictional domains. -- The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision. -- The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall. -- The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. 
By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.},
}

Endnote

%0 Thesis
%A Chu, Cuong Xuan
%Y Weikum, Gerhard
%A referee: Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Extraction from Fictional Texts
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-9598-2
%R 10.22028/D291-36107
%U urn:nbn:de:bsz:291--ds-361070
%F OTHER: hdl:20.500.11880/32914
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P 129 p.
%V phd
%9 phd
%X Knowledge extraction from text is a key task in natural language processing, which involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, knowledge canonicalization and so on. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities with Wikipedia and mainstream news articles as sources. The constructed knowledge bases, therefore, lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of our human culture, spanning from literature to movies, TV series, comics and video games. With thousands of fictional universes which have been created, knowledge from fictional domains are subject of search-engine queries - by fans as well as cultural analysts. Unlike the real-world domain, knowledge extraction on such specific domains like fiction and fantasy has to tackle several key challenges: - Training data: Sources for fictional domains mostly come from books and fan-built content, which is sparse and noisy, and contains difficult structures of texts, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction are also not available. - Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions on entity-class, subclass and entity-entity relations that are often invalid for fictional domains. With different genres of fictional domains, another requirement is to transfer models across domains. 
- Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that are able to deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues. This dissertation addresses the above challenges, by developing new methodologies that advance the state of the art on knowledge extraction in fictional domains. - The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision. - The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall. - The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. 
By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/32914

Article

D. Dave, A. Celestino, A. S. Varde, and V. Anu

“Management of Implicit Requirements Data in Large SRS Documents: Taxonomy and Techniques,” SIGMOD Record, vol. 51, no. 2, 2022.

BibTeX

@article{dave2022,
TITLE = {Management of Implicit Requirements Data in Large {SRS} Documents: {T}axonomy and Techniques},
AUTHOR = {Dave, Dev and Celestino, Angelica and Varde, Aparna S. and Anu, Vaibhav},
LANGUAGE = {eng},
ISSN = {0163-5808},
PUBLISHER = {Special Interest Group on the Management of Data},
ADDRESS = {New York, NY},
YEAR = {2022},
JOURNAL = {SIGMOD Record},
VOLUME = {51},
NUMBER = {2},
PAGES = {18--29},
}

Endnote

%0 Journal Article
%A Dave, Dev
%A Celestino, Angelica
%A Varde, Aparna S.
%A Anu, Vaibhav
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Management of Implicit Requirements Data in Large SRS Documents: Taxonomy and Techniques
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-F1AD-3
%7 2022
%D 2022
%J SIGMOD Record
%V 51
%N 2
%& 18
%P 18 - 29
%I Special Interest Group on the Management of Data
%C New York, NY
%@ 0163-5808

Thesis

D5IMPR-CS

J. Fischer

“More than the sum of its parts,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

In this thesis we explore pattern mining and deep learning. Often seen as orthogonal, we show that these fields complement each other and propose to combine them to gain from each other’s strengths. We, first, show how to efficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpretability of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, proposing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previously possible. Large neural networks are, however, exceedingly expensive to train for which ‘lottery tickets’ – small, well-trainable sub-networks in randomly initialized neural networks – offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.

BibTeX

@phdthesis{Fischerphd2022,
TITLE = {More than the sum of its parts},
AUTHOR = {Fischer, Jonas},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-370240},
DOI = {10.22028/D291-37024},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {In this thesis we explore pattern mining and deep learning. Often seen as orthogonal, we show that these fields complement each other and propose to combine them to gain from each other{\textquoteright}s strengths. We, first, show how to efficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpretability of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, proposing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previously possible. Large neural networks are, however, exceedingly expensive to train for which {\textquoteleft}lottery tickets{\textquoteright} -- small, well-trainable sub-networks in randomly initialized neural networks -- offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.},
}

Endnote

%0 Thesis
%A Fischer, Jonas
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%A referee: Parthasarathy, Srinivasan
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T More than the sum of its parts : pattern mining neural networks, and how they complement each other
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-38BF-0
%R 10.22028/D291-37024
%U urn:nbn:de:bsz:291--ds-370240
%F OTHER: hdl:20.500.11880/33893
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P 250 p.
%V phd
%9 phd
%X In this thesis we explore pattern mining and deep learning. Often seen as orthogonal, we show that these fields complement each other and propose to combine them to gain from each other’s strengths. We, first, show how to efficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpretability of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, proposing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previously possible. Large neural networks are, however, exceedingly expensive to train for which ‘lottery tickets’ – small, well-trainable sub-networks in randomly initialized neural networks – offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33893

Conference paper

S. Ghosh, S. Razniewski, and G. Weikum

“Answering Count Queries with Explanatory Evidence,” in SIGIR ’22, 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 2022.

BibTeX

@inproceedings{Ghosh_SIGIR22,
TITLE = {Answering Count Queries with Explanatory Evidence},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8732-3},
DOI = {10.1145/3477495.3531870},
PUBLISHER = {ACM},
YEAR = {2022},
BOOKTITLE = {SIGIR '22, 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Amigo, Enrique and Castells, Pablo and Gonzalo, Julio and Carterette, Ben and Culpepper, J. Shane and Kazai, Gabriella},
PAGES = {2415--2419},
ADDRESS = {Madrid, Spain},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answering Count Queries with Explanatory Evidence
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-9E36-8
%R 10.1145/3477495.3531870
%D 2022
%B 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2022-07-11 - 2022-07-15
%C Madrid, Spain
%B SIGIR '22
%E Amigo, Enrique; Castells, Pablo; Gonzalo, Julio; Carterette, Ben; Culpepper, J. Shane; Kazai, Gabriella
%P 2415 - 2419
%I ACM
%@ 978-1-4503-8732-3

Paper

S. Ghosh, S. Razniewski, and G. Weikum

“Answering Count Questions with Structured Answers from Text,” 2022.

Abstract

In this work we address the challenging case of answering count queries in
web search, such as ``number of songs by John Lennon''. Prior methods merely
answer these with a single, and sometimes puzzling number or return a ranked
list of text snippets with different numbers. This paper proposes a methodology
for answering count queries with inference, contextualization and explanatory
evidence. Unlike previous systems, our method infers final answers from
multiple observations, supports semantic qualifiers for the counts, and
provides evidence by enumerating representative instances. Experiments with a
wide variety of queries, including existing benchmark show the benefits of our
method, and the influence of specific parameter settings. Our code, data and an
interactive system demonstration are publicly available at
github.com/ghoshs/CoQEx and nlcounqer.mpi-inf.mpg.de.

BibTeX

@online{Ghosh_2209.07250,
TITLE = {Answering Count Questions with Structured Answers from Text},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.48550/arXiv.2209.07250},
EPRINT = {2209.07250},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {In this work we address the challenging case of answering count queries in web search, such as ``number of songs by John Lennon''. Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unlike previous systems, our method infers final answers from multiple observations, supports semantic qualifiers for the counts, and provides evidence by enumerating representative instances. Experiments with a wide variety of queries, including existing benchmark show the benefits of our method, and the influence of specific parameter settings. Our code, data and an interactive system demonstration are publicly available at https://github.com/ghoshs/CoQEx and https://nlcounqer.mpi-inf.mpg.de/.},
}

Endnote

%0 Report
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answering Count Questions with Structured Answers from Text
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-1D84-0
%R 10.48550/arXiv.2209.07250
%D 2022
%X In this work we address the challenging case of answering count queries in web search, such as ``number of songs by John Lennon''. Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text snippets with different numbers. This paper proposes a methodology for answering count queries with inference, contextualization and explanatory evidence. Unlike previous systems, our method infers final answers from multiple observations, supports semantic qualifiers for the counts, and provides evidence by enumerating representative instances. Experiments with a wide variety of queries, including existing benchmark show the benefits of our method, and the influence of specific parameter settings. Our code, data and an interactive system demonstration are publicly available at https://github.com/ghoshs/CoQEx and https://nlcounqer.mpi-inf.mpg.de/.
%K Computer Science, Information Retrieval, cs.IR

Thesis

D5IMPR-CS

A. Guimarães

“Data Science Methods for the Analysis of Controversial Social Media Discussions,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

Social media communities like Reddit and Twitter allow users to express their views on
topics of their interest, and to engage with other users who may share or oppose these views.
This can lead to productive discussions towards a consensus, or to contended debates, where
disagreements frequently arise.
Prior work on such settings has primarily focused on identifying notable instances of antisocial
behavior such as hate-speech and “trolling”, which represent possible threats to the health of
a community. These, however, are exceptionally severe phenomena, and do not encompass
controversies stemming from user debates, differences of opinions, and off-topic content, all
of which can naturally come up in a discussion without going so far as to compromise its
development.
This dissertation proposes a framework for the systematic analysis of social media discussions
that take place in the presence of controversial themes, disagreements, and mixed opinions from
participating users. For this, we develop a feature-based model to describe key elements of a
discussion, such as its salient topics, the level of activity from users, the sentiments it expresses,
and the user feedback it receives.
Initially, we build our feature model to characterize adversarial discussions surrounding
political campaigns on Twitter, with a focus on the factual and sentimental nature of their
topics and the role played by different users involved. We then extend our approach to Reddit
discussions, leveraging community feedback signals to define a new notion of controversy
and to highlight conversational archetypes that arise from frequent and interesting interaction
patterns. We use our feature model to build logistic regression classifiers that can predict future
instances of controversy in Reddit communities centered on politics, world news, sports, and
personal relationships. Finally, our model also provides the basis for a comparison of different
communities in the health domain, where topics and activity vary considerably despite their
shared overall focus. In each of these cases, our framework provides insight into how user
behavior can shape a community’s individual definition of controversy and its overall identity.

BibTeX

@phdthesis{Decarvalhophd2021,
TITLE = {Data Science Methods for the Analysis of Controversial Social Media Discussions},
AUTHOR = {Guimar{\~a}es, Anna},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-365021},
DOI = {10.22028/D291-36502},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate-speech and {\textquotedblleft}trolling{\textquotedblright}, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development. This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. For this, we develop a feature-based model to describe key elements of a discussion, such as its salient topics, the level of activity from users, the sentiments it expresses, and the user feedback it receives. Initially, we build our feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus. In each of these cases, our framework provides insight into how user behavior can shape a community{\textquoteright}s individual definition of controversy and its overall identity.},
}

Endnote

%0 Thesis
%A Guimarães, Anna
%Y Weikum, Gerhard
%A referee: de Melo, Gerard
%A referee: Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Data Science Methods for the Analysis of Controversial Social Media Discussions
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-CDF7-9
%R 10.22028/D291-36502
%U urn:nbn:de:bsz:291--ds-365021
%F OTHER: hdl:20.500.11880/33161
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P 94 p.
%V phd
%9 phd
%X Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate-speech and “trolling”, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development. This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. For this, we develop a feature-based model to describe key elements of a discussion, such as its salient topics, the level of activity from users, the sentiments it expresses, and the user feedback it receives. Initially, we build our feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus. In each of these cases, our framework provides insight into how user behavior can shape a community’s individual definition of controversy and its overall identity.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33161

Conference paper

M. A. Hedderich, J. Fischer, D. Klakow, and J. Vreeken

“Label-Descriptive Patterns and Their Application to Characterizing Classification Errors,” in Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Baltimore, MD, USA, 2022.

BibTeX

@inproceedings{Hedderich_ICML22,
TITLE = {Label-Descriptive Patterns and Their Application to Characterizing Classification Errors},
AUTHOR = {Hedderich, Michael A. and Fischer, Jonas and Klakow, Dietrich and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {1938-7228},
URL = {https://proceedings.mlr.press/v162/hedderich22a.html},
YEAR = {2022},
BOOKTITLE = {Proceedings of the 39th International Conference on Machine Learning (ICML 2022)},
EDITOR = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
PAGES = {8691--8707},
SERIES = {Proceedings of Machine Learning Research},
VOLUME = {162},
ADDRESS = {Baltimore, MD, USA},
}

Endnote

%0 Conference Proceedings
%A Hedderich, Michael A.
%A Fischer, Jonas
%A Klakow, Dietrich
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Label-Descriptive Patterns and Their Application to Characterizing Classification Errors
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-165A-7
%U https://proceedings.mlr.press/v162/hedderich22a.html
%D 2022
%B 39th International Conference on Machine Learning
%Z date of event: 2022-07-17 - 2022-07-23
%C Baltimore, MD, USA
%B Proceedings of the 39th International Conference on Machine Learning
%E Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan
%P 8691 - 8707
%B Proceedings of Machine Learning Research
%N 162
%@ 1938-7228

Conference paper

V. T. Ho, D. Stepanova, D. Milchevski, J. Strötgen, and G. Weikum

“Enhancing Knowledge Bases with Quantity Facts,” in WWW ’22, ACM Web Conference, Virtual Event, Lyon, France, 2022.

BibTeX

@inproceedings{Ho_WWW22,
TITLE = {Enhancing Knowledge Bases with Quantity Facts},
AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Milchevski, Dragan and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-9096-5},
DOI = {10.1145/3485447.3511932},
PUBLISHER = {ACM},
YEAR = {2022},
BOOKTITLE = {WWW '22, ACM Web Conference},
EDITOR = {Laforest, Fr{\'e}d{\'e}rique and Troncy, Rapha{\"e}l and Simperl, Elena and Agarwal, Deepak and Gionis, Aristides and Herman, Ivan and M{\'e}dini, Lionel},
PAGES = {893--901},
ADDRESS = {Virtual Event, Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Stepanova, Daria
%A Milchevski, Dragan
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enhancing Knowledge Bases with Quantity Facts
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-614E-2
%R 10.1145/3485447.3511932
%D 2022
%B ACM Web Conference
%Z date of event: 2022-04-25 - 2022-04-29
%C Virtual Event, Lyon, France
%B WWW '22
%E Laforest, Frédérique; Troncy, Raphaël; Simperl, Elena; Agarwal, Deepak; Gionis, Aristides; Herman, Ivan; Médini, Lionel
%P 893 - 901
%I ACM
%@ 978-1-4503-9096-5

Thesis

D5IMPR-CS

V. T. Ho

“Entities with Quantities: Extraction, Search and Ranking,” Universität des Saarlandes, Saarbrücken, 2022.

BibTeX

@phdthesis{Ho_PhD2022,
TITLE = {Entities with Quantities: Extraction, Search and Ranking},
AUTHOR = {Ho, Vinh Thinh},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-380308},
DOI = {10.22028/D291-38030},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
}

Endnote

%0 Thesis
%A Ho, Vinh Thinh
%Y Weikum, Gerhard
%A referee: Stepanova, Daria
%A referee: Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Entities with Quantities: Extraction, Search and Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-B756-5
%R 10.22028/D291-38030
%U urn:nbn:de:bsz:291--ds-380308
%F OTHER: hdl:20.500.11880/34538
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P xii, 131 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/34538

Proceedings

L.-A. Kaffee, S. Razniewski, G. Amaral, and K. S. Alghamdi

Eds., Wikidata Workshop 2022. CEUR-WS, 2022.

BibTeX

@proceedings{Kaffee_Wikidata22,
TITLE = {Wikidata Workshop 2022},
EDITOR = {Kaffee, Lucie-Aim{\'e}e and Razniewski, Simon and Amaral, Gabriel and Alghamdi, Kholoud Saad},
LANGUAGE = {eng},
URL = {https://ceur-ws.org/Vol-3262/; urn:nbn:de:0074-3262-0},
PUBLISHER = {CEUR-WS},
YEAR = {2022},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3262},
ADDRESS = {Virtual Event, Hangzhou, China},
}

Endnote

%0 Conference Proceedings
%E Kaffee, Lucie-Aimée
%E Razniewski, Simon
%E Amaral, Gabriel
%E Alghamdi, Kholoud Saad
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Wikidata Workshop 2022 : Proceedings of the 3rd Wikidata Workshop 2022, co-located with the 21st International Semantic Web Conference (ISWC2022)
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-1663-C
%U https://ceur-ws.org/Vol-3262/
%U urn:nbn:de:0074-3262-0
%I CEUR-WS
%D 2022
%B 3rd Wikidata Workshop
%C Virtual Event, Hangzhou, China
%S CEUR Workshop Proceedings
%V 3262

Article

P. Lahoti, K. Gummadi, and G. Weikum

“Responsible Model Deployment via Model-agnostic Uncertainty Learning,” Machine Learning, vol. 112, 2022.

BibTeX

@article{Lahoti2022,
TITLE = {Responsible Model Deployment via Model-agnostic Uncertainty Learning},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {0885-6125},
DOI = {10.1007/s10994-022-06248-y},
PUBLISHER = {Springer},
ADDRESS = {Dordrecht},
YEAR = {2022},
JOURNAL = {Machine Learning},
VOLUME = {112},
PAGES = {939--970},
}

Endnote

%0 Journal Article
%A Lahoti, Preethi
%A Gummadi, Krishna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Responsible Model Deployment via Model-agnostic Uncertainty Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-58F0-3
%R 10.1007/s10994-022-06248-y
%7 2022
%D 2022
%J Machine Learning
%V 112
%& 939
%P 939 - 970
%I Springer
%C Dordrecht
%@ 0885-6125

Conference paper

P. Lahoti, K. Gummadi, and G. Weikum

“Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning,” in 21st IEEE International Conference on Data Mining (ICDM 2021), Auckland, New Zealand (Virtual Conference), 2022.

BibTeX

@inproceedings{Gummadi_ICDM21,
TITLE = {Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-6654-2398-4},
DOI = {10.1109/ICDM51629.2021.00141},
PUBLISHER = {IEEE},
YEAR = {2021},
DATE = {2022},
BOOKTITLE = {21st IEEE International Conference on Data Mining (ICDM 2021)},
EDITOR = {Bailey, James and Miettinen, Pauli and Koh, Yun Sing and Tao, Dacheng and Wu, Xindong},
PAGES = {1174--1179},
ADDRESS = {Auckland, New Zealand (Virtual Conference)},
}

Endnote

%0 Conference Proceedings
%A Lahoti, Preethi
%A Gummadi, Krishna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-5E15-6
%R 10.1109/ICDM51629.2021.00141
%D 2022
%B 21st IEEE International Conference on Data Mining 
%Z date of event: 2021-12-07 - 2021-12-10
%C Auckland, New Zealand (Virtual Conference)
%E Bailey, James; Miettinen, Pauli; Koh, Yun Sing; Tao, Dacheng; Wu, Xindong
%P 1174 - 1179
%I IEEE
%@ 978-1-6654-2398-4

Thesis

D5IMPR-CS

P. Lahoti

“Operationalizing Fairness for Responsible Machine Learning,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a growing disconnect between the ML fairness literature and the needs to operationalize fairness in practice. This thesis addresses the need for responsible ML by developing new models and methods to address challenges in operationalizing fairness in practice. Specifically, it makes the following contributions. First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront, and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset. Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation. 
Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs, wherein pairs of individuals can be identified as deemed similar with respect to the ML objective. We cast the problem of individual fairness into graph embedding, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data. Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen reasons. To ensure responsible model deployment, potential failure risks need to be predicted, and mitigation actions need to be devised, for example, deferring to a human expert when uncertain or collecting additional data to address model’s blind-spots. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information theoretic notions of aleatoric and epistemic uncertainty. This dissertation brings ML fairness closer to real-world applications by developing methods that address key practical challenges. Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.

BibTeX

@phdthesis{Lahotophd2022,
TITLE = {Operationalizing Fairness for Responsible Machine Learning},
AUTHOR = {Lahoti, Preethi},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-365860},
DOI = {10.22028/D291-36586},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a growing disconnect between the ML fairness literature and the needs to operationalize fairness in practice. This thesis addresses the need for responsible ML by developing new models and methods to address challenges in operationalizing fairness in practice. Specifically, it makes the following contributions. First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront, and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset. Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation. 
Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs, wherein pairs of individuals can be identified as deemed similar with respect to the ML objective. We cast the problem of individual fairness into graph embedding, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data. Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen reasons. To ensure responsible model deployment, potential failure risks need to be predicted, and mitigation actions need to be devised, for example, deferring to a human expert when uncertain or collecting additional data to address model{\textquoteright}s blind-spots. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information theoretic notions of aleatoric and epistemic uncertainty. This dissertation brings ML fairness closer to real-world applications by developing methods that address key practical challenges. Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.},
}

Endnote

%0 Thesis
%A Lahoti, Preethi
%Y Weikum, Gerhard
%A referee: Gummadi, Krishna
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society
%T Operationalizing Fairness for Responsible Machine Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-CEC6-F
%R 10.22028/D291-36586
%U urn:nbn:de:bsz:291--ds-365860
%F OTHER: hdl:20.500.11880/33465
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P 129 p.
%V phd
%9 phd
%X As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a growing disconnect between the ML fairness literature and the needs to operationalize fairness in practice. This thesis addresses the need for responsible ML by developing new models and methods to address challenges in operationalizing fairness in practice. Specifically, it makes the following contributions. First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront, and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset. Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation. 
Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs, wherein pairs of individuals can be identified as deemed similar with respect to the ML objective. We cast the problem of individual fairness into graph embedding, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data. Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen reasons. To ensure responsible model deployment, potential failure risks need to be predicted, and mitigation actions need to be devised, for example, deferring to a human expert when uncertain or collecting additional data to address model’s blind-spots. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information theoretic notions of aleatoric and epistemic uncertainty. This dissertation brings ML fairness closer to real-world applications by developing methods that address key practical challenges. Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/33465

Book

J. Lin, R. Nogueira, and A. Yates

Pretrained Transformers for Text Ranking : BERT and Beyond. Cham: Springer International Publishing, 2022.

BibTeX

@book{LinSLHT53,
TITLE = {Pretrained Transformers for Text Ranking : {BERT} and Beyond},
AUTHOR = {Lin, Jimmy and Nogueira, Rodrigo and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-3-031-02181-7},
DOI = {10.1007/978-3-031-02181-7},
PUBLISHER = {Springer International Publishing},
ADDRESS = {Cham},
YEAR = {2022},
PAGES = {XVII, 307},
SERIES = {Synthesis Lectures on Human Language Technologies},
}

Endnote

%0 Book
%A Lin, Jimmy
%A Nogueira, Rodrigo
%A Yates, Andrew
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Pretrained Transformers for Text Ranking : BERT and Beyond
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC2B-D
%@ 978-3-031-02181-7
%R 10.1007/978-3-031-02181-7
%I Springer International Publishing
%C Cham
%D 2022
%P XVII, 307
%B Synthesis Lectures on Human Language Technologies

Conference paper

A. Marx and J. Fischer

“Estimating Mutual Information via Geodesic kNN,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2022), Alexandria, VA, USA, 2022.

BibTeX

@inproceedings{Marx_SDM2022,
TITLE = {{Estimating Mutual Information via Geodesic $k$NN}},
AUTHOR = {Marx, Alexander and Fischer, Jonas},
LANGUAGE = {eng},
ISBN = {978-1-61197-717-2},
DOI = {10.1137/1.9781611977172.47},
PUBLISHER = {SIAM},
YEAR = {2022},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2022)},
PAGES = {415--423},
ADDRESS = {Alexandria, VA, USA},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Fischer, Jonas
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Estimating Mutual Information via Geodesic kNN
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B19D-E
%R 10.1137/1.9781611977172.47
%D 2022
%B SIAM International Conference on Data Mining
%Z date of event: 2022-04-28 - 2022-04-30
%C Alexandria, VA, USA
%B Proceedings of the SIAM International Conference on Data Mining
%P 415 - 423
%I SIAM
%@ 978-1-61197-717-2

Paper

T. Nguyen, A. Yates, A. Zirikly, B. Desmet, and A. Cohan

“Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires,” 2022. [Online]. Available: https://arxiv.org/abs/2204.10432.

Abstract

Automated methods have been widely used to identify and analyze mental health
conditions (e.g., depression) from various sources of information, including
social media. Yet, deployment of such models in real-world healthcare
applications faces challenges including poor out-of-domain generalization and
lack of trust in black box models. In this work, we propose approaches for
depression detection that are constrained to different degrees by the presence
of symptoms described in PHQ9, a questionnaire used by clinicians in the
depression screening process. In dataset-transfer experiments on three social
media datasets, we find that grounding the model in PHQ9's symptoms
substantially improves its ability to generalize to out-of-distribution data
compared to a standard BERT-based approach. Furthermore, this approach can
still perform competitively on in-domain data. These results and our
qualitative analyses suggest that grounding model predictions in
clinically-relevant symptoms can improve generalizability while producing a
model that is easier to inspect.

BibTeX

@online{Nguyen2204.10432,
TITLE = {Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires},
AUTHOR = {Nguyen, Thong and Yates, Andrew and Zirikly, Ayah and Desmet, Bart and Cohan, Arman},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2204.10432},
EPRINT = {2204.10432},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {Automated methods have been widely used to identify and analyze mental health conditions (e.g., depression) from various sources of information, including social media. Yet, deployment of such models in real-world healthcare applications faces challenges including poor out-of-domain generalization and lack of trust in black box models. In this work, we propose approaches for depression detection that are constrained to different degrees by the presence of symptoms described in PHQ9, a questionnaire used by clinicians in the depression screening process. In dataset-transfer experiments on three social media datasets, we find that grounding the model in PHQ9's symptoms substantially improves its ability to generalize to out-of-distribution data compared to a standard BERT-based approach. Furthermore, this approach can still perform competitively on in-domain data. These results and our qualitative analyses suggest that grounding model predictions in clinically-relevant symptoms can improve generalizability while producing a model that is easier to inspect.},
}

Endnote

%0 Report
%A Nguyen, Thong
%A Yates, Andrew
%A Zirikly, Ayah
%A Desmet, Bart
%A Cohan, Arman
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-166D-2
%U https://arxiv.org/abs/2204.10432
%D 2022
%X Automated methods have been widely used to identify and analyze mental health conditions (e.g., depression) from various sources of information, including social media. Yet, deployment of such models in real-world healthcare applications faces challenges including poor out-of-domain generalization and lack of trust in black box models. In this work, we propose approaches for depression detection that are constrained to different degrees by the presence of symptoms described in PHQ9, a questionnaire used by clinicians in the depression screening process. In dataset-transfer experiments on three social media datasets, we find that grounding the model in PHQ9's symptoms substantially improves its ability to generalize to out-of-distribution data compared to a standard BERT-based approach. Furthermore, this approach can still perform competitively on in-domain data. These results and our qualitative analyses suggest that grounding model predictions in clinically-relevant symptoms can improve generalizability while producing a model that is easier to inspect.
%K Computer Science, Computation and Language, cs.CL

Conference paper

T. Nguyen, A. Yates, A. Zirikly, B. Desmet, and A. Cohan

“Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires,” in The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, 2022.

BibTeX

@inproceedings{Nguyen_ACL22,
TITLE = {Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires},
AUTHOR = {Nguyen, Thong and Yates, Andrew and Zirikly, Ayah and Desmet, Bart and Cohan, Arman},
LANGUAGE = {eng},
ISBN = {978-1-955917-21-6},
DOI = {10.18653/v1/2022.acl-long.578},
PUBLISHER = {ACL},
YEAR = {2022},
BOOKTITLE = {The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)},
EDITOR = {Muresan, Smaranda and Nakov, Preslav and Villavicencio, Aline},
PAGES = {8446--8459},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Thong
%A Yates, Andrew
%A Zirikly, Ayah
%A Desmet, Bart
%A Cohan, Arman
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-1DAA-6
%R 10.18653/v1/2022.acl-long.578
%D 2022
%B 60th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2022-05-22 - 2022-05-27
%C Dublin, Ireland
%B The 60th Annual Meeting of the Association for Computational Linguistics
%E Muresan, Smaranda; Nakov, Preslav; Villavicencio, Aline
%P 8446 - 8459
%I ACL
%@ 978-1-955917-21-6

Paper

T.-P. Nguyen, S. Razniewski, A. Varde, and G. Weikum

“Extracting Cultural Commonsense Knowledge at Scale,” 2022. [Online]. Available: https://arxiv.org/abs/2210.07763.

Abstract

Structured knowledge is important for many AI applications. Commonsense
knowledge, which is crucial for robust human-centric AI, is covered by a small
number of structured knowledge projects. However, they lack knowledge about
human traits and behaviors conditioned on socio-cultural contexts, which is
crucial for situative AI. This paper presents CANDLE, an end-to-end methodology
for extracting high-quality cultural commonsense knowledge (CCSK) at scale.
CANDLE extracts CCSK assertions from a huge web corpus and organizes them into
coherent clusters, for 3 domains of subjects (geography, religion, occupation)
and several cultural facets (food, drinks, clothing, traditions, rituals,
behaviors). CANDLE includes judicious techniques for classification-based
filtering and scoring of interestingness. Experimental evaluations show the
superiority of the CANDLE CCSK collection over prior works, and an extrinsic
use case demonstrates the benefits of CCSK for the GPT-3 language model. Code
and data can be accessed at cultural-csk.herokuapp.com.

BibTeX

@online{Nguyen2210.07763,
TITLE = {Extracting Cultural Commonsense Knowledge at Scale},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Varde, Aparna and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2210.07763},
DOI = {10.48550/arXiv.2210.07763},
EPRINT = {2210.07763},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. CANDLE extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). CANDLE includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the CANDLE CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://cultural-csk.herokuapp.com/.},
}

Endnote

%0 Report
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Varde, Aparna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extracting Cultural Commonsense Knowledge at Scale
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-58B3-8
%U https://arxiv.org/abs/2210.07763
%R 10.48550/arXiv.2210.07763
%D 2022
%X Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. CANDLE extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). CANDLE includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the CANDLE CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://cultural-csk.herokuapp.com/.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Artificial Intelligence, cs.AI

Conference paper

T.-P. Nguyen and S. Razniewski

“Materialized Knowledge Bases from Commonsense Transformers,” in Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022), Dublin, Ireland, 2022.

BibTeX

@inproceedings{Nguyen_CSRR22,
TITLE = {Materialized Knowledge Bases from Commonsense Transformers},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-955917-28-5},
URL = {https://openreview.net/forum?id=HI5M4MYedZ5},
PUBLISHER = {ACL},
YEAR = {2022},
BOOKTITLE = {Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)},
EDITOR = {Bosselut, Antoine and Li, Xiang and Yuchen, Bill and Shwartz, Vered and Majumder, Bodhisattwa Prasad and Kumar Lal, Yash and Rudinger, Rachel and Ren, Xiang and Tandon, Niket and Zouhar, Vil{\'e}m},
PAGES = {36--42},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Materialized Knowledge Bases from Commonsense Transformers
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-1D87-D
%U https://openreview.net/forum?id=HI5M4MYedZ5
%D 2022
%B 1st Workshop on Commonsense Representation and Reasoning
%Z date of event: 2022-05-27 - 2022-05-27
%C Dublin, Ireland
%B Proceedings of the First Workshop on Commonsense Representation and Reasoning
%E Bosselut, Antoine; Li, Xiang; Yuchen, Bill; Shwartz, Vered; Majumder, Bodhisattwa Prasad; Kumar Lal, Yash; Rudinger, Rachel; Ren, Xiang; Tandon, Niket; Zouhar, Vilém
%P 36 - 42
%I ACL
%@ 978-1-955917-28-5

Conference paper

R. Pradeep, Y. Liu, X. Zhang, Y. Li, A. Yates, and J. Lin

“Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking,” in Advances in Information Retrieval (ECIR 2022), Stavanger, Norway, 2022.

BibTeX

@inproceedings{Pradeep_ECIR2022,
TITLE = {Squeezing Water from a Stone: {A} Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking},
AUTHOR = {Pradeep, Ronak and Liu, Yuqi and Zhang, Xinyu and Li, Yilin and Yates, Andrew and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {978-3-030-99736-6},
DOI = {10.1007/978-3-030-99736-6_44},
PUBLISHER = {Springer},
YEAR = {2022},
DATE = {2022},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2022)},
EDITOR = {Hagen, Matthias and Verberne, Suzan and Macdonald, Craig and Seifert, Christin and Balog, Krisztian and N{\o}rv{\aa}g, Kjetil and Setty, Vinay},
PAGES = {655--670},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {13185},
ADDRESS = {Stavanger, Norway},
}

Endnote

%0 Conference Proceedings
%A Pradeep, Ronak
%A Liu, Yuqi
%A Zhang, Xinyu
%A Li, Yilin
%A Yates, Andrew
%A Lin, Jimmy
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-9E28-8
%R 10.1007/978-3-030-99736-6_44
%D 2022
%B 44th European Conference on IR Research
%Z date of event: 2022-04-10 - 2022-04-14
%C Stavanger, Norway
%B Advances in Information Retrieval
%E Hagen, Matthias; Verberne, Suzan; Macdonald, Craig; Seifert, Christin; Balog, Krisztian; Nørvåg, Kjetil; Setty, Vinay
%P 655 - 670
%I Springer
%@ 978-3-030-99736-6
%B Lecture Notes in Computer Science
%N 13185

Paper

J. Romero and S. Razniewski

“Do Children Texts Hold The Key To Commonsense Knowledge?,” 2022. [Online]. Available: https://arxiv.org/abs/2210.04530.

Abstract

Compiling comprehensive repositories of commonsense knowledge is a
long-standing problem in AI. Many concerns revolve around the issue of
reporting bias, i.e., that frequency in text sources is not a good proxy for
relevance or truth. This paper explores whether children's texts hold the key
to commonsense knowledge compilation, based on the hypothesis that such content
makes fewer assumptions on the reader's knowledge, and therefore spells out
commonsense more explicitly. An analysis with several corpora shows that
children's texts indeed contain much more, and more typical commonsense
assertions. Moreover, experiments show that this advantage can be leveraged in
popular language-model-based commonsense knowledge extraction settings, where
task-unspecific fine-tuning on small amounts of children texts (childBERT)
already yields significant improvements. This provides a refreshing perspective
different from the common trend of deriving progress from ever larger models
and corpora.

BibTeX

@online{Romero2210.04530,
TITLE = {Do Children Texts Hold The Key To Commonsense Knowledge?},
AUTHOR = {Romero, Julien and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2210.04530},
DOI = {10.48550/arXiv.2210.04530},
EPRINT = {2210.04530},
EPRINTTYPE = {arXiv},
YEAR = {2022},
ABSTRACT = {Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions on the reader's knowledge, and therefore spells out commonsense more explicitly. An analysis with several corpora shows that children's texts indeed contain much more, and more typical commonsense assertions. Moreover, experiments show that this advantage can be leveraged in popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children texts (childBERT) already yields significant improvements. This provides a refreshing perspective different from the common trend of deriving progress from ever larger models and corpora.},
}

Endnote

%0 Report
%A Romero, Julien
%A Razniewski, Simon
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Do Children Texts Hold The Key To Commonsense Knowledge?
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-58AA-3
%U https://arxiv.org/abs/2210.04530
%R 10.48550/arXiv.2210.04530
%D 2022
%X Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions on the reader's knowledge, and therefore spells out commonsense more explicitly. An analysis with several corpora shows that children's texts indeed contain much more, and more typical commonsense assertions. Moreover, experiments show that this advantage can be leveraged in popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children texts (childBERT) already yields significant improvements. This provides a refreshing perspective different from the common trend of deriving progress from ever larger models and corpora.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Artificial Intelligence, cs.AI

Conference paper

J. Romero and S. Razniewski

“Do Children Texts Hold The Key To Commonsense Knowledge?,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, United Arab Emirates, 2022.

BibTeX

@inproceedings{DBLP:conf/emnlp/RomeroR22,
TITLE = {Do Children Texts Hold The Key To Commonsense Knowledge?},
AUTHOR = {Romero, Julien and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://aclanthology.org/2022.emnlp-main.752/; https://aclanthology.org/2022.emnlp-main},
PUBLISHER = {ACL},
YEAR = {2022},
BOOKTITLE = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)},
EDITOR = {Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue},
PAGES = {10954--10959},
ADDRESS = {Abu Dhabi, United Arab Emirates},
}

Endnote

%0 Conference Proceedings
%A Romero, Julien
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Do Children Texts Hold The Key To Commonsense Knowledge?
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DBE5-B
%U https://aclanthology.org/2022.emnlp-main.752/
%D 2022
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2022-12-07 - 2022-12-11
%C Abu Dhabi, United Arab Emirates
%B Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
%E Goldberg, Yoav; Kozareva, Zornitsa; Zhang, Yue
%P 10954 - 10959
%I ACL

Proceedings

S. Singhania, T.-P. Nguyen, and S. Razniewski

Eds., Knowledge Base Construction from Pre-trained Language Models 2022. CEUR-WS, 2022.

BibTeX

@proceedings{SinghaniaLMKBC22,
TITLE = {Knowledge Base Construction from Pre-trained Language Models 2022 (LM-KBC 2022)},
EDITOR = {Singhania, Sneha and Nguyen, Tuan-Phong and Razniewski, Simon},
LANGUAGE = {eng},
URL = {urn:nbn:de:0074-3274-1; http://ceur-ws.org/Vol-3274/},
PUBLISHER = {CEUR-WS},
YEAR = {2022},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {3274},
ADDRESS = {Virtual Event, Hangzhou, China},
}

Endnote

%0 Conference Proceedings
%E Singhania, Sneha
%E Nguyen, Tuan-Phong
%E Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Base Construction from Pre-trained Language Models 2022 : Proceedings of the Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models 2022 co-located with the 21st International Semantic Web Conference (ISWC2022)
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-C723-D
%U urn:nbn:de:0074-3274-1
%U http://ceur-ws.org/Vol-3274/
%I CEUR-WS
%D 2022
%B Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models
%Z date of event: 2022-10 - 2022-10
%C Virtual Event, Hangzhou, China
%S CEUR Workshop Proceedings
%V 3274

Article

S. Singhania, S. Razniewski, and G. Weikum

“Predicting Document Coverage for Relation Extraction,” Transactions of the Association for Computational Linguistics, vol. 10, 2022.

BibTeX

@article{Singhania2022,
TITLE = {Predicting Document Coverage for Relation Extraction},
AUTHOR = {Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2307-387X},
DOI = {10.1162/tacl_a_00456},
PUBLISHER = {ACL},
ADDRESS = {Cambridge, MA},
YEAR = {2022},
JOURNAL = {Transactions of the Association for Computational Linguistics},
VOLUME = {10},
PAGES = {207--223},
}

Endnote

%0 Journal Article
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Predicting Document Coverage for Relation Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-27B8-B
%R 10.1162/tacl_a_00456
%7 2022
%D 2022
%J Transactions of the Association for Computational Linguistics
%V 10
%& 207
%P 207 - 223
%I ACL
%C Cambridge, MA
%@ 2307-387X

Thesis

D5, IMPR-CS

A. Tigunova

“Extracting Personal Information from Conversations,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications. Background facts about users can allow chatbot assistants to produce more topical and empathic replies. In the context of recommendation and retrieval models, personal facts can be used to customize the ranking results for individual users. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge. Such knowledge bases are easily interpretable and can provide users with full control over their own personal knowledge, including revising stored facts and managing access by downstream services for personalization purposes. To relieve users of the extensive manual effort of building such a personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Mainstream extraction methods specialize in well-structured data, such as biographical texts or encyclopedic articles, which are rare for most people. In turn, conversational data is abundant but challenging to process and requires specialized methods for extraction of personal facts. In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakers’ personal attributes:
• Demographic attributes, age, gender, profession and family status, are inferred by HAMs - hierarchical neural classifiers with attention mechanism. Trained HAMs can be transferred between different types of conversational data and provide interpretable predictions.
• Long-tailed personal attributes, hobby and profession, are predicted with CHARM - a zero-shot learning model, overcoming the lack of labeled training samples for rare attribute values. By linking conversational utterances to external sources, CHARM is able to predict attribute values which it never saw during training.
• Interpersonal relationships are inferred with PRIDE - a hierarchical transformer-based model. To accurately predict fine-grained relationships, PRIDE leverages personal traits of the speakers and the style of conversational utterances.
Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.

BibTeX

@phdthesis{Tiguphd2022,
TITLE = {Extracting Personal Information from Conversations},
AUTHOR = {Tigunova, Anna},
LANGUAGE = {eng},
URL = {nbn:de:bsz:291--ds-356280},
DOI = {10.22028/D291-35628},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications. Background facts about users can allow chatbot assistants to produce more topical and empathic replies. In the context of recommendation and retrieval models, personal facts can be used to customize the ranking results for individual users. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge. Such knowledge bases are easily interpretable and can provide users with full control over their own personal knowledge, including revising stored facts and managing access by downstream services for personalization purposes. To relieve users of the extensive manual effort of building such a personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Mainstream extraction methods specialize in well-structured data, such as biographical texts or encyclopedic articles, which are rare for most people. In turn, conversational data is abundant but challenging to process and requires specialized methods for extraction of personal facts. In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakers{\textquoteright} personal attributes: \mbox{$\bullet$} Demographic attributes, age, gender, profession and family status, are inferred by HAMs -- hierarchical neural classifiers with attention mechanism. Trained HAMs can be transferred between different types of conversational data and provide interpretable predictions. \mbox{$\bullet$} Long-tailed personal attributes, hobby and profession, are predicted with CHARM -- a zero-shot learning model, overcoming the lack of labeled training samples for rare attribute values. By linking conversational utterances to external sources, CHARM is able to predict attribute values which it never saw during training. \mbox{$\bullet$} Interpersonal relationships are inferred with PRIDE -- a hierarchical transformer-based model. To accurately predict fine-grained relationships, PRIDE leverages personal traits of the speakers and the style of conversational utterances. Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.},
}

Endnote

%0 Thesis
%A Tigunova, Anna
%Y Weikum, Gerhard
%A referee: Yates, Andrew
%A referee: Demberg, Vera
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Extracting Personal Information from Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-000B-3FE1-1
%R 10.22028/D291-35628
%U nbn:de:bsz:291--ds-356280
%F OTHER: hdl:20.500.11880/32546
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P 139 p.
%V phd
%9 phd
%X Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications. Background facts about users can allow chatbot assistants to produce more topical and empathic replies. In the context of recommendation and retrieval models, personal facts can be used to customize the ranking results for individual users. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge. Such knowledge bases are easily interpretable and can provide users with full control over their own personal knowledge, including revising stored facts and managing access by downstream services for personalization purposes. To relieve users of the extensive manual effort of building such a personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Mainstream extraction methods specialize in well-structured data, such as biographical texts or encyclopedic articles, which are rare for most people. In turn, conversational data is abundant but challenging to process and requires specialized methods for extraction of personal facts. In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakers’ personal attributes: • Demographic attributes, age, gender, profession and family status, are inferred by HAMs - hierarchical neural classifiers with attention mechanism. Trained HAMs can be transferred between different types of conversational data and provide interpretable predictions. • Long-tailed personal attributes, hobby and profession, are predicted with CHARM - a zero-shot learning model, overcoming the lack of labeled training samples for rare attribute values. By linking conversational utterances to external sources, CHARM is able to predict attribute values which it never saw during training. • Interpersonal relationships are inferred with PRIDE - a hierarchical transformer-based model. To accurately predict fine-grained relationships, PRIDE leverages personal traits of the speakers and the style of conversational utterances. Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/32546

Conference paper

H. D. Tran and A. Yates

“Dense Retrieval with Entity Views,” in CIKM ’22, 31st ACM International Conference on Information and Knowledge Management, Atlanta, GA, USA, 2022.

BibTeX

@inproceedings{TranCIKM2022,
TITLE = {Dense Retrieval with Entity Views},
AUTHOR = {Tran, Hai Dang and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-1-4503-9236-5},
DOI = {10.1145/3511808.3557285},
PUBLISHER = {ACM},
YEAR = {2022},
BOOKTITLE = {CIKM '22, 31st ACM International Conference on Information and Knowledge Management},
EDITOR = {Al Hasan, Mohammad and Xiong, Li},
PAGES = {1955--1964},
ADDRESS = {Atlanta, GA, USA},
}

Endnote

%0 Conference Proceedings
%A Tran, Hai Dang
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Dense Retrieval with Entity Views
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-1669-6
%R 10.1145/3511808.3557285
%D 2022
%B 31st ACM International Conference on Information and Knowledge Management
%Z date of event: 2022-10-17 - 2022-10-21
%C Atlanta, GA, USA
%B CIKM '22
%E Al Hasan, Mohammad; Xiong, Li
%P 1955 - 1964
%I ACM
%@ 978-1-4503-9236-5

Article

A. S. Varde

“Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists,” ACM Transactions on Knowledge Discovery from Data, vol. 16, no. 5, 2022.

BibTeX

@article{Varde2022b,
TITLE = {Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists},
AUTHOR = {Varde, Aparna S.},
LANGUAGE = {eng},
DOI = {10.1145/3502736},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2022},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {16},
NUMBER = {5},
PAGES = {1--52},
EID = {86},
}

Endnote

%0 Journal Article
%A Varde, Aparna S.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-9D92-0
%R 10.1145/3502736
%7 2022
%D 2022
%J ACM Transactions on Knowledge Discovery from Data
%V 16
%N 5
%& 1
%P 1 - 52
%Z sequence number: 86
%I ACM
%C New York, NY

Article

A. S. Varde, A. Pandey, and X. Du

“Prediction Tool on Fine Particle Pollutants and Air Quality for Environmental Engineering,” SN Computer Science, vol. 3, no. 3, 2022.

BibTeX

@article{Varde2022,
TITLE = {Prediction Tool on Fine Particle Pollutants and Air Quality for Environmental Engineering},
AUTHOR = {Varde, Aparna S. and Pandey, Abidha and Du, Xu},
LANGUAGE = {eng},
ISSN = {2661-8907},
DOI = {10.1007/s42979-022-01068-2},
PUBLISHER = {Springer Nature},
ADDRESS = {Singapore},
YEAR = {2022},
JOURNAL = {SN Computer Science},
VOLUME = {3},
NUMBER = {3},
EID = {184},
}

Endnote

%0 Journal Article
%A Varde, Aparna S.
%A Pandey, Abidha
%A Du, Xu
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Prediction Tool on Fine Particle Pollutants and Air Quality for Environmental Engineering
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-2F55-3
%R 10.1007/s42979-022-01068-2
%7 2022
%D 2022
%J SN Computer Science
%V 3
%N 3
%Z sequence number: 184
%I Springer Nature
%C Singapore
%@ 2661-8907

Thesis

Y. Wang

“Coreference Resolution for Extracting Quantity-Facts from Multiple Sentences,” Universität des Saarlandes, Saarbrücken, 2022.

Abstract

Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.

BibTeX

@mastersthesis{WangMSc2020,
TITLE = {Coreference Resolution for Extracting Quantity-Facts from Multiple Sentences},
AUTHOR = {Wang, Yongqing},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2022},
DATE = {2022},
ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.},
}

Endnote

%0 Thesis
%A Wang, Yongqing
%Y Pal, Koninika
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Coreference Resolution for Extracting Quantity-Facts from Multiple Sentences
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-F5F1-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2022
%P XI, 58 p.
%V master
%9 master
%X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.

2021

Article

D. I. Adelani, J. Abbott, G. Neubig, D. D’souza, J. Kreutzer, C. Lignos, C. Palen-Michel, H. Buzaaba, S. Rijhwani, S. Ruder, S. Mayhew, I. A. Azime, S. H. Muhammad, C. C. Emezue, J. Nakatumba-Nabende, P. Ogayo, A. Anuoluwapo, C. Gitau, D. Mbaye, J. Alabi, S. M. Yimam, T. R. Gwadabe, I. Ezeani, R. A. Niyongabo, J. Mukiibi, V. Otiende, I. Orife, D. David, S. Ngom, T. Adewumi, P. Rayson, M. Adeyemi, G. Muriuki, E. Anebi, C. Chukwuneke, N. Odu, E. P. Wairagala, S. Oyerinde, C. Siro, T. S. Bateesa, T. Oloyede, Y. Wambui, V. Akinode, D. Nabagereka, M. Katusiime, A. Awokoya, M. MBOUP, D. Gebreyohannes, H. Tilaye, K. Nwaike, D. Wolde, A. Faye, B. Sibanda, O. Ahia, B. F. P. Dossou, K. Ogueji, T. I. DIOP, A. Diallo, A. Akinfaderin, T. Marengereke, and S. Osei

“MasakhaNER: Named Entity Recognition for African Languages,” Transactions of the Association for Computational Linguistics, vol. 9, 2021.

BibTeX

@article{Adelani2021,
TITLE = {{MasakhaNER}: {N}amed Entity Recognition for {A}frican Languages},
AUTHOR = {Adelani, David Ifeoluwa and Abbott, Jade and Neubig, Graham and D{\textquoteright}souza, Daniel and Kreutzer, Julia and Lignos, Constantine and Palen-Michel, Chester and Buzaaba, Happy and Rijhwani, Shruti and Ruder, Sebastian and Mayhew, Stephen and Azime, Israel Abebe and Muhammad, Shamsuddeen H. and Emezue, Chris Chinenye and Nakatumba-Nabende, Joyce and Ogayo, Perez and Anuoluwapo, Aremu and Gitau, Catherine and Mbaye, Derguene and Alabi, Jesujoba and Yimam, Seid Muhie and Gwadabe, Tajuddeen Rabiu and Ezeani, Ignatius and Niyongabo, Rubungo Andre and Mukiibi, Jonathan and Otiende, Verrah and Orife, Iroro and David, Davis and Ngom, Samba and Adewumi, Tosin and Rayson, Paul and Adeyemi, Mofetoluwa and Muriuki, Gerald and Anebi, Emmanuel and Chukwuneke, Chiamaka and Odu, Nkiruka and Wairagala, Eric Peter and Oyerinde, Samuel and Siro, Clemencia and Bateesa, Tobius Saul and Oloyede, Temilola and Wambui, Yvonne and Akinode, Victor and Nabagereka, Deborah and Katusiime, Maurice and Awokoya, Ayodele and MBOUP, Mouhamadane and Gebreyohannes, Dibora and Tilaye, Henok and Nwaike, Kelechi and Wolde, Degaga and Faye, Abdoulaye and Sibanda, Blessing and Ahia, Orevaoghene and Dossou, Bonaventure F. P. and Ogueji, Kelechi and DIOP, Thierno Ibrahima and Diallo, Abdoulaye and Akinfaderin, Adewale and Marengereke, Tendai and Osei, Salomey},
LANGUAGE = {eng},
ISSN = {2307-387X},
DOI = {10.1162/tacl_a_00416},
PUBLISHER = {ACL},
YEAR = {2021},
JOURNAL = {Transactions of the Association for Computational Linguistics},
VOLUME = {9},
PAGES = {1116--1131},
}

Endnote

%0 Journal Article
%A Adelani, David Ifeoluwa
%A Abbott, Jade
%A Neubig, Graham
%A D’souza, Daniel
%A Kreutzer, Julia
%A Lignos, Constantine
%A Palen-Michel, Chester
%A Buzaaba, Happy
%A Rijhwani, Shruti
%A Ruder, Sebastian
%A Mayhew, Stephen
%A Azime, Israel Abebe
%A Muhammad, Shamsuddeen H.
%A Emezue, Chris Chinenye
%A Nakatumba-Nabende, Joyce
%A Ogayo, Perez
%A Anuoluwapo, Aremu
%A Gitau, Catherine
%A Mbaye, Derguene
%A Alabi, Jesujoba
%A Yimam, Seid Muhie
%A Gwadabe, Tajuddeen Rabiu
%A Ezeani, Ignatius
%A Niyongabo, Rubungo Andre
%A Mukiibi, Jonathan
%A Otiende, Verrah
%A Orife, Iroro
%A David, Davis
%A Ngom, Samba
%A Adewumi, Tosin
%A Rayson, Paul
%A Adeyemi, Mofetoluwa
%A Muriuki, Gerald
%A Anebi, Emmanuel
%A Chukwuneke, Chiamaka
%A Odu, Nkiruka
%A Wairagala, Eric Peter
%A Oyerinde, Samuel
%A Siro, Clemencia
%A Bateesa, Tobius Saul
%A Oloyede, Temilola
%A Wambui, Yvonne
%A Akinode, Victor
%A Nabagereka, Deborah
%A Katusiime, Maurice
%A Awokoya, Ayodele
%A MBOUP, Mouhamadane
%A Gebreyohannes, Dibora
%A Tilaye, Henok
%A Nwaike, Kelechi
%A Wolde, Degaga
%A Faye, Abdoulaye
%A Sibanda, Blessing
%A Ahia, Orevaoghene
%A Dossou, Bonaventure F. P.
%A Ogueji, Kelechi
%A DIOP, Thierno Ibrahima
%A Diallo, Abdoulaye
%A Akinfaderin, Adewale
%A Marengereke, Tendai
%A Osei, Salomey
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T MasakhaNER: Named Entity Recognition for African Languages
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-115A-E
%R 10.1162/tacl_a_00416
%7 2021
%D 2021
%J Transactions of the Association for Computational Linguistics
%V 9
%& 1116
%P 1116 - 1131
%I ACL
%@ 2307-387X

Conference paper

D4, D5

J. Ali, P. Lahoti, and K. P. Gummadi

“Accounting for Model Uncertainty in Algorithmic Discrimination,” in AIES ’21, Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics and Society, Virtual Conference, 2021.

BibTeX

@inproceedings{Ali_AIES2021,
TITLE = {Accounting for Model Uncertainty in Algorithmic Discrimination},
AUTHOR = {Ali, Junaid and Lahoti, Preethi and Gummadi, Krishna P.},
LANGUAGE = {eng},
ISBN = {978-1-4503-8473-5},
DOI = {10.1145/3461702.3462630},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {AIES '21, Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics and Society},
EDITOR = {Fourcade, Marion and Kuipers, Benjamin and Lazar, Seth and Mulligan, Deirdre},
PAGES = {336--345},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Ali, Junaid
%A Lahoti, Preethi
%A Gummadi, Krishna P.
%+ Computer Graphics, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Accounting for Model Uncertainty in Algorithmic Discrimination
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-72E3-7
%R 10.1145/3461702.3462630
%D 2021
%B Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics and Society
%Z date of event: 2021-05-19 - 2021-05-21
%C Virtual Conference
%B AIES '21
%E Fourcade, Marion; Kuipers, Benjamin; Lazar, Seth; Mulligan, Deirdre
%P 336 - 345
%I ACM
%@ 978-1-4503-8473-5

Conference paper

H. Arnaout, S. Razniewski, G. Weikum, and J. Z. Pan

“Negative Knowledge for Open-world Wikidata,” in The Web Conference (WWW 2021), Ljubljana, Slovenia, 2021.

BibTeX

@inproceedings{Arnaout_WWW21,
TITLE = {Negative Knowledge for Open-world {W}ikidata},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard and Pan, Jeff Z.},
LANGUAGE = {eng},
ISBN = {978-1-4503-8313-4},
DOI = {10.1145/3442442.3452339},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {The Web Conference (WWW 2021)},
EDITOR = {Leskovec, Jure and Grobelnik, Marko and Najork, Mark and Tan, Jie and Zia, Leila},
PAGES = {544--551},
ADDRESS = {Ljubljana, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%A Pan, Jeff Z.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Negative Knowledge for Open-world Wikidata
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6351-C
%R 10.1145/3442442.3452339
%D 2021
%B The Web Conference
%Z date of event: 2021-04-19 - 2021-04-23
%C Ljubljana, Slovenia
%B The Web Conference
%E Leskovec, Jure; Grobelnik, Marko; Najork, Mark; Tan, Jie; Zia, Leila
%P 544 - 551
%I ACM
%@ 978-1-4503-8313-4

Article

H. Arnaout, S. Razniewski, G. Weikum, and J. Z. Pan

“Negative Statements Considered Useful,” Journal of Web Semantics, vol. 71, 2021.

BibTeX

@article{Arnaout2021,
TITLE = {Negative Statements Considered Useful},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard and Pan, Jeff Z.},
LANGUAGE = {eng},
DOI = {10.1016/j.websem.2021.100661},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2021},
DATE = {2021},
JOURNAL = {Journal of Web Semantics},
VOLUME = {71},
EID = {100661},
}

Endnote

%0 Journal Article
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%A Pan, Jeff Z.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Negative Statements Considered Useful
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-A586-5
%R 10.1016/j.websem.2021.100661
%7 2021
%D 2021
%J Journal of Web Semantics
%V 71
%Z sequence number: 100661
%I Elsevier
%C Amsterdam

Article

H. Arnaout, S. Razniewski, G. Weikum, and J. Z. Pan

“Wikinegata: A Knowledge Base with Interesting Negative Statements,” Proceedings of the VLDB Endowment (Proc. VLDB 2021), vol. 14, no. 12, 2021.

BibTeX

@article{Arnaout2021_PVLDB,
TITLE = {Wikinegata: {A} Knowledge Base with Interesting Negative Statements},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard and Pan, Jeff Z.},
LANGUAGE = {eng},
PUBLISHER = {VLDB Endowment Inc.},
YEAR = {2021},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {14},
NUMBER = {12},
PAGES = {2807--2810},
BOOKTITLE = {Proceedings of the 47th International Conference on Very Large Data Bases (VLDB 2021)},
EDITOR = {Dong, Xin Luna and Naumann, Felix},
}

Endnote

%0 Journal Article
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%A Pan, Jeff Z.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Wikinegata: A Knowledge Base with Interesting Negative Statements
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6319-C
%7 2021
%D 2021
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 14
%N 12
%& 2807
%P 2807 - 2810
%I VLDB Endowment Inc.
%B Proceedings of the 47th International Conference on Very Large Data Bases
%O VLDB 2021 Copenhagen, Denmark, 16-20 August 2021

Conference paper

A. B. Biswas, H. Arnaout, and S. Razniewski

“Neguess: Wikidata-entity Guessing Game with Negative Clues,” in Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks (ISWC-Posters-Demos-Industry 2021), Virtual Conference, 2021.

BibTeX

@inproceedings{Biswas_ISWC21,
TITLE = {Neguess: {W}ikidata-entity Guessing Game with Negative Clues},
AUTHOR = {Biswas, Aditya Bikram and Arnaout, Hiba and Razniewski, Simon},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2980/paper350.pdf; urn:nbn:de:0074-2980-6},
PUBLISHER = {CEUR-WS.org},
YEAR = {2021},
BOOKTITLE = {Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks (ISWC-Posters-Demos-Industry 2021)},
EDITOR = {Seneviratne, Oshani and Pesquita, Catia and Sequeda, Juan and Etcheverry, Lorena},
EID = {350},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2980},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Biswas, Aditya Bikram
%A Arnaout, Hiba
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Neguess: Wikidata-entity Guessing Game with Negative Clues
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-65AD-3
%U http://ceur-ws.org/Vol-2980/paper350.pdf
%D 2021
%B 20th International Semantic Web Conference
%Z date of event: 2021-10-24 - 2021-10-28
%C Virtual Conference
%B Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks
%E Seneviratne, Oshani; Pesquita, Catia; Sequeda, Juan; Etcheverry, Lorena
%Z sequence number: 350
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 2980
%@ 1613-0073

Conference paper

K. Budhathoki, M. Boley, and J. Vreeken

“Discovering Reliable Causal Rules,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference, 2021.

BibTeX

@inproceedings{budhathoki:21:dice,
TITLE = {Discovering Reliable Causal Rules},
AUTHOR = {Budhathoki, Kailash and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-670-0},
DOI = {10.1137/1.9781611976700.1},
PUBLISHER = {SIAM},
YEAR = {2021},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
EDITOR = {Demeniconi, Carlotta and Davidson, Ian},
PAGES = {1--9},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Discovering Reliable Causal Rules
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-2571-F
%R 10.1137/1.9781611976700.1
%D 2021
%B SIAM International Conference on Data Mining
%Z date of event: 2021-04-29 - 2021-05-01
%C Virtual Conference
%B Proceedings of the SIAM International Conference on Data Mining
%E Demeniconi, Carlotta; Davidson, Ian
%P 1 - 9
%I SIAM
%@ 978-1-61197-670-0

Conference paper

E. Chang, X. Shen, D. Zhu, V. Demberg, and H. Su

“Neural Data-to-Text Generation with LM-based Text Augmentation,” in The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Online, 2021.

BibTeX

@inproceedings{chang2021neural,
TITLE = {Neural Data-to-Text Generation with {LM}-based Text Augmentation},
AUTHOR = {Chang, Ernie and Shen, Xiaoyu and Zhu, Dawei and Demberg, Vera and Su, Hui},
LANGUAGE = {eng},
ISBN = {978-1-954085-02-2},
DOI = {10.18653/v1/2021.eacl-main.64},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)},
EDITOR = {Merlo, Paola},
PAGES = {758--768},
ADDRESS = {Online},
}

Endnote

%0 Conference Proceedings
%A Chang, Ernie
%A Shen, Xiaoyu
%A Zhu, Dawei
%A Demberg, Vera
%A Su, Hui
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Neural Data-to-Text Generation with LM-based Text Augmentation
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-149E-0
%R 10.18653/v1/2021.eacl-main.64
%D 2021
%B 16th Conference of the European Chapter of the Association for Computational Linguistics
%Z date of event: 2021-04-19 - 2021-04-23
%C Online
%B The 16th Conference of the European Chapter of the Association for Computational Linguistics
%E Merlo, Paola
%P 758 - 768
%I ACL
%@ 978-1-954085-02-2

Paper

P. Christmann, R. Saha Roy, and G. Weikum

“Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases,” 2021. [Online]. Available: https://arxiv.org/abs/2108.08597.

Abstract

Answering complex questions over knowledge bases (KB-QA) faces huge input
data with billions of facts, involving millions of entities and thousands of
predicates. For efficiency, QA systems first reduce the answer search space by
identifying a set of facts that is likely to contain all answers and relevant
cues. The most common technique for doing this is to apply named entity
disambiguation (NED) systems to the question, and retrieve KB facts for the
disambiguated entities. This work presents CLOCQ, an efficient method that
prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses
a top-k query processor over score-ordered lists of KB items that combine
signals about lexical matching, relevance to the question, coherence among
candidate items, and connectivity in the KB graph. Experiments with two recent
QA benchmarks for complex questions demonstrate the superiority of CLOCQ over
state-of-the-art baselines with respect to answer presence, size of the search
space, and runtimes.

BibTeX

@online{Christmann_2108.08597,
TITLE = {Beyond {NED}: {F}ast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2108.08597},
EPRINT = {2108.08597},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.},
}

Endnote

%0 Report
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6360-B
%U https://arxiv.org/abs/2108.08597
%D 2021
%X Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Thesis

P. Christmann

“CLOCQ: Efficient Search Space Reduction for Complex Question Answering over Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2021.

Abstract

BibTeX

@mastersthesis{ChristmannMSc2021,
TITLE = {{CLOCQ}: Efficient Search Space Reduction for Complex Question Answering over Knowledge Bases},
AUTHOR = {Christmann, Philipp},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.},
}

Endnote

%0 Thesis
%A Christmann, Philipp
%Y Saha Roy, Rishiraj
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CLOCQ: Efficient Search Space Reduction for Complex Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-BEF6-9
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 54 p.
%V master
%9 master
%X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.

Conference paper

C. X. Chu, S. Razniewski, and G. Weikum

“KnowFi: Knowledge Extraction from Long Fictional Texts,” in Automated Knowledge Base Construction (AKBC 2021), Virtual Conference, 2021.

BibTeX

@inproceedings{DBLP:conf/akbc/ChuRW21,
TITLE = {{KnowFi}: {K}nowledge Extraction from Long Fictional Texts},
AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://openreview.net/forum?id=8smkJ2ekBRC},
PUBLISHER = {OpenReview},
YEAR = {2021},
BOOKTITLE = {Automated Knowledge Base Construction (AKBC 2021)},
PAGES = {1--19},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Chu, Cuong Xuan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KnowFi: Knowledge Extraction from Long Fictional Texts
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DC15-5
%U https://openreview.net/forum?id=8smkJ2ekBRC
%D 2021
%B 3rd Conference on Automated Knowledge Base Construction
%Z date of event: 2021-10-04 - 2021-10-08
%C Virtual Conference
%B Automated Knowledge Base Construction
%P 1 - 19
%I OpenReview

Conference paper

D. Dave, V. Anu, and A. S. Varde

“Automating the Classification of Requirements Data,” in IEEE International Conference on Big Data, Orlando, FL, USA (Virtual Event), 2021.

BibTeX

@inproceedings{Dave_BigData21,
TITLE = {Automating the Classification of Requirements Data},
AUTHOR = {Dave, Dev and Anu, Vaibhav and Varde, Aparna S.},
LANGUAGE = {eng},
ISBN = {978-1-6654-3902-2},
DOI = {10.1109/BigData52589.2021.9671548},
PUBLISHER = {IEEE},
YEAR = {2021},
BOOKTITLE = {IEEE International Conference on Big Data},
EDITOR = {Chen, Yixin and Ludwig, Heiko and Tu, Yicheng and Fayyad, Usama and Zhu, Xingquan and Xu, Xiaohua and Byna, Suren and Liu, Xiong and Zhang, Jianping and Pan, Shirui and Papalexakis, Vagelis and Wang, Jianwu and Cuzzocrea, Alfredo and Ordonez, Carlos},
PAGES = {5878--5880},
ADDRESS = {Orlando, FL, USA (Virtual Event)},
}

Endnote

%0 Conference Proceedings
%A Dave, Dev
%A Anu, Vaibhav
%A Varde, Aparna S.
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automating the Classification of Requirements Data
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-C562-9
%R 10.1109/BigData52589.2021.9671548
%D 2021
%B IEEE International Conference on Big Data
%Z date of event: 2021-12-15 - 2021-12-18
%C Orlando, FL, USA (Virtual Event)
%B IEEE International Conference on Big Data
%E Chen, Yixin; Ludwig, Heiko; Tu, Yicheng; Fayyad, Usama; Zhu, Xingquan; Xu, Xiaohua; Byna, Suren; Liu, Xiong; Zhang, Jianping; Pan, Shirui; Papalexakis, Vagelis; Wang, Jianwu; Cuzzocrea, Alfredo; Ordonez, Carlos
%P 5878 - 5880
%I IEEE
%@ 978-1-6654-3902-2

Article

L. De Stefani, E. Terolli, and E. Upfal

“Tiered Sampling: An Efficient Method for Counting Sparse Motifs in Massive Graph Streams,” ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 5, 2021.

BibTeX

@article{DeStefani2021,
TITLE = {Tiered Sampling: {A}n Efficient Method for Counting Sparse Motifs in Massive Graph Streams},
AUTHOR = {De Stefani, Lorenzo and Terolli, Erisa and Upfal, Eli},
LANGUAGE = {eng},
ISSN = {1556-4681},
DOI = {10.1145/3441299},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2021},
DATE = {2021},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {15},
NUMBER = {5},
PAGES = {1--52},
EID = {79},
}

Endnote

%0 Journal Article
%A De Stefani, Lorenzo
%A Terolli, Erisa
%A Upfal, Eli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Tiered Sampling: An Efficient Method for Counting Sparse Motifs in Massive Graph Streams
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-ED51-2
%R 10.1145/3441299
%7 2021
%D 2021
%J ACM Transactions on Knowledge Discovery from Data
%V 15
%N 5
%& 1
%P 1 - 52
%Z sequence number: 79
%I ACM
%C New York, NY
%@ 1556-4681

Article

D5BIO

J. Fischer, F. B. Ardakani, K. Kattler, J. Walter, and M. H. Schulz

“CpG Content-dependent Associations between Transcription Factors and Histone Modifications,” PLoS One, vol. 16, no. 4, 2021.

BibTeX

@article{fischer:21:cpgtfhm,
TITLE = {{CpG} content-dependent associations between transcription factors and histone modifications},
AUTHOR = {Fischer, Jonas and Ardakani, Fatemeh Behjati and Kattler, Kathrin and Walter, J{\"o}rn and Schulz, Marcel Holger},
LANGUAGE = {eng},
ISSN = {1932-6203},
DOI = {10.1371/journal.pone.0249985},
PUBLISHER = {Public Library of Science},
ADDRESS = {San Francisco, CA},
YEAR = {2021},
JOURNAL = {PLoS One},
VOLUME = {16},
NUMBER = {4},
EID = {0249985},
}

Endnote

%0 Journal Article
%A Fischer, Jonas
%A Ardakani, Fatemeh Behjati
%A Kattler, Kathrin
%A Walter, Jörn
%A Schulz, Marcel Holger
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
%T CpG Content-dependent Associations between Transcription Factors and Histone Modifications
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-5602-5
%R 10.1371/journal.pone.0249985
%7 2021
%D 2021
%J PLoS One
%V 16
%N 4
%Z sequence number: 0249985
%I Public Library of Science
%C San Francisco, CA
%@ 1932-6203

Conference paper

J. Fischer, A. Oláh, and J. Vreeken

“What’s in the Box? Exploring the Inner Life of Neural Networks with Robust Rules,” in Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Event, 2021.

BibTeX

@inproceedings{Fischer_ICML2021,
TITLE = {What's in the Box? {Exploring} the Inner Life of Neural Networks with Robust Rules},
AUTHOR = {Fischer, Jonas and Ol{\'a}h, Anna and Vreeken, Jilles},
LANGUAGE = {eng},
PUBLISHER = {MLR Press},
YEAR = {2021},
BOOKTITLE = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
EDITOR = {Meila, Marina and Zhang, Tong},
PAGES = {3352--3362},
EID = {26},
SERIES = {Proceedings of Machine Learning Research},
VOLUME = {139},
ADDRESS = {Virtual Event},
}

Endnote

%0 Conference Proceedings
%A Fischer, Jonas
%A Oláh, Anna
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T What’s in the Box? Exploring the Inner Life of Neural Networks with Robust Rules
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-49F8-E
%D 2021
%B 38th International Conference on Machine Learning
%Z date of event: 2021-07-18 - 2021-07-24
%C Virtual Event
%B Proceedings of the 38th International Conference on Machine Learning
%E Meila, Marina; Zhang, Tong
%P 3352 - 3362
%Z sequence number: 26
%I MLR Press
%B Proceedings of Machine Learning Research
%N 139

Conference paper

J. Fischer and R. Burkholz

“Plant ‘n’ Seek: Can You Find the Winning Ticket?,” in International Conference on Learning Representations (ICLR 2022), Virtual, 2021.

BibTeX

@inproceedings{FischerICLR22,
TITLE = {Plant 'n' Seek: Can You Find the Winning Ticket?},
AUTHOR = {Fischer, Jonas and Burkholz, Rebekka},
LANGUAGE = {eng},
URL = {https://iclr.cc/Conferences/2022},
PUBLISHER = {OpenReview.net},
YEAR = {2022},
BOOKTITLE = {International Conference on Learning Representations (ICLR 2022)},
ADDRESS = {Virtual},
}

Endnote

%0 Conference Proceedings
%A Fischer, Jonas
%A Burkholz, Rebekka
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Plant 'n' Seek: Can You Find the Winning Ticket?
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B124-6
%D 2021
%B Tenth International Conference on Learning Representations
%Z date of event: 2022-04-25 - 2022-04-29
%C Virtual
%B International Conference on Learning Representations
%I OpenReview.net
%U https://iclr.cc/Conferences/2022

Paper

J. Fischer and R. Burkholz

“Towards Strong Pruning for Lottery Tickets with Non-Zero Biases,” 2021. [Online]. Available: https://arxiv.org/abs/2110.11150.

Abstract

The strong lottery ticket hypothesis holds the promise that pruning randomly
initialized deep neural networks could offer a computationally efficient
alternative to deep learning with stochastic gradient descent. Common parameter
initialization schemes and existence proofs, however, are focused on networks
with zero biases, thus foregoing the potential universal approximation property
of pruning. To fill this gap, we extend multiple initialization schemes and
existence proofs to non-zero biases, including explicit 'looks-linear'
approaches for ReLU activation functions. These do not only enable truly
orthogonal parameter initialization but also reduce potential pruning errors.
In experiments on standard benchmark data sets, we further highlight the
practical benefits of non-zero bias initialization schemes, and present
theoretically inspired extensions for state-of-the-art strong lottery ticket
pruning.

BibTeX

@online{Fischer_arXiv2110.11150,
TITLE = {Towards Strong Pruning for Lottery Tickets with Non-Zero Biases},
AUTHOR = {Fischer, Jonas and Burkholz, Rebekka},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2110.11150},
EPRINT = {2110.11150},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data sets, we further highlight the practical benefits of non-zero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.},
}

Endnote

%0 Report
%A Fischer, Jonas
%A Burkholz, Rebekka
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Towards Strong Pruning for Lottery Tickets with Non-Zero Biases
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B12A-0
%U https://arxiv.org/abs/2110.11150
%D 2021
%X The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data sets, we further highlight the practical benefits of non-zero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.
%K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI

Conference paper

J. Fischer and J. Vreeken

“Differentiable Pattern Set Mining,” in KDD ’21, 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, 2021.

BibTeX

@inproceedings{Fischer_KDD2021,
TITLE = {Differentiable Pattern Set Mining},
AUTHOR = {Fischer, Jonas and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-8332-5},
DOI = {10.1145/3447548.3467348},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {KDD '21, 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
EDITOR = {Zhu, Feida and Ooi, Beng Chin and Miao, Chunyan and Cong, Gao and Tang, Jiliang and Derr, Tyler},
PAGES = {383--392},
ADDRESS = {Virtual Event, Singapore},
}

Endnote

%0 Conference Proceedings
%A Fischer, Jonas
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Differentiable Pattern Set Mining
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-652F-2
%R 10.1145/3447548.3467348
%D 2021
%B 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2021-08-14 - 2021-08-18
%C Virtual Event, Singapore
%B KDD '21
%E Zhu, Feida; Ooi, Beng Chin; Miao, Chunyan; Cong, Gao; Tang, Jiliang; Derr, Tyler
%P 383 - 392
%I ACM
%@ 978-1-4503-8332-5

Thesis

D5IMPR-CS

M. H. Gad-Elrab

“Explainable Methods for Knowledge Graph Refinement and Exploration via Symbolic Reasoning,” Universität des Saarlandes, Saarbrücken, 2021.

Abstract

Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG value and limitations towards down-stream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used in inferring missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.

BibTeX

@phdthesis{Elrabphd2021,
TITLE = {Explainable Methods for Knowledge Graph Refinement and Exploration via Symbolic Reasoning},
AUTHOR = {Gad-Elrab, Mohamed Hassan},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-344237},
DOI = {10.22028/D291-34423},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
ABSTRACT = {Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG value and limitations towards down-stream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used in inferring missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.},
}

Endnote

%0 Thesis
%A Gad-Elrab, Mohamed Hassan
%Y Weikum, Gerhard
%A referee: Theobald, Martin
%A referee: Stepanova, Daria
%A referee: Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Explainable Methods for Knowledge Graph Refinement and Exploration via Symbolic Reasoning
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-427E-0
%R 10.22028/D291-34423
%U urn:nbn:de:bsz:291--ds-344237
%F OTHER: hdl:20.500.11880/31629
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 176 p.
%V phd
%9 phd
%X Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG value and limitations towards down-stream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used in inferring missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.
%K knowledge graphs
symbolic learning
embedding models
rule learning
Big Data
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31629

Thesis

D5IMPR-CS

A. Ghazimatin

“Enhancing Explainability and Scrutability of Recommender Systems,” Universität des Saarlandes, Saarbrücken, 2021.

Abstract

Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems:
• We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users’ profiles and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal.
• We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. PRINCE explanations are comprehensible for users, because they present subsets of the user’s prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for finding the smallest counterfactual explanations.
• We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-specific item representations.
We evaluate all proposed models and methods with real user studies and demonstrate their benefits at achieving explainability and scrutability in recommender systems.

BibTeX

@phdthesis{Ghazphd2021,
TITLE = {Enhancing Explainability and Scrutability of Recommender Systems},
AUTHOR = {Ghazimatin, Azin},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-355166},
DOI = {10.22028/D291-35516},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
ABSTRACT = {Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm{\textquoteright}s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in {fi}ltering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system{\textquoteright}s behavior can be modi{fi}ed accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems: \mbox{$\bullet$} We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users{\textquoteright} pro{fi}les and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal. \mbox{$\bullet$} We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. 
PRINCE explanations are comprehensible for users, because they present subsets of the user{\textquoteright}s prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for {fi}nding the smallest counterfactual explanations. \mbox{$\bullet$} We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-speci{fi}c item representations. We evaluate all proposed models and methods with real user studies and demonstrate their bene{fi}ts at achieving explainability and scrutability in recommender systems.},
}

Endnote

%0 Thesis
%A Ghazimatin, Azin
%Y Weikum, Gerhard
%A referee: Saha Roy, Rishiraj
%A referee: Amer-Yahia, Sihem
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Enhancing Explainability and Scrutability of Recommender Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-3C99-7
%R 10.22028/D291-35516
%U urn:nbn:de:bsz:291--ds-355166
%F OTHER: hdl:20.500.11880/32590
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 136 p.
%V phd
%9 phd
%X Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems: • We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users’ profiles and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal. • We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. PRINCE explanations are comprehensible for users, because they present subsets of the user’s prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for finding the smallest counterfactual explanations. • We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-specific item representations. We evaluate all proposed models and methods with real user studies and demonstrate their benefits at achieving explainability and scrutability in recommender systems.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/32590

Paper

A. Ghazimatin, S. Pramanik, R. Saha Roy, and G. Weikum

“ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models,” 2021. [Online]. Available: https://arxiv.org/abs/2102.09388.

Abstract

System-provided explanations for recommendations are an important component
towards transparent and trustworthy AI. In state-of-the-art research, this is a
one-way signal, though, to improve user acceptance. In this paper, we turn the
role of explanations around and investigate how they can contribute to
enhancing the quality of generated recommendations themselves. We devise a
human-in-the-loop framework, called ELIXIR, where user feedback on explanations
is leveraged for pairwise learning of user preferences. ELIXIR leverages
feedback on pairs of recommendations and explanations to learn user-specific
latent preference vectors, overcoming sparseness by label propagation with
item-similarity-based neighborhoods. Our framework is instantiated using
generalized graph recommendation via Random Walk with Restart. Insightful
experiments with a real user study show significant improvements in movie and
book recommendations over item-level feedback.

BibTeX

@online{Ghazimatin_2102.09388,
TITLE = {{ELIXIR}: {L}earning from User Feedback on Explanations to Improve Recommender Models},
AUTHOR = {Ghazimatin, Azin and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2102.09388},
EPRINT = {2102.09388},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {System-provided explanations for recommendations are an important component towards transparent and trustworthy AI. In state-of-the-art research, this is a one-way signal, though, to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of generated recommendations themselves. We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real user study show significant improvements in movie and book recommendations over item-level feedback.},
}

Endnote

%0 Report
%A Ghazimatin, Azin
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0309-B
%U https://arxiv.org/abs/2102.09388
%D 2021
%X System-provided explanations for recommendations are an important component towards transparent and trustworthy AI. In state-of-the-art research, this is a one-way signal, though, to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of generated recommendations themselves. We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real user study show significant improvements in movie and book recommendations over item-level feedback.
%K Computer Science, Information Retrieval, cs.IR
Computer Science, Artificial Intelligence, cs.AI
Computer Science, Learning, cs.LG

Conference paper

A. Ghazimatin, S. Pramanik, R. Saha Roy, and G. Weikum

“ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models,” in The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia, 2021.

BibTeX

@inproceedings{Ghazimatin_WWW21,
TITLE = {{ELIXIR}: {L}earning from User Feedback on Explanations to Improve Recommender Models},
AUTHOR = {Ghazimatin, Azin and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8312-7},
DOI = {10.1145/3442381.3449848},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {The Web Conference 2021 (WWW 2021)},
EDITOR = {Leskovec, Jure and Grobelnik, Marko and Najork, Marc and Tang, Jie and Zia, Leila},
PAGES = {3850--3860},
ADDRESS = {Ljubljana, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Ghazimatin, Azin
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0303-1
%R 10.1145/3442381.3449848
%D 2021
%B 30th The Web Conference
%Z date of event: 2021-04-19 - 2021-04-23
%C Ljubljana, Slovenia
%B The Web Conference 2021
%E Leskovec, Jure; Grobelnik, Marko; Najork, Marc; Tang, Jie; Zia, Leila
%P 3850 - 3860
%I ACM
%@ 978-1-4503-8312-7

Conference paper

B. Gonzalez-Moodie, S. Daiek, J. Lorenzo-Trueba, and A. S. Varde

“Multispectral Drone Data Analysis on Coastal Dunes,” in IEEE International Conference on Big Data, Orlando, FL, USA (Virtual Event), 2021.

BibTeX

@inproceedings{Gonzalez-Moodie_BigData21,
TITLE = {Multispectral Drone Data Analysis on Coastal Dunes},
AUTHOR = {Gonzalez-Moodie, Britnie and Daiek, Shane and Lorenzo-Trueba, Jorge and Varde, Aparna S.},
LANGUAGE = {eng},
ISBN = {978-1-6654-3902-2},
DOI = {10.1109/BigData52589.2021.9671340},
PUBLISHER = {IEEE},
YEAR = {2021},
BOOKTITLE = {IEEE International Conference on Big Data},
EDITOR = {Chen, Yixin and Ludwig, Heiko and Tu, Yicheng and Fayyad, Usama and Zhu, Xingquan and Xu, Xiaohua and Byna, Suren and Liu, Xiong and Zhang, Jianping and Pan, Shirui and Papalexakis, Vagelis and Wang, Jianwu and Cuzzocrea, Alfredo and Ordonez, Carlos},
PAGES = {5903--5905},
ADDRESS = {Orlando, FL, USA (Virtual Event)},
}

Endnote

%0 Conference Proceedings
%A Gonzalez-Moodie, Britnie
%A Daiek, Shane
%A Lorenzo-Trueba, Jorge
%A Varde, Aparna S.
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Multispectral Drone Data Analysis on Coastal Dunes
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-C6F1-6
%R 10.1109/BigData52589.2021.9671340
%D 2021
%B IEEE International Conference on Big Data
%Z date of event: 2021-12-15 - 2021-12-18
%C Orlando, FL, USA (Virtual Event)
%B IEEE International Conference on Big Data
%E Chen, Yixin; Ludwig, Heiko; Tu, Yicheng; Fayyad, Usama; Zhu, Xingquan; Xu, Xiaohua; Byna, Suren; Liu, Xiong; Zhang, Jianping; Pan, Shirui; Papalexakis, Vagelis; Wang, Jianwu; Cuzzocrea, Alfredo; Ordonez, Carlos
%P 5903 - 5905
%I IEEE
%@ 978-1-6654-3902-2

Conference paper

A. Guimarães and G. Weikum

“X-Posts Explained: Analyzing and Predicting Controversial Contributions in Thematically Diverse Reddit Forums,” in Proceedings of the Fifteenth International Conference on Web and Social Media (ICWSM 2021), Atlanta, GA, USA, 2021.

BibTeX

@inproceedings{Guimaraes_ICWSM2021,
TITLE = {X-Posts Explained: {A}nalyzing and Predicting Controversial Contributions in Thematically Diverse {R}eddit Forums},
AUTHOR = {Guimar{\~a}es, Anna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-57735-869-5},
URL = {https://ojs.aaai.org/index.php/ICWSM/article/view/18050},
PUBLISHER = {AAAI},
YEAR = {2021},
BOOKTITLE = {Proceedings of the Fifteenth International Conference on Web and Social Media (ICWSM 2021)},
PAGES = {163--172},
ADDRESS = {Atlanta, GA, USA},
}

Endnote

%0 Conference Proceedings
%A Guimarães, Anna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T X-Posts Explained: Analyzing and Predicting Controversial Contributions in Thematically Diverse Reddit Forums
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0345-7
%U https://ojs.aaai.org/index.php/ICWSM/article/view/18050
%D 2021
%B 15th International Conference on Web and Social Media
%Z date of event: 2021-06-07 - 2021-06-10
%C Atlanta, GA, USA
%B Proceedings of the Fifteenth International Conference on Web and Social Media
%P 163 - 172
%I AAAI
%@ 978-1-57735-869-5
%U https://ojs.aaai.org/index.php/ICWSM/article/view/18050/17853

Conference paper

A. Guimarães, E. Terolli, and G. Weikum

“Comparing Health Forums: User Engagement, Salient Entities, Medical Detail,” in CSCW ’21 Companion, Virtual Event, USA, 2021.

BibTeX

@inproceedings{Guimaraes21,
TITLE = {Comparing Health Forums: {U}ser Engagement, Salient Entities, Medical Detail},
AUTHOR = {Guimar{\~a}es, Anna and Terolli, Erisa and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8479-7},
DOI = {10.1145/3462204.3481748},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {CSCW '21 Companion},
EDITOR = {Ding, Sharon and Fussell, Susan and Monroy-Hern{\'a}ndez, Andr{\'e}s and Munson, Sean and Shklovski, Irina and Naaman, Mor},
PAGES = {57--61},
ADDRESS = {Virtual Event, USA},
}

Endnote

%0 Conference Proceedings
%A Guimarães, Anna
%A Terolli, Erisa
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Comparing Health Forums: User Engagement, Salient Entities, Medical Detail
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-BDA5-7
%R 10.1145/3462204.3481748
%D 2021
%B 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing
%Z date of event: 2021-10-23 - 2021-10-27
%C Virtual Event, USA
%B CSCW '21 Companion
%E Ding, Sharon; Fussell, Susan; Monroy-Hernández, Andrés; Munson, Sean; Shklovski, Irina; Naaman, Mor
%P 57 - 61
%I ACM
%@ 978-1-4503-8479-7

Paper

M. Hedderich, J. Fischer, D. Klakow, and J. Vreeken

“Label-Descriptive Patterns and their Application to Characterizing Classification Errors,” 2021. [Online]. Available: https://arxiv.org/abs/2110.09599.

Abstract

State-of-the-art deep learning methods achieve human-like performance on many
tasks, but make errors nevertheless. Characterizing these errors in easily
interpretable terms gives insight into whether a model is prone to making
systematic errors, but also gives a way to act and improve the model. In this
paper we propose a method that allows us to do so for arbitrary classifiers by
mining a small set of patterns that together succinctly describe the input data
that is partitioned according to correctness of prediction. We show this is an
instance of the more general label description problem, which we formulate in
terms of the Minimum Description Length principle. To discover good pattern
sets we propose the efficient and hyperparameter-free Premise algorithm, which
through an extensive set of experiments we show on both synthetic and
real-world data performs very well in practice; unlike existing solutions it
ably recovers ground truth patterns, even on highly imbalanced data over many
unique items, or where patterns are only weakly associated to labels. Through
two real-world case studies we confirm that Premise gives clear and actionable
insight into the systematic errors made by modern NLP classifiers.

BibTeX

@online{Hedderich_arXiv2110.09599,
TITLE = {Label-Descriptive Patterns and their Application to Characterizing Classification Errors},
AUTHOR = {Hedderich, Michael and Fischer, Jonas and Klakow, Dietrich and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2110.09599},
EPRINT = {2110.09599},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model. In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction. We show this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover good pattern sets we propose the efficient and hyperparameter-free Premise algorithm, which through an extensive set of experiments we show on both synthetic and real-world data performs very well in practice; unlike existing solutions it ably recovers ground truth patterns, even on highly imbalanced data over many unique items, or where patterns are only weakly associated to labels. Through two real-world case studies we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers.},
}

Endnote

%0 Report
%A Hedderich, Michael
%A Fischer, Jonas
%A Klakow, Dietrich
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Label-Descriptive Patterns and their Application to Characterizing Classification Errors
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B127-3
%U https://arxiv.org/abs/2110.09599
%D 2021
%X State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model. In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction. We show this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover good pattern sets we propose the efficient and hyperparameter-free Premise algorithm, which through an extensive set of experiments we show on both synthetic and real-world data performs very well in practice; unlike existing solutions it ably recovers ground truth patterns, even on highly imbalanced data over many unique items, or where patterns are only weakly associated to labels. Through two real-world case studies we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers.
%K Computer Science, Learning, cs.LG
Computer Science, Computation and Language, cs.CL

Paper

E. Heiter, J. Fischer, and J. Vreeken

“Factoring Out Prior Knowledge from Low-dimensional Embeddings,” 2021. [Online]. Available: https://arxiv.org/abs/2103.01828.

Abstract

Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing
high-dimensional data and therewith facilitate the discovery of interesting
structure. Although they are widely used, they visualize data as is, rather
than in light of the background knowledge we have about the data. What we
already know, however, strongly determines what is novel and hence interesting.
In this paper we propose two methods for factoring out prior knowledge in the
form of distance matrices from low-dimensional embeddings. To factor out prior
knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective
in a principled way using Jensen-Shannon divergence. To factor out prior
knowledge from any downstream embedding approach, we propose CONFETTI, in which
we directly operate on the input distance matrices. Extensive experiments on
both synthetic and real world data show that both methods work well, providing
embeddings that exhibit meaningful structure that would otherwise remain
hidden.

BibTeX

@online{heiter:21:factoring,
TITLE = {Factoring Out Prior Knowledge from Low-dimensional Embeddings},
AUTHOR = {Heiter, Edith and Fischer, Jonas and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2103.01828},
EPRINT = {2103.01828},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.},
}

Endnote

%0 Report
%A Heiter, Edith
%A Fischer, Jonas
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Factoring Out Prior Knowledge from Low-dimensional Embeddings
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-16ED-5
%U https://arxiv.org/abs/2103.01828
%D 2021
%X Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.
%K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML

Conference paper

V. T. Ho, K. Pal, and G. Weikum

“QuTE: Answering Quantity Queries from Web Tables,” in SIGMOD ’21, International Conference on Management of Data, Xi’an, Shaanxi, China, 2021.

BibTeX

@inproceedings{Thinh_SIG21,
TITLE = {Qu{TE}: {A}nswering Quantity Queries from Web Tables},
AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8343-1},
DOI = {10.1145/3448016.3452763},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGMOD '21, International Conference on Management of Data},
EDITOR = {Li, Guoliang and Li, Zhanhuai and Idreos, Stratos and Srivastava, Divesh},
PAGES = {2740--2744},
ADDRESS = {Xi'an, Shaanxi, China},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Pal, Koninika
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T QuTE: Answering Quantity Queries from Web Tables
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-052E-0
%R 10.1145/3448016.3452763
%D 2021
%B International Conference on Management of Data
%Z date of event: 2021-06-19 - 2021-06-25
%C Xi'an, Shaanxi, China
%B SIGMOD '21
%E Li, Guoliang; Li, Zhanhuai; Idreos, Stratos; Srivastava, Divesh
%P 2740 - 2744
%I ACM
%@ 978-1-4503-8343-1

Conference paper

V. T. Ho, K. Pal, S. Razniewski, K. Berberich, and G. Weikum

“Extracting Contextualized Quantity Facts from Web Tables,” in The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia, 2021.

BibTeX

@inproceedings{Thinh_WWW21,
TITLE = {Extracting Contextualized Quantity Facts from Web Tables},
AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Razniewski, Simon and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8312-7},
DOI = {10.1145/3442381.3450072},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {The Web Conference 2021 (WWW 2021)},
EDITOR = {Leskovec, Jure and Grobelnik, Marko and Najork, Mark and Tang, Jie and Zia, Leila},
PAGES = {4033--4042},
ADDRESS = {Ljubljana, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Pal, Koninika
%A Razniewski, Simon
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extracting Contextualized Quantity Facts from Web Tables
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-04A0-E
%R 10.1145/3442381.3450072
%D 2021
%B 30th The Web Conference
%Z date of event: 2021-04-19 - 2021-04-23
%C Ljubljana, Slovenia
%B The Web Conference 2021
%E Leskovec, Jure; Grobelnik, Marko; Najork, Mark; Tang, Jie; Zia, Leila
%P 4033 - 4042
%I ACM
%@ 978-1-4503-8312-7

Paper

K. Hui and K. Berberich

“Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing,” 2021. [Online]. Available: https://arxiv.org/abs/2104.08926.

Abstract

Preference judgments have been demonstrated as a better alternative to graded
judgments to assess the relevance of documents relative to queries. Existing
work has verified transitivity among preference judgments when collected from
trained judges, which reduced the number of judgments dramatically. Moreover,
strict preference judgments and weak preference judgments, where the latter
additionally allow judges to state that two documents are equally relevant for
a given query, are both widely used in literature. However, whether
transitivity still holds when collected from crowdsourcing, i.e., whether the
two kinds of preference judgments behave similarly remains unclear. In this
work, we collect judgments from multiple judges using a crowdsourcing platform
and aggregate them to compare the two kinds of preference judgments in terms of
transitivity, time consumption, and quality. That is, we look into whether
aggregated judgments are transitive, how long it takes judges to make them, and
whether judges agree with each other and with judgments from TREC. Our key
findings are that only strict preference judgments are transitive. Meanwhile,
weak preference judgments behave differently in terms of transitivity, time
consumption, as well as of quality of judgment.

BibTeX

@online{Hui2104.08926,
TITLE = {Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2104.08926},
EPRINT = {2104.08926},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Preference judgments have been demonstrated as a better alternative to graded judgments to assess the relevance of documents relative to queries. Existing work has verified transitivity among preference judgments when collected from trained judges, which reduced the number of judgments dramatically. Moreover, strict preference judgments and weak preference judgments, where the latter additionally allow judges to state that two documents are equally relevant for a given query, are both widely used in literature. However, whether transitivity still holds when collected from crowdsourcing, i.e., whether the two kinds of preference judgments behave similarly remains unclear. In this work, we collect judgments from multiple judges using a crowdsourcing platform and aggregate them to compare the two kinds of preference judgments in terms of transitivity, time consumption, and quality. That is, we look into whether aggregated judgments are transitive, how long it takes judges to make them, and whether judges agree with each other and with judgments from TREC. Our key findings are that only strict preference judgments are transitive. Meanwhile, weak preference judgments behave differently in terms of transitivity, time consumption, as well as of quality of judgment.},
}

Endnote

%0 Report
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-651A-9
%U https://arxiv.org/abs/2104.08926
%D 2021
%X Preference judgments have been demonstrated as a better alternative to graded judgments to assess the relevance of documents relative to queries. Existing work has verified transitivity among preference judgments when collected from trained judges, which reduced the number of judgments dramatically. Moreover, strict preference judgments and weak preference judgments, where the latter additionally allow judges to state that two documents are equally relevant for a given query, are both widely used in literature. However, whether transitivity still holds when collected from crowdsourcing, i.e., whether the two kinds of preference judgments behave similarly remains unclear. In this work, we collect judgments from multiple judges using a crowdsourcing platform and aggregate them to compare the two kinds of preference judgments in terms of transitivity, time consumption, and quality. That is, we look into whether aggregated judgments are transitive, how long it takes judges to make them, and whether judges agree with each other and with judgments from TREC. Our key findings are that only strict preference judgments are transitive. Meanwhile, weak preference judgments behave differently in terms of transitivity, time consumption, as well as of quality of judgment.
%K Computer Science, Information Retrieval, cs.IR

Paper

Z. Jia, S. Pramanik, R. Saha Roy, and G. Weikum

“Complex Temporal Question Answering on Knowledge Graphs,” 2021. [Online]. Available: https://arxiv.org/abs/2109.08935.

Abstract

Question answering over knowledge graphs (KG-QA) is a vital topic in IR.
Questions with temporal intent are a special class of practical importance, but
have not received much attention in research. This work presents EXAQT, the
first end-to-end system for answering complex temporal questions that have
multiple entities and predicates, and associated temporal conditions. EXAQT
answers natural language questions over KGs in two stages, one geared towards
high recall, the other towards precision at top ranks. The first step computes
question-relevant compact subgraphs within the KG, and judiciously enhances
them with pertinent temporal facts, using Group Steiner Trees and fine-tuned
BERT models. The second step constructs relational graph convolutional networks
(R-GCNs) from the first step's output, and enhances the R-GCNs with time-aware
entity embeddings and attention over temporal relations. We evaluate EXAQT on
TimeQuestions, a large dataset of 16k temporal questions we compiled from a
variety of general purpose KG-QA benchmarks. Results show that EXAQT
outperforms three state-of-the-art systems for answering complex questions over
KGs, thereby justifying specialized treatment of temporal QA.

BibTeX

@online{Jia2109.08935,
TITLE = {Complex Temporal Question Answering on Knowledge Graphs},
AUTHOR = {Jia, Zhen and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2109.08935},
EPRINT = {2109.08935},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Question answering over knowledge graphs (KG-QA) is a vital topic in IR. Questions with temporal intent are a special class of practical importance, but have not received much attention in research. This work presents EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages, one geared towards high recall, the other towards precision at top ranks. The first step computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, using Group Steiner Trees and fine-tuned BERT models. The second step constructs relational graph convolutional networks (R-GCNs) from the first step's output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations. We evaluate EXAQT on TimeQuestions, a large dataset of 16k temporal questions we compiled from a variety of general purpose KG-QA benchmarks. Results show that EXAQT outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.},
}

Endnote

%0 Report
%A Jia, Zhen
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Complex Temporal Question Answering on Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-64F7-0
%U https://arxiv.org/abs/2109.08935
%D 2021
%X Question answering over knowledge graphs (KG-QA) is a vital topic in IR. Questions with temporal intent are a special class of practical importance, but have not received much attention in research. This work presents EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages, one geared towards high recall, the other towards precision at top ranks. The first step computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, using Group Steiner Trees and fine-tuned BERT models. The second step constructs relational graph convolutional networks (R-GCNs) from the first step's output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations. We evaluate EXAQT on TimeQuestions, a large dataset of 16k temporal questions we compiled from a variety of general purpose KG-QA benchmarks. Results show that EXAQT outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

Z. Jia, S. Pramanik, R. Saha Roy, and G. Weikum

“Complex Temporal Question Answering on Knowledge Graphs,” in CIKM ’21, 30th ACM International Conference on Information & Knowledge Management, Virtual Event, Australia, 2021.

BibTeX

@inproceedings{jia2021complex,
TITLE = {Complex Temporal Question Answering on Knowledge Graphs},
AUTHOR = {Jia, Zhen and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8446-9},
DOI = {10.1145/3459637.3482416},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {CIKM '21, 30th ACM International Conference on Information \& Knowledge Management},
EDITOR = {Demartini, Gianluca and Zuccon, Guido and Culpepper, J. Shane and Huang, Zi and Tong, Hanghang},
PAGES = {792--802},
ADDRESS = {Virtual Event, Australia},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Complex Temporal Question Answering on Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-A3A2-4
%R 10.1145/3459637.3482416
%D 2021
%B 30th ACM International Conference on Information & Knowledge Management
%Z date of event: 2021-11-01 - 2021-11-05
%C Virtual Event, Australia
%B CIKM '21
%E Demartini, Gianluca; Zuccon, Guido; Culpepper, J. Shane; Huang, Zi; Tong, Hanghang
%P 792 - 802
%I ACM
%@ 978-1-4503-8446-9

Thesis

K. M. Jose

“Improving Efficiency of Dense Retrieval Methods with Query Expansion,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@mastersthesis{JoseMSc21,
TITLE = {Improving Efficiency of Dense Retrieval Methods with Query Expansion},
AUTHOR = {Jose, Kevin Martin},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Jose, Kevin Martin
%Y Yates, Andrew
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Improving Efficiency of Dense Retrieval Methods with Query Expansion
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-17AB-9
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P X, 51 p.
%V master
%9 master

Conference paper

K. M. Jose, T. Nguyen, S. MacAvaney, J. Dalton, and A. Yates

“DiffIR: Exploring Differences in Ranking Models’ Behavior,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{Jose_SIGIR21,
TITLE = {{DiffIR}: {E}xploring Differences in Ranking Models' Behavior},
AUTHOR = {Jose, Kevin Martin and Nguyen, Thong and MacAvaney, Sean and Dalton, Jeffrey and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-1-4503-8037-9},
DOI = {10.1145/3404835.3462784},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yoshioka, Masaharu},
PAGES = {2595--2599},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A Jose, Kevin Martin
%A Nguyen, Thong
%A MacAvaney, Sean
%A Dalton, Jeffrey
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DiffIR: Exploring Differences in Ranking Models' Behavior
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-666D-B
%R 10.1145/3404835.3462784
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yoshioka, Masaharu
%P 2595 - 2599
%I ACM
%@ 978-1-4503-8037-9

Conference paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{kaiser2021reinforcement,
TITLE = {Reinforcement Learning from Reformulations in~Conversational Question Answering over Knowledge Graphs},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8037-9},
DOI = {10.1145/3404835.3462859},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yoshioka, Masaharu},
PAGES = {459--469},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-513E-8
%R 10.1145/3404835.3462859
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yoshioka, Masaharu
%P 459 - 469
%I ACM
%@ 978-1-4503-8037-9

Paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs,” 2021. [Online]. Available: https://arxiv.org/abs/2105.04850.

Abstract

The rise of personal assistants has made conversational question answering
(ConvQA) a very popular mechanism for user-system interaction. State-of-the-art
methods for ConvQA over knowledge graphs (KGs) can only learn from crisp
question-answer pairs found in popular benchmarks. In reality, however, such
training data is hard to come by: users would rarely mark answers explicitly as
correct or wrong. In this work, we take a step towards a more natural learning
paradigm - from noisy and implicit feedback via question reformulations. A
reformulation is likely to be triggered by an incorrect system response,
whereas a new follow-up question could be a positive signal on the previous
turn's answer. We present a reinforcement learning model, termed CONQUER, that
can learn from a conversational stream of questions and reformulations. CONQUER
models the answering process as multiple agents walking in parallel on the KG,
where the walks are determined by actions sampled using a policy network. This
policy network takes the question along with the conversational context as
inputs and is trained via noisy rewards obtained from the reformulation
likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark
with about 11k natural conversations containing around 205k reformulations.
Experiments show that CONQUER successfully learns to answer conversational
questions from noisy reward signals, significantly improving over a
state-of-the-art baseline.

BibTeX

@online{Kaiser_2105.04850,
TITLE = {Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2105.04850},
EPRINT = {2105.04850},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs (KGs) can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm -- from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn's answer. We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the KG, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over a state-of-the-art baseline.},
}

Endnote

%0 Report
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-67C9-1
%U https://arxiv.org/abs/2105.04850
%D 2021
%X The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs (KGs) can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn's answer. We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the KG, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over a state-of-the-art baseline.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

J. Kalofolias, P. Welke, and J. Vreeken

“SUSAN: The Structural Similarity Random Walk Kernel,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference, 2021.

BibTeX

@inproceedings{kalofolias:21:susan,
TITLE = {{SUSAN}: The Structural Similarity Random Walk Kernel},
AUTHOR = {Kalofolias, Janis and Welke, Pascal and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-670-0},
DOI = {10.1137/1.9781611976700.34},
PUBLISHER = {SIAM},
YEAR = {2021},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
EDITOR = {Demeniconi, Carlotta and Davidson, Ian},
PAGES = {298--306},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Kalofolias, Janis
%A Welke, Pascal
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SUSAN: The Structural Similarity Random Walk Kernel
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-26C9-B
%R 10.1137/1.9781611976700.34
%D 2021
%B SIAM International Conference on Data Mining
%Z date of event: 2021-04-29 - 2021-05-01
%C Virtual Conference
%B Proceedings of the SIAM International Conference on Data Mining
%E Demeniconi, Carlotta; Davidson, Ian
%P 298 - 306
%I SIAM
%@ 978-1-61197-670-0

Paper

M. Kamp, J. Fischer, and J. Vreeken

“Federated Learning from Small Datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2110.03469.

Abstract

Federated learning allows multiple parties to collaboratively train a joint
model without sharing local data. This enables applications of machine learning
in settings of inherently distributed, undisclosable data such as in the
medical domain. In practice, joint training is usually achieved by aggregating
local models, for which local training objectives have to be in expectation
similar to the joint (global) objective. Often, however, local datasets are so
small that local objectives differ greatly from the global objective, resulting
in federated learning to fail. We propose a novel approach that intertwines
model aggregations with permutations of local models. The permutations expose
each local model to a daisy chain of local datasets resulting in more efficient
training in data-sparse domains. This enables training on extremely small local
datasets, such as patient data across hospitals, while retaining the training
efficiency and privacy benefits of federated learning.

BibTeX

@online{Kamp2110.03469,
TITLE = {Federated Learning from Small Datasets},
AUTHOR = {Kamp, Michael and Fischer, Jonas and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2110.03469},
EPRINT = {2110.03469},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be in expectation similar to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, resulting in federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning.},
}

Endnote

%0 Report
%A Kamp, Michael
%A Fischer, Jonas
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Federated Learning from Small Datasets
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-653B-4
%U https://arxiv.org/abs/2110.03469
%D 2021
%X Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be in expectation similar to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, resulting in federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning.
%K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Distributed, Parallel, and Cluster Computing, cs.DC

Paper

P. Lahoti, K. Gummadi, and G. Weikum

“Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning,” 2021. [Online]. Available: https://arxiv.org/abs/2109.04432.

Abstract

Reliably predicting potential failure risks of machine learning (ML) systems
when deployed with production data is a crucial aspect of trustworthy AI. This
paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating
failure risks and predictive uncertainties of any already-trained black-box
classification model. In addition to providing a risk score, the Risk Advisor
decomposes the uncertainty estimates into aleatoric and epistemic uncertainty
components, thus giving informative insights into the sources of uncertainty
inducing the failures. Consequently, Risk Advisor can distinguish between
failures caused by data variability, data shifts and model limitations and
advise on mitigation actions (e.g., collecting more data to counter data
shift). Extensive experiments on various families of black-box classification
models and on real-world and synthetic datasets covering common ML failure
scenarios show that the Risk Advisor reliably predicts deployment-time failure
risks in all the scenarios, and outperforms strong baselines.

BibTeX

@online{Lahoti2109.04432,
TITLE = {Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2109.04432},
EPRINT = {2109.04432},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Reliably predicting potential failure risks of machine learning (ML) systems when deployed with production data is a crucial aspect of trustworthy AI. This paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating failure risks and predictive uncertainties of any already-trained black-box classification model. In addition to providing a risk score, the Risk Advisor decomposes the uncertainty estimates into aleatoric and epistemic uncertainty components, thus giving informative insights into the sources of uncertainty inducing the failures. Consequently, Risk Advisor can distinguish between failures caused by data variability, data shifts and model limitations and advise on mitigation actions (e.g., collecting more data to counter data shift). Extensive experiments on various families of black-box classification models and on real-world and synthetic datasets covering common ML failure scenarios show that the Risk Advisor reliably predicts deployment-time failure risks in all the scenarios, and outperforms strong baselines.},
}

Endnote

%0 Report
%A Lahoti, Preethi
%A Gummadi, Krishna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6491-2
%U https://arxiv.org/abs/2109.04432
%D 2021
%X Reliably predicting potential failure risks of machine learning (ML) systems when deployed with production data is a crucial aspect of trustworthy AI. This paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating failure risks and predictive uncertainties of any already-trained black-box classification model. In addition to providing a risk score, the Risk Advisor decomposes the uncertainty estimates into aleatoric and epistemic uncertainty components, thus giving informative insights into the sources of uncertainty inducing the failures. Consequently, Risk Advisor can distinguish between failures caused by data variability, data shifts and model limitations and advise on mitigation actions (e.g., collecting more data to counter data shift). Extensive experiments on various families of black-box classification models and on real-world and synthetic datasets covering common ML failure scenarios show that the Risk Advisor reliably predicts deployment-time failure risks in all the scenarios, and outperforms strong baselines.
%K Computer Science, Learning, cs.LG,Computer Science, Information Retrieval, cs.IR,Statistics, Machine Learning, stat.ML

Book

J. Lin, R. Nogueira, and A. Yates

Pretrained Transformers for Text Ranking: BERT and Beyond. San Rafael, CA: Morgan & Claypool Publishers, 2021.

BibTeX

@book{DBLP:series/synthesis/2021LinNY,
TITLE = {Pretrained Transformers for Text Ranking: {BERT} and Beyond},
AUTHOR = {Lin, Jimmy and Nogueira, Rodrigo and Yates, Andrew},
LANGUAGE = {eng},
ISSN = {1947-4040},
ISBN = {978-1-63639-228-8; 978-1-63639-230-1},
DOI = {10.2200/S01123ED1V01Y202108HLT053},
PUBLISHER = {Morgan \& Claypool Publishers},
ADDRESS = {San Rafael, CA},
YEAR = {2021},
DATE = {2021},
PAGES = {XVII, 307},
SERIES = {Synthesis Lectures on Human Language Technologies},
VOLUME = {53},
}

Endnote

%0 Book
%A Lin, Jimmy
%A Nogueira, Rodrigo
%A Yates, Andrew
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Pretrained Transformers for Text Ranking: BERT and Beyond
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-FE79-F
%@ 978-1-63639-228-8
%@ 978-1-63639-230-1
%R 10.2200/S01123ED1V01Y202108HLT053
%I Morgan & Claypool Publishers
%C San Rafael, CA
%D 2021
%P XVII, 307
%B Synthesis Lectures on Human Language Technologies
%N 53

Conference paper

S. MacAvaney, A. Yates, S. Feldman, D. Downey, A. Cohan, and N. Goharian

“Simplified Data Wrangling with ir_datasets,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{MacAvaney_SIGIR21,
TITLE = {Simplified Data Wrangling with ir{\textunderscore}datasets},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Feldman, Sergey and Downey, Doug and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {978-1-4503-8037-9},
DOI = {10.1145/3404835.3463254},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yushioka, Massaharu},
PAGES = {2429--2436},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Yates, Andrew
%A Feldman, Sergey
%A Downey, Doug
%A Cohan, Arman
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Simplified Data Wrangling with ir_datasets
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-665F-B
%R 10.1145/3404835.3463254
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yushioka, Massaharu
%P 2429 - 2436
%I ACM
%@ 978-1-4503-8037-9

Paper

S. MacAvaney, A. Yates, S. Feldman, D. Downey, A. Cohan, and N. Goharian

“Simplified Data Wrangling with ir_datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2103.02280.

Abstract

Managing the data for Information Retrieval (IR) experiments can be
challenging. Dataset documentation is scattered across the Internet and once
one obtains a copy of the data, there are numerous different data formats to
work with. Even basic formats can have subtle dataset-specific nuances that
need to be considered for proper use. To help mitigate these challenges, we
introduce a new robust and lightweight tool (ir_datasets) for acquiring,
managing, and performing typical operations over datasets used in IR. We
primarily focus on textual datasets used for ad-hoc search. This tool provides
both a Python and command line interface to numerous IR datasets and
benchmarks. To our knowledge, this is the most extensive tool of its kind.
Integrations with popular IR indexing and experimentation toolkits demonstrate
the tool's utility. We also provide documentation of these datasets through the
ir_datasets catalog: ir-datasets.com. The catalog acts as a hub for
information on datasets used in IR, providing core information about what data
each benchmark provides as well as links to more detailed information. We
welcome community contributions and intend to continue to maintain and grow
this tool.

BibTeX

@online{MacAvaney_2103.02280,
TITLE = {Simplified Data Wrangling with ir{\textunderscore}datasets},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Feldman, Sergey and Downey, Doug and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2103.02280},
EPRINT = {2103.02280},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over datasets used in IR. We primarily focus on textual datasets used for ad-hoc search. This tool provides both a Python and command line interface to numerous IR datasets and benchmarks. To our knowledge, this is the most extensive tool of its kind. Integrations with popular IR indexing and experimentation toolkits demonstrate the tool's utility. We also provide documentation of these datasets through the ir_datasets catalog: https://ir-datasets.com/. The catalog acts as a hub for information on datasets used in IR, providing core information about what data each benchmark provides as well as links to more detailed information. We welcome community contributions and intend to continue to maintain and grow this tool.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Yates, Andrew
%A Feldman, Sergey
%A Downey, Doug
%A Cohan, Arman
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Simplified Data Wrangling with ir_datasets
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6679-D
%U https://arxiv.org/abs/2103.02280
%D 2021
%X Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over datasets used in IR. We primarily focus on textual datasets used for ad-hoc search. This tool provides both a Python and command line interface to numerous IR datasets and benchmarks. To our knowledge, this is the most extensive tool of its kind. Integrations with popular IR indexing and experimentation toolkits demonstrate the tool's utility. We also provide documentation of these datasets through the ir_datasets catalog: https://ir-datasets.com/. The catalog acts as a hub for information on datasets used in IR, providing core information about what data each benchmark provides as well as links to more detailed information. We welcome community contributions and intend to continue to maintain and grow this tool.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Paper

I. Mackie, J. Dalton, and A. Yates

“How Deep is your Learning: The DL-HARD Annotated Deep Learning Dataset,” 2021. [Online]. Available: https://arxiv.org/abs/2105.07975.

Abstract

Deep Learning Hard (DL-HARD) is a new annotated dataset designed to more
effectively evaluate neural ranking models on complex topics. It builds on TREC
Deep Learning (DL) topics by extensively annotating them with question intent
categories, answer types, wikified entities, topic categories, and result type
metadata from a commercial web search engine. Based on this data, we introduce
a framework for identifying challenging queries. DL-HARD contains fifty topics
from the official DL 2019/2020 evaluation benchmark, half of which are newly
and independently assessed. We perform experiments using the official submitted
runs to DL on DL-HARD and find substantial differences in metrics and the
ranking of participating systems. Overall, DL-HARD is a new resource that
promotes research on neural ranking methods by focusing on challenging and
complex topics.

BibTeX

@online{Mackie_2105.07975,
TITLE = {How Deep is your Learning: The {DL}-{HARD} Annotated Deep Learning Dataset},
AUTHOR = {Mackie, Iain and Dalton, Jeffrey and Yates, Andrew},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2105.07975},
EPRINT = {2105.07975},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Deep Learning Hard (DL-HARD) is a new annotated dataset designed to more effectively evaluate neural ranking models on complex topics. It builds on TREC Deep Learning (DL) topics by extensively annotating them with question intent categories, answer types, wikified entities, topic categories, and result type metadata from a commercial web search engine. Based on this data, we introduce a framework for identifying challenging queries. DL-HARD contains fifty topics from the official DL 2019/2020 evaluation benchmark, half of which are newly and independently assessed. We perform experiments using the official submitted runs to DL on DL-HARD and find substantial differences in metrics and the ranking of participating systems. Overall, DL-HARD is a new resource that promotes research on neural ranking methods by focusing on challenging and complex topics.},
}

Endnote

%0 Report
%A Mackie, Iain
%A Dalton, Jeffrey
%A Yates, Andrew
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T How Deep is your Learning: The DL-HARD Annotated Deep Learning Dataset
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-67AB-3
%U https://arxiv.org/abs/2105.07975
%D 2021
%X Deep Learning Hard (DL-HARD) is a new annotated dataset designed to more effectively evaluate neural ranking models on complex topics. It builds on TREC Deep Learning (DL) topics by extensively annotating them with question intent categories, answer types, wikified entities, topic categories, and result type metadata from a commercial web search engine. Based on this data, we introduce a framework for identifying challenging queries. DL-HARD contains fifty topics from the official DL 2019/2020 evaluation benchmark, half of which are newly and independently assessed. We perform experiments using the official submitted runs to DL on DL-HARD and find substantial differences in metrics and the ranking of participating systems. Overall, DL-HARD is a new resource that promotes research on neural ranking methods by focusing on challenging and complex topics.
%K Computer Science, Information Retrieval, cs.IR

Conference paper

I. Mackie, J. Dalton, and A. Yates

“How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{Mackie_SIGIR21,
TITLE = {How Deep is your Learning: {T}he {DL}-{HARD} Annotated Deep Learning Dataset},
AUTHOR = {Mackie, Iain and Dalton, Jeffrey and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-1-4503-8037-9},
DOI = {10.1145/3404835.3463262},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yushioka, Massaharu},
PAGES = {2335--2341},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A Mackie, Iain
%A Dalton, Jeffrey
%A Yates, Andrew
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6640-C
%R 10.1145/3404835.3463262
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yushioka, Massaharu
%P 2335 - 2341
%I ACM
%@ 978-1-4503-8037-9

Thesis

D5IMPR-CS

P. Mandros

“Discovering robust dependencies from data,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@phdthesis{Panphd2020,
TITLE = {Discovering robust dependencies from data},
AUTHOR = {Mandros, Panagiotis},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-342919},
DOI = {10.22028/D291-34291},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Mandros, Panagiotis
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%A referee: Webb, Geoffrey
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Discovering robust dependencies from data
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-E4CF-E
%R 10.22028/D291-34291 
%U urn:nbn:de:bsz:291--ds-342919
%F OTHER: hdl:20.500.11880/31535
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 194 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31535

Thesis

D5IMPR-CS

A. Marx

“Information-Theoretic Causal Discovery,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@phdthesis{Marxphd2020,
TITLE = {Information-Theoretic Causal Discovery},
AUTHOR = {Marx, Alexander},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-342908},
DOI = {10.22028/D291-34290},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Marx, Alexander
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%A referee: Ommen, Thijs van
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Information-Theoretic Causal Discovery
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-EECA-9
%R 10.22028/D291-34290
%U urn:nbn:de:bsz:291--ds-342908
%F OTHER: hdl:20.500.11880/31480
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 195 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31480

Paper

A. Marx, A. Gretton, and J. M. Mooij

“A Weaker Faithfulness Assumption based on Triple Interactions,” 2021. [Online]. Available: https://arxiv.org/abs/2010.14265.

Abstract

One of the core assumptions in causal discovery is the faithfulness
assumption---i.e. assuming that independencies found in the data are due to
separations in the true causal graph. This assumption can, however, be violated
in many ways, including xor connections, deterministic functions or cancelling
paths. In this work, we propose a weaker assumption that we call 2-adjacency
faithfulness. In contrast to adjacency faithfulness, which assumes that there
is no conditional independence between each pair of variables that are
connected in the causal graph, we only require no conditional independence
between a node and a subset of its Markov blanket that can contain up to two
nodes. Equivalently, we adapt orientation faithfulness to this setting. We
further propose a sound orientation rule for causal discovery that applies
under weaker assumptions. As a proof of concept, we derive a modified Grow and
Shrink algorithm that recovers the Markov blanket of a target node and prove
its correctness under strictly weaker assumptions than the standard
faithfulness assumption.

BibTeX

@online{Marxarxiv21,
TITLE = {A Weaker Faithfulness Assumption based on Triple Interactions},
AUTHOR = {Marx, Alexander and Gretton, Arthur and Mooij, Joris M.},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2010.14265},
EPRINT = {2010.14265},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {One of the core assumptions in causal discovery is the faithfulness assumption---i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Gretton, Arthur
%A Mooij, Joris M.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T A Weaker Faithfulness Assumption based on Triple Interactions
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0BCE-5
%U https://arxiv.org/abs/2010.14265
%D 2021
%X One of the core assumptions in causal discovery is the faithfulness assumption---i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.
%K Statistics, Machine Learning, stat.ML,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Learning, cs.LG

Conference paper

A. Marx, L. Yang, and M. van Leeuwen

“Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multidimensional Adaptive Histograms,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference, 2021.

BibTeX

@inproceedings{marx:20:myl,
TITLE = {Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multidimensional Adaptive Histograms},
AUTHOR = {Marx, Alexander and Yang, Lincen and van Leeuwen, Matthijs},
LANGUAGE = {eng},
ISBN = {978-1-61197-670-0},
DOI = {10.1137/1.9781611976700.44},
PUBLISHER = {SIAM},
YEAR = {2021},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
PAGES = {387--395},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Yang, Lincen
%A van Leeuwen, Matthijs
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multidimensional Adaptive Histograms
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0BC7-C
%R 10.1137/1.9781611976700.44
%D 2021
%B SIAM International Conference on Data Mining
%Z date of event: 2021-04-29 - 2021-05-01
%C Virtual Conference
%B Proceedings of the SIAM International Conference on Data Mining
%P 387 - 395
%I SIAM
%@ 978-1-61197-670-0

Paper

A. Marx and J. Fischer

“Estimating Mutual Information via Geodesic kNN,” 2021. [Online]. Available: https://arxiv.org/abs/2110.13883.

Abstract

Estimating mutual information (MI) between two continuous random variables
$X$ and $Y$ allows to capture non-linear dependencies between them,
non-parametrically. As such, MI estimation lies at the core of many data
science applications. Yet, robustly estimating MI for high-dimensional $X$ and
$Y$ is still an open research question.
In this paper, we formulate this problem through the lens of manifold
learning. That is, we leverage the common assumption that the information of
$X$ and $Y$ is captured by a low-dimensional manifold embedded in the observed
high-dimensional space and transfer it to MI estimation. As an extension to
state-of-the-art $k$NN estimators, we propose to determine the $k$-nearest
neighbours via geodesic distances on this manifold rather than from the ambient
space, which allows us to estimate MI even in the high-dimensional setting. An
empirical evaluation of our method, G-KSG, against the state-of-the-art shows
that it yields good estimations of the MI in classical benchmark, and manifold
tasks, even for high dimensional datasets, which none of the existing methods
can provide.

BibTeX

@online{Marx_arXiv2110.13883,
TITLE = {{Estimating Mutual Information via Geodesic $k$NN}},
AUTHOR = {Marx, Alexander and Fischer, Jonas},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2110.13883},
EPRINT = {2110.13883},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Estimating mutual information (MI) between two continuous random variables $X$ and $Y$ allows to capture non-linear dependencies between them, non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet, robustly estimating MI for high-dimensional $X$ and $Y$ is still an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of $X$ and $Y$ is captured by a low-dimensional manifold embedded in the observed high-dimensional space and transfer it to MI estimation. As an extension to state-of-the-art $k$NN estimators, we propose to determine the $k$-nearest neighbours via geodesic distances on this manifold rather than from the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state-of-the-art shows that it yields good estimations of the MI in classical benchmark, and manifold tasks, even for high dimensional datasets, which none of the existing methods can provide.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Fischer, Jonas
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Estimating Mutual Information via Geodesic kNN
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B130-8
%U https://arxiv.org/abs/2110.13883
%D 2021
%X Estimating mutual information (MI) between two continuous random variables $X$ and $Y$ allows to capture non-linear dependencies between them, non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet, robustly estimating MI for high-dimensional $X$ and $Y$ is still an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of $X$ and $Y$ is captured by a low-dimensional manifold embedded in the observed high-dimensional space and transfer it to MI estimation. As an extension to state-of-the-art $k$NN estimators, we propose to determine the $k$-nearest neighbours via geodesic distances on this manifold rather than from the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state-of-the-art shows that it yields good estimations of the MI in classical benchmark, and manifold tasks, even for high dimensional datasets, which none of the existing methods can provide.
%K Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT

Conference paper

O. A. Mian, A. Marx, and J. Vreeken

“Discovering Fully Oriented Causal Networks,” in Thirty-Fifth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2021.

BibTeX

@inproceedings{mian:20:globe,
TITLE = {Discovering Fully Oriented Causal Networks},
AUTHOR = {Mian, Osman A. and Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-57735-866-4},
DOI = {10.1609/aaai.v35i10.17085},
PUBLISHER = {AAAI},
YEAR = {2021},
BOOKTITLE = {Thirty-Fifth AAAI Conference on Artificial Intelligence},
PAGES = {8975--8982},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Mian, Osman A.
%A Marx, Alexander
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Fully Oriented Causal Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0BCB-8
%R 10.1609/aaai.v35i10.17085
%D 2021
%B The Thirty-Fifth Conference on Artificial Intelligence
%Z date of event: 2021-02-02 - 2021-02-09
%C Vancouver, Canada
%B Thirty-Fifth AAAI Conference on Artificial Intelligence 
%P 8975 - 8982
%I AAAI
%@ 978-1-57735-866-4

Conference paper

P. Mirza, M. Abouhamra, and G. Weikum

“AligNarr: Aligning Narratives on Movies,” in The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual, 2021.

BibTeX

@inproceedings{Mirza_ACL-short.54,
TITLE = {{AligNarr}: {A}ligning Narratives on Movies},
AUTHOR = {Mirza, Paramita and Abouhamra, Mostafa and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-954085-53-4},
URL = {https://aclanthology.org/2021.acl-short.54},
DOI = {10.18653/v1/2021.acl-short.54},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)},
EDITOR = {Xia, Fei and Li, Wenjie and Navigli, Roberto},
PAGES = {427--433},
ADDRESS = {Virtual},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Abouhamra, Mostafa
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AligNarr: Aligning Narratives on Movies
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-4A1F-3
%U https://aclanthology.org/2021.acl-short.54
%R 10.18653/v1/2021.acl-short.54
%D 2021
%B The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
%Z date of event: 2021-08-01 - 2021-08-06
%C Virtual
%B The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
%E Xia, Fei; Li, Wenjie; Navigli, Roberto
%P 427 - 433
%I ACL
%@ 978-1-954085-53-4

Conference paper

S. Nag Chowdhury, R. Bhowmik, H. Ravi, G. de Melo, S. Razniewski, and G. Weikum

“Exploiting Image-Text Synergy for Contextual Image Captioning,” in Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), Kyiv, Ukraine (Online), 2021.

BibTeX

@inproceedings{Chod_ECAL2021,
TITLE = {Exploiting Image-Text Synergy for Contextual Image Captioning},
AUTHOR = {Nag Chowdhury, Sreyasi and Bhowmik, Rajarshi and Ravi, Hareesh and de Melo, Gerard and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-954085-15-2},
URL = {https://aclanthology.org/2021.lantern-1.3},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)},
EDITOR = {Mosbach, Marius and Hedderich, Michael A. and Pezzelle, Sandro and Mogadala, Aditya and Klakow, Dietrich and Moens, Marie-Francine and Akata, Zeynep},
PAGES = {30--37},
ADDRESS = {Kyiv, Ukraine (Online)},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Bhowmik, Rajarshi
%A Ravi, Hareesh
%A de Melo, Gerard
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exploiting Image-Text Synergy for Contextual Image Captioning : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0E60-D
%U https://aclanthology.org/2021.lantern-1.3
%D 2021
%B The Third Workshop Beyond Vision and LANguage: inTEgrating Real-world kNowledge
%Z date of event: 2021-04-20 - 2021-04-20
%C Kyiv, Ukraine (Online)
%B Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
%E Mosbach, Marius; Hedderich, Michael A.; Pezzelle, Sandro; Mogadala, Aditya; Klakow, Dietrich; Moens, Marie-Francine; Akata, Zeynep
%P 30 - 37
%I ACL
%@ 978-1-954085-15-2

Conference paper

S. Nag Chowdhury, R. Wickramarachchi, M. H. Gad-Elrab, D. Stepanova, and C. Henson

“Towards Leveraging Commonsense Knowledge for Autonomous Driving,” in International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks, Virtual Conference, 2021.

BibTeX

@inproceedings{NagChowdhury_ISWC2021,
TITLE = {Towards Leveraging Commonsense Knowledge for Autonomous Driving},
AUTHOR = {Nag Chowdhury, Sreyasi and Wickramarachchi, Ruwan and Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Henson, Cory},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {https://ceur-ws.org/Vol-2980/paper396.pdf; urn:nbn:de:0074-2980-6},
PUBLISHER = {CEUR-WS.org},
YEAR = {2021},
BOOKTITLE = {International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks},
EDITOR = {Seneviratne, Oshani and Pesquita, Catia and Sequeda, Juan and Etcheverry, Lorena},
PAGES = {1--5},
EID = {396},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2980},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Wickramarachchi, Ruwan
%A Gad-Elrab, Mohamed Hassan
%A Stepanova, Daria
%A Henson, Cory
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Towards Leveraging Commonsense Knowledge for Autonomous Driving : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-42CD-6
%U https://ceur-ws.org/Vol-2980/paper396.pdf
%D 2021
%B 20th International Semantic Web Conference
%Z date of event: 2021-10-24 - 2021-10-28
%C Virtual Conference
%B International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks
%E Seneviratne, Oshani; Pesquita, Catia; Sequeda, Juan; Etcheverry, Lorena
%P 1 - 5
%Z sequence number: 396
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 2980
%@ 1613-0073

Thesis

D5IMPR-CS

S. Nag Chowdhury

“Text-image synergy for multimodal retrieval and annotation,” Universität des Saarlandes, Saarbrücken, 2021.

Abstract

Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.

BibTeX

@phdthesis{Chowphd2021,
TITLE = {Text-image synergy for multimodal retrieval and annotation},
AUTHOR = {Nag Chowdhury, Sreyasi},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-345092},
DOI = {10.22028/D291-34509},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
ABSTRACT = {Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.},
}

Endnote

%0 Thesis
%A Nag Chowdhury, Sreyasi
%A referee: Weikum, Gerhard
%A referee: de Melo, Gerard
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Text-image synergy for multimodal retrieval and annotation : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-428A-1
%R 10.22028/D291-34509
%U urn:nbn:de:bsz:291--ds-345092
%F OTHER: hdl:20.500.11880/31690
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 131 p.
%V phd
%9 phd
%X Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.
%K image retrieval
image-text alignment
image captioning
commonsense knowledge
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/31690

Conference paper

S. Nag Chowdhury, S. Razniewski, and G. Weikum

“SANDI: Story-and-Images Alignment,” in The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Online, 2021.

BibTeX

@inproceedings{Thinh_EACL21,
TITLE = {{SANDI}: {S}tory-and-Images Alignment},
AUTHOR = {Nag Chowdhury, Sreyasi and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-954085-02-2},
URL = {https://aclanthology.org/2021.eacl-main.85},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)},
EDITOR = {Merlo, Paola and Tiedemann, J{\"o}rg and Tsarfaty, Reut},
PAGES = {989--999},
ADDRESS = {Online},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SANDI: Story-and-Images Alignment : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-04A2-C
%U https://aclanthology.org/2021.eacl-main.85
%D 2021
%B 16th Conference of the European Chapter of the Association for Computational Linguistics
%Z date of event: 2021-04-19 - 2021-04-23
%C Online
%B The 16th Conference of the European Chapter of the Association for Computational Linguistics
%E Merlo, Paola; Tiedemann, Jörg; Tsarfaty, Reut
%P 989 - 999
%I ACL
%@ 978-1-954085-02-2

Paper

S. Naseri, J. Dalton, A. Yates, and J. Allan

“CEQE: Contextualized Embeddings for Query Expansion,” 2021. [Online]. Available: https://arxiv.org/abs/2103.05256.

Abstract

In this work we leverage recent advances in context-sensitive language models
to improve the task of query expansion. Contextualized word representation
models, such as ELMo and BERT, are rapidly replacing static embedding models.
We propose a new model, Contextualized Embeddings for Query Expansion (CEQE),
that utilizes query-focused contextualized embedding vectors. We study the
behavior of contextual representations generated for query expansion in ad-hoc
document retrieval. We conduct our experiments on probabilistic retrieval
models as well as in combination with neural ranking models. We evaluate CEQE
on two standard TREC collections: Robust and Deep Learning. We find that CEQE
outperforms static embedding-based expansion methods on multiple collections
(by up to 18% on Robust and 31% on Deep Learning on average precision) and also
improves over proven probabilistic pseudo-relevance feedback (PRF) models. We
further find that multiple passes of expansion and reranking result in
continued gains in effectiveness with CEQE-based approaches outperforming other
approaches. The final model incorporating neural and CEQE-based expansion score
achieves gains of up to 5% in P@20 and 2% in AP on Robust over the
state-of-the-art transformer-based re-ranking model, Birch.

BibTeX

@online{Naseri_2103.05256,
TITLE = {{CEQE}: Contextualized Embeddings for Query Expansion},
AUTHOR = {Naseri, Shahrzad and Dalton, Jeffrey and Yates, Andrew and Allan, James},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2103.05256},
EPRINT = {2103.05256},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {In this work we leverage recent advances in context-sensitive language models to improve the task of query expansion. Contextualized word representation models, such as ELMo and BERT, are rapidly replacing static embedding models. We propose a new model, Contextualized Embeddings for Query Expansion (CEQE), that utilizes query-focused contextualized embedding vectors. We study the behavior of contextual representations generated for query expansion in ad-hoc document retrieval. We conduct our experiments on probabilistic retrieval models as well as in combination with neural ranking models. We evaluate CEQE on two standard TREC collections: Robust and Deep Learning. We find that CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning on average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. We further find that multiple passes of expansion and reranking result in continued gains in effectiveness with CEQE-based approaches outperforming other approaches. The final model incorporating neural and CEQE-based expansion score achieves gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.},
}

Endnote

%0 Report
%A Naseri, Shahrzad
%A Dalton, Jeffrey
%A Yates, Andrew
%A Allan, James
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CEQE: Contextualized Embeddings for Query Expansion : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6779-C
%U https://arxiv.org/abs/2103.05256
%D 2021
%X In this work we leverage recent advances in context-sensitive language models to improve the task of query expansion. Contextualized word representation models, such as ELMo and BERT, are rapidly replacing static embedding models. We propose a new model, Contextualized Embeddings for Query Expansion (CEQE), that utilizes query-focused contextualized embedding vectors. We study the behavior of contextual representations generated for query expansion in ad-hoc document retrieval. We conduct our experiments on probabilistic retrieval models as well as in combination with neural ranking models. We evaluate CEQE on two standard TREC collections: Robust and Deep Learning. We find that CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning on average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. We further find that multiple passes of expansion and reranking result in continued gains in effectiveness with CEQE-based approaches outperforming other approaches. The final model incorporating neural and CEQE-based expansion score achieves gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.
%K Computer Science, Information Retrieval, cs.IR

Conference paper

S. Naseri, J. Dalton, A. Yates, and J. Allan

“CEQE: Contextualized Embeddings for Query Expansion,” in Advances in Information Retrieval (ECIR 2021), Lucca, Italy (Online Event), 2021.

BibTeX

@inproceedings{Naseri_ECIR2021,
TITLE = {{CEQE}: {C}ontextualized Embeddings for Query Expansion},
AUTHOR = {Naseri, Shahrzad and Dalton, Jeff and Yates, Andrew and Allan, James},
LANGUAGE = {eng},
ISBN = {978-3-030-72112-1},
DOI = {10.1007/978-3-030-72113-8_31},
PUBLISHER = {Springer},
YEAR = {2021},
DATE = {2021},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2021)},
EDITOR = {Hiemstra, Djoerd and Moens, Marie-Francine and Mothe, Josiane and Perego, Raffaele and Potthast, Martin and Sebastiani, Fabrizio},
PAGES = {467--482},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12656},
ADDRESS = {Lucca, Italy (Online Event)},
}

Endnote

%0 Conference Proceedings
%A Naseri, Shahrzad
%A Dalton, Jeff
%A Yates, Andrew
%A Allan, James
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CEQE: Contextualized Embeddings for Query Expansion : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6628-8
%R 10.1007/978-3-030-72113-8_31
%D 2021
%B 43rd European Conference on IR Research
%Z date of event: 2021-03-28 - 2021-04-01
%C Lucca, Italy (Online Event)
%B Advances in Information Retrieval
%E Hiemstra, Djoerd; Moens, Marie-Francine; Mothe, Josiane; Perego, Raffaele; Potthast, Martin; Sebastiani, Fabrizio
%P 467 - 482
%I Springer
%@ 978-3-030-72112-1
%B Lecture Notes in Computer Science
%N 12656

Thesis

T. Nguyen

“Grounding Depression Detection in Clinical Questionnaires by Detecting Mental Health Symptoms,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@mastersthesis{NguyenMSc21,
TITLE = {Grounding Depression Detection in Clinical Questionnaires by Detecting Mental Health Symptoms},
AUTHOR = {Nguyen, Thong},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Nguyen, Thong
%Y Yates, Andrew
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Grounding Depression Detection in Clinical Questionnaires by Detecting Mental Health Symptoms : 
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-2DA3-9
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P X, 68 p.
%V master
%9 master

Conference paper

T.-P. Nguyen, S. Razniewski, and G. Weikum

“Advanced Semantics for Commonsense Knowledge Extraction,” in The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia, 2021.

BibTeX

@inproceedings{Nguyen_WWW21,
TITLE = {Advanced Semantics for Commonsense Knowledge Extraction},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8312-7},
DOI = {10.1145/3442381.3449827},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {The Web Conference 2021 (WWW 2021)},
EDITOR = {Leskovec, Jure and Grobelnik, Marko and Najork, Marc and Tang, Jie and Zia, Leila},
PAGES = {2636--2647},
ADDRESS = {Ljubljana, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Advanced Semantics for Commonsense Knowledge Extraction : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0196-D
%R 10.1145/3442381.3449827
%D 2021
%B 30th The Web Conference
%Z date of event: 2021-04-30 - 
%C Ljubljana, Slovenia
%B The Web Conference 2021
%E Leskovec, Jure; Grobelnik, Marko; Najork, Marc; Tang, Jie; Zia, Leila
%P 2636 - 2647
%I ACM
%@ 978-1-4503-8312-7

Paper

T.-P. Nguyen, S. Razniewski, and G. Weikum

“Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering,” 2021. [Online]. Available: https://arxiv.org/abs/2105.13662.

Abstract

ASCENT is a fully automated methodology for extracting and consolidating
commonsense assertions from web contents (Nguyen et al., WWW 2021). It advances
traditional triple-based commonsense knowledge representation by capturing
semantic facets like locations and purposes, and composite concepts, i.e.,
subgroups and related aspects of subjects. In this demo, we present a web
portal that allows users to understand its construction process, explore its
content, and observe its impact in the use case of question answering. The demo
website and an introductory video are both available online.

BibTeX

@online{Nguyen_2105.13662,
TITLE = {Inside {ASCENT}: {E}xploring a Deep Commonsense Knowledge Base and its Usage in Question Answering},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2105.13662},
EPRINT = {2105.13662},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents (Nguyen et al., WWW 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets like locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact in the use case of question answering. The demo website and an introductory video are both available online.},
}

Endnote

%0 Report
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-4A2E-2
%U https://arxiv.org/abs/2105.13662
%D 2021
%X ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents (Nguyen et al., WWW 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets like locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact in the use case of question answering. The demo website and an introductory video are both available online.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL
%U https://youtu.be/qMkJXqu_Yd4

Article

S. Razniewski, H. Arnaout, S. Ghosh, and F. Suchanek

“On the Limits of Machine Knowledge: Completeness, Recall and Negation in Web-scale Knowledge Bases,” Proceedings of the VLDB Endowment (Proc. VLDB 2021), vol. 14, no. 12, 2021.

BibTeX

@article{Razniewski2021_PVLDB,
TITLE = {On the Limits of Machine Knowledge: {C}ompleteness, Recall and Negation in Web-scale Knowledge Bases},
AUTHOR = {Razniewski, Simon and Arnaout, Hiba and Ghosh, Shrestha and Suchanek, Fabian},
LANGUAGE = {eng},
PUBLISHER = {VLDB Endowment Inc.},
YEAR = {2021},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {14},
NUMBER = {12},
PAGES = {3175--3177},
BOOKTITLE = {Proceedings of the 47th International Conference on Very Large Data Bases (VLDB 2021)},
EDITOR = {Dong, Xin Luna and Naumann, Felix},
}

Endnote

%0 Journal Article
%A Razniewski, Simon
%A Arnaout, Hiba
%A Ghosh, Shrestha
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T On the Limits of Machine Knowledge: Completeness, Recall and Negation in Web-scale Knowledge Bases : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6544-9
%7 2021
%D 2021
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 14
%N 12
%& 3175
%P 3175 - 3177
%I VLDB Endowment Inc.
%B Proceedings of the 47th International Conference on Very Large Data Bases
%O VLDB 2021 Copenhagen, Denmark, 16-20 August 2021

Conference paper

S. Razniewski, N. Tandon, and A. S. Varde

“Information to Wisdom: Commonsense Knowledge Extraction and Compilation,” in WSDM ’21, 14th International Conference on Web Search and Data Mining, Virtual Event, Israel, 2021.

BibTeX

@inproceedings{Razniewski_WSDM21,
TITLE = {Information to Wisdom: {C}ommonsense Knowledge Extraction and Compilation},
AUTHOR = {Razniewski, Simon and Tandon, Niket and Varde, Aparna S.},
LANGUAGE = {eng},
ISBN = {978-1-4503-8297-7},
DOI = {10.1145/3437963.3441664},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {WSDM '21, 14th International Conference on Web Search and Data Mining},
EDITOR = {Lewin-Eytan, Liane and Carmel, David and Yom-Tov, Elad and Agichtein, Eugene and Gabrilovich, Evgeniy},
PAGES = {1143--1146},
ADDRESS = {Virtual Event, Israel},
}

Endnote

%0 Conference Proceedings
%A Razniewski, Simon
%A Tandon, Niket
%A Varde, Aparna S.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Information to Wisdom: Commonsense Knowledge Extraction and Compilation : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-65FE-8
%R 10.1145/3437963.3441664
%D 2021
%B 14th International Conference on Web Search and Data Mining
%Z date of event: 2021-03-08 - 2021-03-12
%C Virtual Event, Israel
%B WSDM '21
%E Lewin-Eytan, Liane; Carmel, David; Yom-Tov, Elad; Agichtein, Eugene; Gabrilovich, Evgeniy
%P 1143 - 1146
%I ACM
%@ 978-1-4503-8297-7

Paper

S. Razniewski, A. Yates, N. Kassner, and G. Weikum

“Language Models As or For Knowledge Bases,” 2021. [Online]. Available: https://arxiv.org/abs/2110.04888.

Abstract

Pre-trained language models (LMs) have recently gained attention for their
potential as an alternative to (or proxy for) explicit knowledge bases (KBs).
In this position paper, we examine this hypothesis, identify strengths and
limitations of both LMs and KBs, and discuss the complementary nature of the
two paradigms. In particular, we offer qualitative arguments that latent LMs
are not suitable as a substitute for explicit KBs, but could play a major role
for augmenting and curating KBs.

BibTeX

@online{Razniewski_2110.04888,
TITLE = {Language Models As or For Knowledge Bases},
AUTHOR = {Razniewski, Simon and Yates, Andrew and Kassner, Nora and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2110.04888},
EPRINT = {2110.04888},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Pre-trained language models (LMs) have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs). In this position paper, we examine this hypothesis, identify strengths and limitations of both LMs and KBs, and discuss the complementary nature of the two paradigms. In particular, we offer qualitative arguments that latent LMs are not suitable as a substitute for explicit KBs, but could play a major role for augmenting and curating KBs.},
}

Endnote

%0 Report
%A Razniewski, Simon
%A Yates, Andrew
%A Kassner, Nora
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Language Models As or For Knowledge Bases : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6510-3
%U https://arxiv.org/abs/2110.04888
%D 2021
%X Pre-trained language models (LMs) have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs). In this position paper, we examine this hypothesis, identify strengths and limitations of both LMs and KBs, and discuss the complementary nature of the two paradigms. In particular, we offer qualitative arguments that latent LMs are not suitable as a substitute for explicit KBs, but could play a major role for augmenting and curating KBs.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Paper

S. Razniewski

“Commonsense Knowledge Base Construction in the Age of Big Data,” 2021. [Online]. Available: https://arxiv.org/abs/2105.01925.

Abstract

Compiling commonsense knowledge is traditionally an AI topic approached by
manual labor. Recent advances in web data processing have enabled automated
approaches. In this demonstration we will showcase three systems for automated
commonsense knowledge base construction, highlighting each time one aspect of
specific interest to the data management community. (i) We use Quasimodo to
illustrate knowledge extraction systems engineering, (ii) Dice to illustrate
the role that schema constraints play in cleaning fuzzy commonsense knowledge,
and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos
are available online at quasimodo.r2.enst.fr,
dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.

BibTeX

@online{Razniewski_2105.01925,
TITLE = {Commonsense Knowledge Base Construction in the Age of Big Data},
AUTHOR = {Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2105.01925},
EPRINT = {2105.01925},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledge base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowledge extraction systems engineering, (ii) Dice to illustrate the role that schema constraints play in cleaning fuzzy commonsense knowledge, and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos are available online at https://quasimodo.r2.enst.fr, https://dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.},
}

Endnote

%0 Report
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Commonsense Knowledge Base Construction in the Age of Big Data : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6604-0
%U https://arxiv.org/abs/2105.01925
%D 2021
%X Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledge base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowledge extraction systems engineering, (ii) Dice to illustrate the role that schema constraints play in cleaning fuzzy commonsense knowledge, and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos are available online at https://quasimodo.r2.enst.fr, https://dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Databases, cs.DB

Conference paper

J. Romero

“Pyformlang: An Educational Library for Formal Language Manipulation,” in SIGCSE ’21, The 52nd ACM Technical Symposium on Computer Science Education, Virtual Event, USA, 2021.

BibTeX

@inproceedings{Romero_SIGCSE21,
TITLE = {Pyformlang: {An} Educational Library for Formal Language Manipulation},
AUTHOR = {Romero, Julien},
LANGUAGE = {eng},
ISBN = {978-1-4503-8062-1},
DOI = {10.1145/3408877.3432464},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGCSE '21, The 52nd ACM Technical Symposium on Computer Science Education},
EDITOR = {Sherriff, Mark and Merkle, Laurence D. and Cutter, Pamela and Monge, Alvaro and Sheard, Judithe},
PAGES = {576--582},
ADDRESS = {Virtual Event, USA},
}

Endnote

%0 Conference Proceedings
%A Romero, Julien
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Pyformlang: An Educational Library for Formal Language Manipulation : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F836-5
%R 10.1145/3408877.3432464
%D 2021
%B The 52nd ACM Technical Symposium on Computer Science Education
%Z date of event: 2021-03-13 - 2021-03-20
%C Virtual Event, USA
%B SIGCSE '21
%E Sherriff, Mark; Merkle, Laurence D.; Cutter, Pamela; Monge, Alvaro; Sheard, Judithe
%P 576 - 582
%I ACM
%@ 978-1-4503-8062-1

Book

R. Saha Roy and A. Anand

Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections. San Rafael, CA: Morgan & Claypool, 2021.

BibTeX

@book{SahaRoy2021,
TITLE = {Question Answering for the Curated Web: Tasks and Methods in {QA} over Knowledge Bases and Text Collections},
AUTHOR = {Saha Roy, Rishiraj and Anand, Avishek},
LANGUAGE = {eng},
ISBN = {978-1636392387},
DOI = {10.2200/S0113ED1V01Y202109ICR076},
PUBLISHER = {Morgan \& Claypool},
ADDRESS = {San Rafael, CA},
YEAR = {2021},
DATE = {2021},
PAGES = {194 p.},
SERIES = {Synthesis Lectures on Information Concepts, Retrieval, and Services},
}

Endnote

%0 Book
%A Saha Roy, Rishiraj
%A Anand, Avishek
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases and Text Collections
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B116-6
%R 10.2200/S0113ED1V01Y202109ICR076
%@ 978-1636392387
%I Morgan & Claypool
%C San Rafael, CA 
%D 2021
%P 194 p.
%B Synthesis Lectures on Information Concepts, Retrieval, and Services

Article

BIOD5

F. Schmidt, A. Marx, N. Baumgarten, M. Hebel, M. Wegner, M. Kaulich, M. S. Leisegang, R. P. Brandes, J. Göke, J. Vreeken, and M. H. Schulz

“Integrative Analysis of Epigenetics Data Identifies Gene-specific Regulatory Elements,” Nucleic Acids Research (London), vol. 49, no. 18, 2021.

BibTeX

@article{Schmidt_NAR21,
TITLE = {Integrative Analysis of Epigenetics Data Identifies Gene-specific Regulatory Elements},
AUTHOR = {Schmidt, Florian and Marx, Alexander and Baumgarten, Nina and Hebel, Marie and Wegner, Martin and Kaulich, Manuel and Leisegang, Matthias S. and Brandes, Ralf P and G{\"o}ke, Jonathan and Vreeken, Jilles and Schulz, Marcel Holger},
LANGUAGE = {eng},
ISSN = {0305-1048},
DOI = {10.1093/nar/gkab798},
PUBLISHER = {Oxford University Press},
ADDRESS = {Oxford},
YEAR = {2021},
DATE = {2021},
JOURNAL = {Nucleic Acids Research (London)},
VOLUME = {49},
NUMBER = {18},
PAGES = {10397--10418},
}

Endnote

%0 Journal Article
%A Schmidt, Florian
%A Marx, Alexander
%A Baumgarten, Nina
%A Hebel, Marie
%A Wegner, Martin
%A Kaulich, Manuel
%A Leisegang, Matthias S.
%A Brandes, Ralf P
%A Göke, Jonathan
%A Vreeken, Jilles
%A Schulz, Marcel Holger
%+ Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
%T Integrative Analysis of Epigenetics Data Identifies Gene-specific Regulatory Elements
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6D54-F
%R 10.1093/nar/gkab798
%2 PMC8501997
%7 2021
%D 2021
%J Nucleic Acids Research (London)
%O Nucleic Acids Res
%V 49
%N 18
%& 10397
%P 10397 - 10418
%I Oxford University Press
%C Oxford
%@ 0305-1048

Thesis

D5IMPR-CS

X. Shen

“Deep Latent-Variable Models for Neural Text Generation,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@phdthesis{Shenphd2021,
TITLE = {Deep Latent-Variable Models for Neural Text Generation},
AUTHOR = {Shen, Xiaoyu},
LANGUAGE = {eng},
URL = {nbn:de:bsz:291--ds-350558},
DOI = {10.22028/D291-35055},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Shen, Xiaoyu
%Y Klakow, Dietrich 
%A referee: Weikum, Gerhard
%A referee: Schütze, Hinrich
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Deep Latent-Variable Models for Neural Text Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-B25D-6
%R 10.22028/D291-35055 
%U nbn:de:bsz:291--ds-350558
%F OTHER: hdl:20.500.11880/32106
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 201 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/32106

Thesis

S. Shrinivasan

“Knowledge Base Stability,” Universität des Saarlandes, Saarbrücken, 2021.

BibTeX

@mastersthesis{ShrinivasanMSc21,
TITLE = {Knowledge Base Stability},
AUTHOR = {Shrinivasan, Suhas},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2021},
DATE = {2021},
}

Endnote

%0 Thesis
%A Shrinivasan, Suhas
%Y Razniewski, Simon
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Base Stability
%G eng
%U http://hdl.handle.net/21.11116/0000-000D-15A0-6
%I Universität des Saarlandes
%C Saarbrücken
%D 2021
%P 87 p.
%V master
%9 master

Paper

S. Singhania, S. Razniewski, and G. Weikum

“Predicting Document Coverage for Relation Extraction,” 2021. [Online]. Available: https://arxiv.org/abs/2111.13611.

Abstract

This paper presents a new task of predicting the coverage of a text document
for relation extraction (RE): does the document contain many relational tuples
for a given entity? Coverage predictions are useful in selecting the best
documents for knowledge base construction with large input corpora. To study
this problem, we present a dataset of 31,366 diverse documents for 520
entities. We analyze the correlation of document coverage with features like
length, entity mention frequency, Alexa rank, language complexity and
information retrieval scores. Each of these features has only moderate
predictive power. We employ methods combining features with statistical models
like TF-IDF and language models like BERT. The model combining features and
BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of
coverage predictions on two use cases: KB construction and claim refutation.

BibTeX

@online{Singhania2021,
TITLE = {Predicting Document Coverage for Relation Extraction},
AUTHOR = {Singhania, Sneha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2111.13611},
EPRINT = {2111.13611},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.},
}

Endnote

%0 Report
%A Singhania, Sneha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Predicting Document Coverage for Relation Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-000A-237F-1
%U https://arxiv.org/abs/2111.13611
%D 2021
%X This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI

Conference paper

A. Tigunova, P. Mirza, A. Yates, and G. Weikum

“Exploring Personal Knowledge Extraction from Conversations with CHARM,” in WSDM ’21, 14th International Conference on Web Search and Data Mining, Virtual Event, Israel, 2021.

BibTeX

@inproceedings{Tigunova_WSDM21,
TITLE = {Exploring Personal Knowledge Extraction from Conversations with {CHARM}},
AUTHOR = {Tigunova, Anna and Mirza, Paramita and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-8297-7},
DOI = {10.1145/3437963.3441699},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {WSDM '21, 14th International Conference on Web Search and Data Mining},
EDITOR = {Lewin-Eytan, Liane and Carmel, David and Yom-Tov, Elad and Agichtein, Eugene and Gabrilovich, Evgeniy},
PAGES = {1077--1080},
ADDRESS = {Virtual Event, Israel},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Mirza, Paramita
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exploring Personal Knowledge Extraction from Conversations with CHARM
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F850-7
%R 10.1145/3437963.3441699
%D 2021
%B 14th International Conference on Web Search and Data Mining
%Z date of event: 2021-03-08 - 2021-03-12
%C Virtual Event, Israel
%B WSDM '21
%E Lewin-Eytan, Liane; Carmel, David; Yom-Tov, Elad; Agichtein, Eugene; Gabrilovich, Evgeniy
%P 1077 - 1080
%I ACM
%@ 978-1-4503-8297-7

Conference paper

A. Tigunova, P. Mirza, A. Yates, and G. Weikum

“PRIDE: Predicting Relationships in Conversations,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 2021.

BibTeX

@inproceedings{DBLP:conf/emnlp/TigunovaMYW21,
TITLE = {{PRIDE}: {P}redicting Relationships in Conversations},
AUTHOR = {Tigunova, Anna and Mirza, Paramita and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://aclanthology.org/2021.emnlp-main.380/},
DOI = {10.18653/v1/2021.emnlp-main.380},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)},
EDITOR = {Moens, Marie-Francine and Huang, Xuanjing and Specia, Lucia and Yih, Scott Wen-tau},
PAGES = {4636--4650},
ADDRESS = {Punta Cana, Dominican Republic},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Mirza, Paramita
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T PRIDE: Predicting Relationships in Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-000C-DBF2-C
%U https://aclanthology.org/2021.emnlp-main.380/
%R 10.18653/v1/2021.emnlp-main.380
%D 2021
%B The Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2021-11-07 - 2021-11-11
%C Punta Cana, Dominican Republic
%B Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%E Moens, Marie-Francine; Huang, Xuanjing; Specia, Lucia; Yih, Scott Wen-tau
%P 4636 - 4650
%I ACL

Paper

G. H. Torbati, A. Yates, and G. Weikum

“You Get What You Chat: Using Conversations to Personalize Search-based Recommendations,” 2021. [Online]. Available: https://arxiv.org/abs/2109.04716.

Abstract

Prior work on personalized recommendations has focused on exploiting explicit
signals from user-specific queries, clicks, likes, and ratings. This paper
investigates tapping into a different source of implicit signals of interests
and tastes: online chats between users. The paper develops an expressive model
and effective methods for personalizing search-based entity recommendations.
User models derived from chats augment different methods for re-ranking entity
answers for medium-grained queries. The paper presents specific techniques to
enhance the user models by capturing domain-specific vocabularies and by
entity-based expansion. Experiments are based on a collection of online chats
from a controlled user study covering three domains: books, travel, food. We
evaluate different configurations and compare chat-based user models against
concise user profiles from questionnaires. Overall, these two variants perform
on par in terms of NDCG@20, but each has advantages in certain domains.

BibTeX

@online{Haratinezhad2109.04716,
TITLE = {You Get What You Chat: Using Conversations to Personalize Search-based Recommendations},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2109.04716},
EPRINT = {2109.04716},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Prior work on personalized recommendations has focused on exploiting explicit signals from user-specific queries, clicks, likes, and ratings. This paper investigates tapping into a different source of implicit signals of interests and tastes: online chats between users. The paper develops an expressive model and effective methods for personalizing search-based entity recommendations. User models derived from chats augment different methods for re-ranking entity answers for medium-grained queries. The paper presents specific techniques to enhance the user models by capturing domain-specific vocabularies and by entity-based expansion. Experiments are based on a collection of online chats from a controlled user study covering three domains: books, travel, food. We evaluate different configurations and compare chat-based user models against concise user profiles from questionnaires. Overall, these two variants perform on par in terms of NDCG@20, but each has advantages in certain domains.},
}

Endnote

%0 Report
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T You Get What You Chat: Using Conversations to Personalize Search-based Recommendations
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-64B9-6
%U https://arxiv.org/abs/2109.04716
%D 2021
%X Prior work on personalized recommendations has focused on exploiting explicit signals from user-specific queries, clicks, likes, and ratings. This paper investigates tapping into a different source of implicit signals of interests and tastes: online chats between users. The paper develops an expressive model and effective methods for personalizing search-based entity recommendations. User models derived from chats augment different methods for re-ranking entity answers for medium-grained queries. The paper presents specific techniques to enhance the user models by capturing domain-specific vocabularies and by entity-based expansion. Experiments are based on a collection of online chats from a controlled user study covering three domains: books, travel, food. We evaluate different configurations and compare chat-based user models against concise user profiles from questionnaires. Overall, these two variants perform on par in terms of NDCG@20, but each has advantages in certain domains.
%K Computer Science, Information Retrieval, cs.IR

Paper

G. H. Torbati, A. Yates, and G. Weikum

“Personalized Entity Search by Sparse and Scrutable User Profiles,” 2021. [Online]. Available: https://arxiv.org/abs/2109.04713.

Abstract

Prior work on personalizing web search results has focused on considering
query-and-click logs to capture users' individual interests. For product search,
extensive user histories about purchases and ratings have been exploited.
However, for general entity search, such as for books on specific topics or
travel destinations with certain features, personalization is largely
underexplored. In this paper, we address personalization of book search, as an
exemplary case of entity search, by exploiting sparse user profiles obtained
through online questionnaires. We devise and compare a variety of re-ranking
methods based on language models or neural learning. Our experiments show that
even very sparse information about individuals can enhance the effectiveness of
the search results.

BibTeX

@online{Haratinezhad2109.04713,
TITLE = {Personalized Entity Search by Sparse and Scrutable User Profiles},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2109.04713},
EPRINT = {2109.04713},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Prior work on personalizing web search results has focused on considering query-and-click logs to capture users' individual interests. For product search, extensive user histories about purchases and ratings have been exploited. However, for general entity search, such as for books on specific topics or travel destinations with certain features, personalization is largely underexplored. In this paper, we address personalization of book search, as an exemplary case of entity search, by exploiting sparse user profiles obtained through online questionnaires. We devise and compare a variety of re-ranking methods based on language models or neural learning. Our experiments show that even very sparse information about individuals can enhance the effectiveness of the search results.},
}

Endnote

%0 Report
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Personalized Entity Search by Sparse and Scrutable User Profiles
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-64AC-5
%U https://arxiv.org/abs/2109.04713
%D 2021
%X Prior work on personalizing web search results has focused on considering query-and-click logs to capture users' individual interests. For product search, extensive user histories about purchases and ratings have been exploited. However, for general entity search, such as for books on specific topics or travel destinations with certain features, personalization is largely underexplored. In this paper, we address personalization of book search, as an exemplary case of entity search, by exploiting sparse user profiles obtained through online questionnaires. We devise and compare a variety of re-ranking methods based on language models or neural learning. Our experiments show that even very sparse information about individuals can enhance the effectiveness of the search results.
%K Computer Science, Information Retrieval, cs.IR

Conference paper

G. H. Torbati, A. Yates, and G. Weikum

“You Get What You Chat: Using Conversations to Personalize Search-based Recommendations,” in Advances in Information Retrieval (ECIR 2021), Lucca, Italy (Online Event), 2021.

BibTeX

@inproceedings{Torbati_ECIR2021,
TITLE = {You Get What You Chat: {U}sing Conversations to Personalize Search-based Recommendations},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-72112-1},
DOI = {10.1007/978-3-030-72113-8_14},
PUBLISHER = {Springer},
YEAR = {2021},
DATE = {2021},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2021)},
EDITOR = {Hiemstra, Djoerd and Moens, Marie-Francine and Mothe, Josiane and Perego, Raffaele and Potthast, Martin and Sebastiani, Fabrizio},
PAGES = {207--223},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12656},
ADDRESS = {Lucca, Italy (Online Event)},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T You Get What You Chat: Using Conversations to Personalize Search-based Recommendations
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-ECA2-8
%R 10.1007/978-3-030-72113-8_14
%D 2021
%B 43rd European Conference on IR Research
%Z date of event: 2021-03-28 - 2021-04-01
%C Lucca, Italy (Online Event)
%B Advances in Information Retrieval
%E Hiemstra, Djoerd; Moens, Marie-Francine; Mothe, Josiane; Perego, Raffaele; Potthast, Martin; Sebastiani, Fabrizio
%P 207 - 223
%I Springer
%@ 978-3-030-72112-1
%B Lecture Notes in Computer Science
%N 12656

Paper

K. H. Tran, A. Ghazimatin, and R. Saha Roy

“Counterfactual Explanations for Neural Recommenders,” 2021. [Online]. Available: https://arxiv.org/abs/2105.05008.

Abstract

Understanding why specific items are recommended to users can significantly
increase their trust and satisfaction in the system. While neural recommenders
have become the state-of-the-art in recent years, the complexity of deep models
still makes the generation of tangible explanations for end users a challenging
problem. Existing methods are usually based on attention distributions over a
variety of features, which are still questionable regarding their suitability
as explanations, and rather unwieldy to grasp for an end user. Counterfactual
explanations based on a small set of the user's own actions have been shown to
be an acceptable solution to the tangibility problem. However, current work on
such counterfactuals cannot be readily applied to neural models. In this work,
we propose ACCENT, the first general framework for finding counterfactual
explanations for neural recommenders. It extends recently-proposed influence
functions for identifying training points most relevant to a recommendation,
from a single to a pair of items, while deducing a counterfactual set in an
iterative process. We use ACCENT to generate counterfactual explanations for
two popular neural models, Neural Collaborative Filtering (NCF) and Relational
Collaborative Filtering (RCF), and demonstrate its feasibility on a sample of
the popular MovieLens 100K dataset.

BibTeX

@online{Tran_2105.05008,
TITLE = {Counterfactual Explanations for Neural Recommenders},
AUTHOR = {Tran, Khanh Hiep and Ghazimatin, Azin and Saha Roy, Rishiraj},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2105.05008},
EPRINT = {2105.05008},
EPRINTTYPE = {arXiv},
YEAR = {2021},
ABSTRACT = {Understanding why specific items are recommended to users can significantly increase their trust and satisfaction in the system. While neural recommenders have become the state-of-the-art in recent years, the complexity of deep models still makes the generation of tangible explanations for end users a challenging problem. Existing methods are usually based on attention distributions over a variety of features, which are still questionable regarding their suitability as explanations, and rather unwieldy to grasp for an end user. Counterfactual explanations based on a small set of the user's own actions have been shown to be an acceptable solution to the tangibility problem. However, current work on such counterfactuals cannot be readily applied to neural models. In this work, we propose ACCENT, the first general framework for finding counterfactual explanations for neural recommenders. It extends recently-proposed influence functions for identifying training points most relevant to a recommendation, from a single to a pair of items, while deducing a counterfactual set in an iterative process. We use ACCENT to generate counterfactual explanations for two popular neural models, Neural Collaborative Filtering (NCF) and Relational Collaborative Filtering (RCF), and demonstrate its feasibility on a sample of the popular MovieLens 100K dataset.},
}

Endnote

%0 Report
%A Tran, Khanh Hiep
%A Ghazimatin, Azin
%A Saha Roy, Rishiraj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Counterfactual Explanations for Neural Recommenders
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-67C3-7
%U https://arxiv.org/abs/2105.05008
%D 2021
%X Understanding why specific items are recommended to users can significantly increase their trust and satisfaction in the system. While neural recommenders have become the state-of-the-art in recent years, the complexity of deep models still makes the generation of tangible explanations for end users a challenging problem. Existing methods are usually based on attention distributions over a variety of features, which are still questionable regarding their suitability as explanations, and rather unwieldy to grasp for an end user. Counterfactual explanations based on a small set of the user's own actions have been shown to be an acceptable solution to the tangibility problem. However, current work on such counterfactuals cannot be readily applied to neural models. In this work, we propose ACCENT, the first general framework for finding counterfactual explanations for neural recommenders. It extends recently-proposed influence functions for identifying training points most relevant to a recommendation, from a single to a pair of items, while deducing a counterfactual set in an iterative process. We use ACCENT to generate counterfactual explanations for two popular neural models, Neural Collaborative Filtering (NCF) and Relational Collaborative Filtering (RCF), and demonstrate its feasibility on a sample of the popular MovieLens 100K dataset.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Learning, cs.LG

Conference paper

K. H. Tran, A. Ghazimatin, and R. Saha Roy

“Counterfactual Explanations for Neural Recommenders,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{tran2021counterfactual,
TITLE = {Counterfactual Explanations for Neural Recommenders},
AUTHOR = {Tran, Khanh Hiep and Ghazimatin, Azin and Saha Roy, Rishiraj},
LANGUAGE = {eng},
DOI = {10.1145/3404835.3463005},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yoshioka, Masaharu},
PAGES = {1627--1631},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A Tran, Khanh Hiep
%A Ghazimatin, Azin
%A Saha Roy, Rishiraj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Counterfactual Explanations for Neural Recommenders
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-5140-4
%R 10.1145/3404835.3463005
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yoshioka, Masaharu
%P 1627 - 1631
%I ACM

Article

G. Weikum

“Knowledge Graphs 2021: A Data Odyssey,” Proceedings of the VLDB Endowment (Proc. VLDB 2021), vol. 14, no. 12, 2021.

BibTeX

@article{Weikum2021_PVLDB,
TITLE = {Knowledge Graphs 2021: {A} Data Odyssey},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {VLDB Endowment Inc.},
YEAR = {2021},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {14},
NUMBER = {12},
PAGES = {3233--3238},
BOOKTITLE = {Proceedings of the 47th International Conference on Very Large Data Bases (VLDB 2021)},
EDITOR = {Dong, Xin Luna and Naumann, Felix},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Graphs 2021: A Data Odyssey
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-631F-6
%7 2021
%D 2021
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 14
%N 12
%& 3233
%P 3233 - 3238
%I VLDB Endowment Inc.
%B Proceedings of the 47th International Conference on Very Large Data Bases
%O VLDB 2021 Copenhagen, Denmark, 16-20 August 2021

Article

G. Weikum, L. Dong, S. Razniewski, and F. Suchanek

“Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases,” Foundations and Trends in Databases, vol. 10, no. 2–4, 2021.

BibTeX

@article{Weikum10.1561/1900000064,
TITLE = {Machine Knowledge: {C}reation and Curation of Comprehensive Knowledge Bases},
AUTHOR = {Weikum, Gerhard and Dong, Luna and Razniewski, Simon and Suchanek, Fabian},
LANGUAGE = {eng},
ISSN = {1931-7883},
ISBN = {978-1-68083-836-7},
DOI = {10.1561/1900000064},
PUBLISHER = {Now Publishers},
ADDRESS = {Boston},
YEAR = {2021},
DATE = {2021},
JOURNAL = {Foundations and Trends in Databases},
VOLUME = {10},
NUMBER = {2-4},
PAGES = {108--490},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%A Dong, Luna
%A Razniewski, Simon
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6317-E
%R 10.1561/1900000064
%@ 978-1-68083-836-7
%7 2021
%D 2021
%J Foundations and Trends in Databases
%V 10
%N 2-4
%& 108
%P 108 - 490
%I Now Publishers
%C Boston
%@ 1931-7883

Conference paper

A. Yates, R. Nogueira, and J. Lin

“Pretrained Transformers for Text Ranking: BERT and Beyond,” in SIGIR ’21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021.

BibTeX

@inproceedings{Yates_SIGIR21,
TITLE = {Pretrained Transformers for Text Ranking: {BERT} and Beyond},
AUTHOR = {Yates, Andrew and Nogueira, Rodrigo and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {978-1-4503-8037-9},
DOI = {10.1145/3404835.3462812},
PUBLISHER = {ACM},
YEAR = {2021},
BOOKTITLE = {SIGIR '21, 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Diaz, Fernando and Shah, Chirag and Suel, Torsten and Castells, Pablo and Jones, Rosie and Sakai, Tetsuya and Bellog{\'i}n, Alejandro and Yoshioka, Masaharu},
PAGES = {2666--2668},
ADDRESS = {Virtual Event, Canada},
}

Endnote

%0 Conference Proceedings
%A Yates, Andrew
%A Nogueira, Rodrigo
%A Lin, Jimmy
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Pretrained Transformers for Text Ranking: BERT and Beyond
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6674-2
%R 10.1145/3404835.3462812
%D 2021
%B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2021-07-11 - 2021-07-15
%C Virtual Event, Canada
%B SIGIR '21
%E Diaz, Fernando; Shah, Chirag; Suel, Torsten; Castells, Pablo; Jones, Rosie; Sakai, Tetsuya; Bellogín, Alejandro; Yoshioka, Masaharu
%P 2666 - 2668
%I ACM
%@ 978-1-4503-8037-9

Conference paper

X. Zhang, A. Yates, and J. Lin

“Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers,” in Advances in Information Retrieval (ECIR 2021), Lucca, Italy (Online Event), 2021.

BibTeX

@inproceedings{Zhang_ECIR2021,
TITLE = {Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers},
AUTHOR = {Zhang, Xinyu and Yates, Andrew and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {978-3-030-72239-5},
DOI = {10.1007/978-3-030-72240-1_11},
PUBLISHER = {Springer},
YEAR = {2021},
DATE = {2021},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2021)},
EDITOR = {Hiemstra, Djoerd and Moens, Marie-Francine and Mothe, Josiane and Perego, Raffaele and Potthast, Martin and Sebastiani, Fabrizio},
PAGES = {150--163},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12657},
ADDRESS = {Lucca, Italy (Online Event)},
}

Endnote

%0 Conference Proceedings
%A Zhang, Xinyu
%A Yates, Andrew
%A Lin, Jimmy
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-6614-E
%R 10.1007/978-3-030-72240-1_11
%D 2021
%B 43rd European Conference on IR Research
%Z date of event: 2021-03-28 - 2021-04-01
%C Lucca, Italy (Online Event)
%B Advances in Information Retrieval
%E Hiemstra, Djoerd; Moens, Marie-Francine; Mothe, Josiane; Perego, Raffaele; Potthast, Martin; Sebastiani, Fabrizio
%P 150 - 163
%I Springer
%@ 978-3-030-72239-5
%B Lecture Notes in Computer Science
%N 12657

Conference paper

X. Zhang, J. Xin, A. Yates, and J. Lin

“Bag-of-Words Baselines for Semantic Code Search,” in The 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), Bangkok, Thailand (Online), 2021.

BibTeX

@inproceedings{Zhang_NLP4Prog2021,
TITLE = {Bag-of-Words Baselines for Semantic Code Search},
AUTHOR = {Zhang, Xinyu and Xin, Ji and Yates, Andrew and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {978-1-954085-64-0},
URL = {https://aclanthology.org/2021.nlp4prog-1.0},
PUBLISHER = {ACL},
YEAR = {2021},
BOOKTITLE = {The 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021)},
EDITOR = {Lachmy, Royi and Yao, Ziyu and Durrett, Greg and Gligoric, Milos and Li, Junyi Jessy and Mooney, Ray and Neubig, Graham and Su, Yu and Sun, Huan and Tsarfaty, Reut},
PAGES = {88--94},
ADDRESS = {Bangkok, Thailand (Online)},
}

Endnote

%0 Conference Proceedings
%A Zhang, Xinyu
%A Xin, Ji
%A Yates, Andrew
%A Lin, Jimmy
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Bag-of-Words Baselines for Semantic Code Search
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-619E-8
%U https://aclanthology.org/2021.nlp4prog-1.0
%D 2021
%B 1st Workshop on Natural Language Processing for Programming
%Z date of event: 2021-08-06 - 2021-08-06
%C Bangkok, Thailand (Online)
%B The 1st Workshop on Natural Language Processing for Programming
%E Lachmy, Royi; Yao, Ziyu; Durrett, Greg; Gligoric, Milos; Li, Junyi Jessy; Mooney, Ray; Neubig, Graham; Su, Yu; Sun, Huan; Tsarfaty, Reut
%P 88 - 94
%I ACL
%@ 978-1-954085-64-0

Article

Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates

“Contextualized Query Expansion via Unsupervised Chunk Selection for Text Retrieval,” Information Processing & Management, vol. 58, no. 5, 2021.

BibTeX

@article{Zheng2021,
TITLE = {Contextualized Query Expansion via Unsupervised Chunk Selection for Text Retrieval},
AUTHOR = {Zheng, Zhi and Hui, Kai and He, Ben and Han, Xianpei and Sun, Le and Yates, Andrew},
LANGUAGE = {eng},
ISSN = {0306-4573},
DOI = {10.1016/j.ipm.2021.102672},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2021},
DATE = {2021},
JOURNAL = {Information Processing \& Management},
VOLUME = {58},
NUMBER = {5},
EID = {102672},
}

Endnote

%0 Journal Article
%A Zheng, Zhi
%A Hui, Kai
%A He, Ben
%A Han, Xianpei
%A Sun, Le
%A Yates, Andrew
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Contextualized Query Expansion via Unsupervised Chunk Selection for Text Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0009-4747-8
%R 10.1016/j.ipm.2021.102672
%7 2021
%D 2021
%J Information Processing & Management
%V 58
%N 5
%Z sequence number: 102672
%I Elsevier
%C Amsterdam

2020

Paper

H. Arnaout, S. Razniewski, and G. Weikum

“Negative Statements Considered Useful,” 2020. [Online]. Available: http://arxiv.org/abs/2001.04425.

Abstract

Knowledge bases (KBs), pragmatic collections of knowledge about notable
entities, are an important asset in applications such as search, question
answering and dialogue. Rooted in a long tradition in knowledge representation,
all popular KBs only store positive information, while they abstain from taking
any stance towards statements not contained in them.
In this paper, we make the case for explicitly stating interesting statements
which are not true. Negative statements would be important to overcome current
limitations of question answering, yet due to their potential abundance, any
effort towards compiling them needs a tight coupling with ranking. We introduce
two approaches towards compiling negative statements. (i) In peer-based
statistical inferences, we compare entities with highly related entities in
order to derive potential negative statements, which we then rank using
supervised and unsupervised features. (ii) In query-log-based text extraction,
we use a pattern-based approach for harvesting search engine query logs.
Experimental results show that both approaches hold promising and complementary
potential. Along with this paper, we publish the first datasets on interesting
negative information, containing over 1.1M statements for 100K popular Wikidata
entities.

BibTeX

@online{Arnaout_arXiv2001.04425,
TITLE = {Negative Statements Considered Useful},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/2001.04425},
EPRINT = {2001.04425},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.},
}

Endnote

%0 Report
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Negative Statements Considered Useful
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-821F-6
%U http://arxiv.org/abs/2001.04425
%D 2020
%X Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Databases, cs.DB

Conference paper

H. Arnaout, S. Razniewski, and G. Weikum

“Enriching Knowledge Bases with Interesting Negative Statements,” in Automated Knowledge Base Construction (AKBC 2020), Virtual Conference, 2020.

BibTeX

@inproceedings{Arnaout_AKBC2020,
TITLE = {Enriching Knowledge Bases with Interesting Negative Statements},
AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.24432/C5101K},
PUBLISHER = {OpenReview},
YEAR = {2020},
BOOKTITLE = {Automated Knowledge Base Construction (AKBC 2020)},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Arnaout, Hiba
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enriching Knowledge Bases with Interesting Negative Statements
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EBC9-E
%R 10.24432/C5101K
%D 2020
%B 2nd Conference on Automated Knowledge Base Construction
%Z date of event: 2020-06-22 - 2020-06-24
%C Virtual Conference
%B Automated Knowledge Base Construction
%I OpenReview
%U https://openreview.net/forum?id=pSLmyZKaS

Proceedings

K. Balog, V. Setty, C. Lioma, Y. Liu, M. Zhang, and K. Berberich

Eds., ICTIR ’20. ACM, 2020.

BibTeX

@proceedings{Balog_ICTIR20,
TITLE = {ICTIR '20, ACM SIGIR International Conference on Theory of Information Retrieval},
EDITOR = {Balog, Krisztian and Setty, Vinay and Lioma, Christina and Liu, Yiqun and Zhang, Min and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-8067-6},
DOI = {10.1145/3409256},
PUBLISHER = {ACM},
YEAR = {2020},
ADDRESS = {Virtual Event, Norway},
}

Endnote

%0 Conference Proceedings
%E Balog, Krisztian
%E Setty, Vinay
%E Lioma, Christina
%E Liu, Yiqun
%E Zhang, Min
%E Berberich, Klaus
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ICTIR '20 : Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-041D-4
%R 10.1145/3409256
%@ 978-1-4503-8067-6
%I ACM
%D 2020
%B ACM SIGIR International Conference on Theory of Information Retrieval 
%Z date of event: 2020-09-14 - 2020-09-17
%C Virtual Event, Norway

Conference paper

C. Belth, X. Zheng, J. Vreeken, and D. Koutra

“What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization,” in Proceedings of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.

BibTeX

@inproceedings{belth:20:kgist,
TITLE = {What is Normal, What is Strange, and What is Missing in a Knowledge Graph: {U}nified Characterization via Inductive Summarization},
AUTHOR = {Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai},
LANGUAGE = {eng},
ISBN = {978-1-4503-7023-3},
DOI = {10.1145/3366423.3380189},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2020)},
EDITOR = {Huang, Yennun and King, Irwin and Liu, Tie-Yan and van Steen, Maarten},
PAGES = {1115--1126},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Belth, Caleb
%A Zheng, Xinyi
%A Vreeken, Jilles
%A Koutra, Danai
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-253F-9
%R 10.1145/3366423.3380189
%D 2020
%B The World Wide Web Conference
%Z date of event: 2020-04-20 - 2020-04-24
%C Taipei, Taiwan
%B Proceedings of The World Wide Web Conference
%E Huang, Yennun; King, Irwin; Liu, Tie-Yan; van Steen, Maarten
%P 1115 - 1126
%I ACM
%@ 978-1-4503-7023-3

Paper

J. J. Benjamin, C. Müller-Birn, and S. Razniewski

“Examining the Impact of Algorithm Awareness on Wikidata’s Recommender System Recoin,” 2020. [Online]. Available: https://arxiv.org/abs/2009.09049.

Abstract

The global infrastructure of the Web, designed as an open and transparent
system, has a significant impact on our society. However, algorithmic systems
of corporate entities that neglect those principles increasingly populated the
Web. Typical representatives of these algorithmic systems are recommender
systems that influence our society both on a scale of global politics and
during mundane shopping decisions. Recently, such recommender systems have come
under critique for how they may strengthen existing or even generate new kinds
of biases. To this end, designers and engineers are increasingly urged to make
the functioning and purpose of recommender systems more transparent. Our
research relates to the discourse of algorithm awareness, that reconsiders the
role of algorithm visibility in interface design. We conducted online
experiments with 105 participants using MTurk for the recommender system
Recoin, a gadget for Wikidata. In these experiments, we presented users with
one of a set of three different designs of Recoin's user interface, each of
them exhibiting a varying degree of explainability and interactivity. Our
findings include a positive correlation between comprehension of and trust in
an algorithmic system in our interactive redesign. However, our results are not
conclusive yet, and suggest that the measures of comprehension, fairness,
accuracy and trust are not yet exhaustive for the empirical study of algorithm
awareness. Our qualitative insights provide a first indication for further
measures. Our study participants, for example, were less concerned with the
details of understanding an algorithmic calculation than with who or what is
judging the result of the algorithm.

BibTeX

@online{Benjamin2009.09049,
TITLE = {Examining the Impact of Algorithm Awareness on {W}ikidata's Recommender System Recoin},
AUTHOR = {Benjamin, Jesse Josua and M{\"u}ller-Birn, Claudia and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2009.09049},
EPRINT = {2009.09049},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, that reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of a set of three different designs of Recoin's user interface, each of them exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not conclusive yet, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.},
}

Endnote

%0 Report
%A Benjamin, Jesse Josua
%A Müller-Birn, Claudia
%A Razniewski, Simon
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Examining the Impact of Algorithm Awareness on Wikidata's Recommender System Recoin
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0661-4
%U https://arxiv.org/abs/2009.09049
%D 2020
%X The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, that reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of a set of three different designs of Recoin's user interface, each of them exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not conclusive yet, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.
%K Computer Science, Human-Computer Interaction, cs.HC,Computer Science, Computers and Society, cs.CY,Computer Science, Digital Libraries, cs.DL

Proceedings

A. Bhattacharya, S. Natarajan, and R. Saha Roy

Eds., Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. ACM, 2020.

BibTeX

@proceedings{SahaRoy_CoDSCOMAD20,
TITLE = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)},
EDITOR = {Bhattacharya, Arnab and Natarajan, Sriraam and Saha Roy, Rishiraj},
LANGUAGE = {eng},
ISBN = {978-1-4503-7738-6},
DOI = {10.1145/3371158},
PUBLISHER = {ACM},
YEAR = {2020},
ADDRESS = {Hyderabad, India},
}

Endnote

%0 Conference Proceedings
%E Bhattacharya, Arnab
%E Natarajan, Sriraam
%E Saha Roy, Rishiraj
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-09CF-6
%R 10.1145/3371158
%@ 978-1-4503-7738-6
%I ACM
%D 2020
%B ACM India Joint International Conference on Data Science and Management of Data
%Z date of event: 2020-01-05 - 2020-01-07
%C Hyderabad, India

Conference paper

A. J. Biega, J. Schmidt, and R. Saha Roy

“Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions,” in Advances in Information Retrieval (ECIR 2020), Lisbon, Portugal, 2020.

BibTeX

@inproceedings{Biega_ECIR2020,
TITLE = {Towards Query Logs for Privacy Studies: {O}n Deriving Search Queries from Questions},
AUTHOR = {Biega, Asia J. and Schmidt, Jana and Saha Roy, Rishiraj},
LANGUAGE = {eng},
ISBN = {978-3-030-45441-8},
DOI = {10.1007/978-3-030-45442-5_14},
PUBLISHER = {Springer},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2020)},
EDITOR = {Jose, Joemon M. and Yilmaz, Emine and Magalh{\~a}es, Jo{\~a}o and Castells, Pablo and Ferro, Nicola and Silva, M{\'a}rio J. and Martins, Fl{\'a}vio},
PAGES = {110--117},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12036},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Biega, Asia J.
%A Schmidt, Jana
%A Saha Roy, Rishiraj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-02FD-9
%R 10.1007/978-3-030-45442-5_14
%D 2020
%B 42nd European Conference on IR Research
%Z date of event: 2020-04-14 - 2020-04-17
%C Lisbon, Portugal
%B Advances in Information Retrieval
%E Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J.; Martins, Flávio
%P 110 - 117
%I Springer
%@ 978-3-030-45441-8
%B Lecture Notes in Computer Science
%N 12036

Paper

A. J. Biega, J. Schmidt, and R. Saha Roy

“Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions,” 2020. [Online]. Available: https://arxiv.org/abs/2004.02023.

Abstract

Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding.

BibTeX

@online{Biega2004.02023,
TITLE = {Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions},
AUTHOR = {Biega, Asia J. and Schmidt, Jana and Saha Roy, Rishiraj},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2004.02023},
EPRINT = {2004.02023},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.},
}

Endnote

%0 Report
%A Biega, Asia J.
%A Schmidt, Jana
%A Saha Roy, Rishiraj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-09C7-E
%U https://arxiv.org/abs/2004.02023
%D 2020
%X Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.
%K Computer Science, Information Retrieval, cs.IR

Thesis

D5IMPR-CS

K. Budhathoki

“Causal Inference on Discrete Data,” Universität des Saarlandes, Saarbrücken, 2020.

BibTeX

@phdthesis{BudDiss_2020,
TITLE = {Causal Inference on Discrete Data},
AUTHOR = {Budhathoki, Kailash},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-329528},
DOI = {10.22028/D291-32952},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
}

Endnote

%0 Thesis
%A Budhathoki, Kailash
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%A referee: Heskes, Tom
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Causal Inference on Discrete Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FE73-A
%R 10.22028/D291-32952
%U urn:nbn:de:bsz:291--ds-329528
%F OTHER: hdl:20.500.11880/30501
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%P 171 p.
%V phd
%9 phd
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30501

Conference paper

D. Calvanese, J. Corman, D. Lanti, and S. Razniewski

“Counting Query Answers over a DL-Lite Knowledge Base,” in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan (Virtual), 2020.

Abstract

Counting answers to a query is an operation supported by virtually all
database management systems. In this paper we focus on counting answers over a
Knowledge Base (KB), which may be viewed as a database enriched with background
knowledge about the domain under consideration. In particular, we place our
work in the context of Ontology-Mediated Query Answering/Ontology-based Data
Access (OMQA/OBDA), where the language used for the ontology is a member of the
DL-Lite family and the data is a (usually virtual) set of assertions. We study
the data complexity of query answering, for different members of the DL-Lite
family that include number restrictions, and for variants of conjunctive
queries with counting that differ with respect to their shape (connected,
branching, rooted). We improve upon existing results by providing a PTIME and
coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case,
we define a novel query rewriting technique into first-order logic with
counting.

BibTeX

@inproceedings{RazniewskiIJCAI2020,
TITLE = {Counting Query Answers over a {\textit{DL-Lite}} Knowledge Base},
AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-0-9992411-6-5},
DOI = {10.24963/ijcai.2020/230},
PUBLISHER = {IJCAI},
YEAR = {2020},
ABSTRACT = {Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing a PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting.},
BOOKTITLE = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020)},
EDITOR = {Bessiere, Christian},
PAGES = {1658--1666},
ADDRESS = {Yokohama, Japan (Virtual)},
}

Endnote

%0 Conference Proceedings
%A Calvanese, Diego
%A Corman, Julien
%A Lanti, Davide
%A Razniewski, Simon
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Counting Query Answers over a DL-Lite Knowledge Base  : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-009E-6
%R 10.24963/ijcai.2020/230 
%D 2020
%B Twenty-Ninth International Joint Conference on Artificial Intelligence
%Z date of event: 2021-01-07 - 2021-01-15
%C Yokohama, Japan (Virtual)
%X   Counting answers to a query is an operation supported by virtually all<br>database management systems. In this paper we focus on counting answers over a<br>Knowledge Base (KB), which may be viewed as a database enriched with background<br>knowledge about the domain under consideration. In particular, we place our<br>work in the context of Ontology-Mediated Query Answering/Ontology-based Data<br>Access (OMQA/OBDA), where the language used for the ontology is a member of the<br>DL-Lite family and the data is a (usually virtual) set of assertions. We study<br>the data complexity of query answering, for different members of the DL-Lite<br>family that include number restrictions, and for variants of conjunctive<br>queries with counting that differ with respect to their shape (connected,<br>branching, rooted). We improve upon existing results by providing a PTIME and<br>coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case,<br>we define a novel query rewriting technique into first-order logic with<br>counting.<br>
%K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI
%B Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
%E Bessiere, Christian
%P 1658 - 1666
%I IJCAI
%@ 978-0-9992411-6-5

Paper

D. Calvanese, J. Corman, D. Lanti, and S. Razniewski

“Counting Query Answers over a DL-Lite Knowledge Base (extended version),” 2020. [Online]. Available: https://arxiv.org/abs/2005.05886.

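Abstract

Counting answers to a query is an operation supported by virtually all
database management systems. In this paper we focus on counting answers over a
Knowledge Base (KB), which may be viewed as a database enriched with background
knowledge about the domain under consideration. In particular, we place our
work in the context of Ontology-Mediated Query Answering/Ontology-based Data
Access (OMQA/OBDA), where the language used for the ontology is a member of the
DL-Lite family and the data is a (usually virtual) set of assertions. We study
the data complexity of query answering, for different members of the DL-Lite
family that include number restrictions, and for variants of conjunctive
queries with counting that differ with respect to their shape (connected,
branching, rooted). We improve upon existing results by providing a PTIME and
coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case,
we define a novel query rewriting technique into first-order logic with
counting.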

BibTeX

@online{Razniewskiarxiv2020,
TITLE = {Counting Query Answers over a {DL}-Lite Knowledge Base (extended version)},
AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2005.05886},
EPRINT = {2005.05886},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Counting answers to a query is an operation supported by virtually all<br>database management systems. In this paper we focus on counting answers over a<br>Knowledge Base (KB), which may be viewed as a database enriched with background<br>knowledge about the domain under consideration. In particular, we place our<br>work in the context of Ontology-Mediated Query Answering/Ontology-based Data<br>Access (OMQA/OBDA), where the language used for the ontology is a member of the<br>DL-Lite family and the data is a (usually virtual) set of assertions. We study<br>the data complexity of query answering, for different members of the DL-Lite<br>family that include number restrictions, and for variants of conjunctive<br>queries with counting that differ with respect to their shape (connected,<br>branching, rooted). We improve upon existing results by providing a PTIME and<br>coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case,<br>we define a novel query rewriting technique into first-order logic with<br>counting.<br>},
}

Endnote

%0 Report
%A Calvanese, Diego
%A Corman, Julien
%A Lanti, Davide
%A Razniewski, Simon
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Counting Query Answers over a DL-Lite Knowledge Base (extended version) : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF5A-6
%U https://arxiv.org/abs/2005.05886
%D 2020
%X   Counting answers to a query is an operation supported by virtually all<br>database management systems. In this paper we focus on counting answers over a<br>Knowledge Base (KB), which may be viewed as a database enriched with background<br>knowledge about the domain under consideration. In particular, we place our<br>work in the context of Ontology-Mediated Query Answering/Ontology-based Data<br>Access (OMQA/OBDA), where the language used for the ontology is a member of the<br>DL-Lite family and the data is a (usually virtual) set of assertions. We study<br>the data complexity of query answering, for different members of the DL-Lite<br>family that include number restrictions, and for variants of conjunctive<br>queries with counting that differ with respect to their shape (connected,<br>branching, rooted). We improve upon existing results by providing a PTIME and<br>coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case,<br>we define a novel query rewriting technique into first-order logic with<br>counting.<br>
%K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI

Conference paper

D. Calvanese, J. Corman, D. Lanti, and S. Razniewski

“Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions,” in Proceedings of the 33rd International Workshop on Description Logics (DL 2020), Rhodes, Greece (Virtual Event), 2020.

BibTeX

@inproceedings{Calvanese_DL2020,
TITLE = {Rewriting Count Queries over {DL}-Lite {TBoxes} with Number Restrictions},
AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2663/paper-7.pdf; urn:nbn:de:0074-2663-4},
PUBLISHER = {ceur-ws.org},
YEAR = {2020},
BOOKTITLE = {Proceedings of the 33rd International Workshop on Description Logics (DL 2020)},
EDITOR = {Borgwardt, Stefan and Meyer, Thomas},
EID = {7},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2663},
ADDRESS = {Rhodes, Greece (Virtual Event)},
}

Endnote

%0 Conference Proceedings
%A Calvanese, Diego
%A Corman, Julien
%A Lanti, Davide
%A Razniewski, Simon
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0606-B
%U http://ceur-ws.org/Vol-2663/paper-7.pdf
%D 2020
%B 33rd International Workshop on Description Logics
%Z date of event: 2020-09-12 - 2020-09-14
%C Rhodes, Greece (Virtual Event)
%B Proceedings of the 33rd International Workshop on Description Logics
%E Borgwardt, Stefan; Meyer, Thomas
%Z sequence number: 7
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2663
%@ 1613-0073

Paper

Y. Chalier, S. Razniewski, and G. Weikum

“Joint Reasoning for Multi-Faceted Commonsense Knowledge,” 2020. [Online]. Available: http://arxiv.org/abs/2001.04170.

Abstract

Commonsense knowledge (CSK) supports a variety of AI applications, from
visual understanding to chatbots. Prior works on acquiring CSK, such as
ConceptNet, have compiled statements that associate concepts, like everyday
objects or activities, with properties that hold for most or some instances of
the concept. Each concept is treated in isolation from other concepts, and the
only quantitative measure (or ranking) of properties is a confidence score that
the statement is valid. This paper aims to overcome these limitations by
introducing a multi-faceted model of CSK statements and methods for joint
reasoning over sets of inter-related statements. Our model captures four
different dimensions of CSK statements: plausibility, typicality, remarkability
and salience, with scoring and ranking along each dimension. For example,
hyenas drinking water is typical but not salient, whereas hyenas eating
carcasses is salient. For reasoning and ranking, we develop a method with soft
constraints, to couple the inference over concepts that are related in a
taxonomic hierarchy. The reasoning is cast into an integer linear programming
(ILP), and we leverage the theory of reduction costs of a relaxed LP to compute
informative rankings. This methodology is applied to several large CSK
collections. Our evaluation shows that we can consolidate these inputs into
much cleaner and more expressive knowledge. Results are available at
dice.mpi-inf.mpg.de.

BibTeX

@online{Chalier_arXiv2001.04170,
TITLE = {Joint Reasoning for Multi-Faceted Commonsense Knowledge},
AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/2001.04170},
EPRINT = {2001.04170},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Commonsense knowledge (CSK) supports a variety of AI applications, from<br>visual understanding to chatbots. Prior works on acquiring CSK, such as<br>ConceptNet, have compiled statements that associate concepts, like everyday<br>objects or activities, with properties that hold for most or some instances of<br>the concept. Each concept is treated in isolation from other concepts, and the<br>only quantitative measure (or ranking) of properties is a confidence score that<br>the statement is valid. This paper aims to overcome these limitations by<br>introducing a multi-faceted model of CSK statements and methods for joint<br>reasoning over sets of inter-related statements. Our model captures four<br>different dimensions of CSK statements: plausibility, typicality, remarkability<br>and salience, with scoring and ranking along each dimension. For example,<br>hyenas drinking water is typical but not salient, whereas hyenas eating<br>carcasses is salient. For reasoning and ranking, we develop a method with soft<br>constraints, to couple the inference over concepts that are related in a<br>taxonomic hierarchy. The reasoning is cast into an integer linear programming<br>(ILP), and we leverage the theory of reduction costs of a relaxed LP to compute<br>informative rankings. This methodology is applied to several large CSK<br>collections. Our evaluation shows that we can consolidate these inputs into<br>much cleaner and more expressive knowledge. Results are available at<br>https://dice.mpi-inf.mpg.de.<br>},
}

Endnote

%0 Report
%A Chalier, Yohan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Joint Reasoning for Multi-Faceted Commonsense Knowledge : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8226-D
%U http://arxiv.org/abs/2001.04170
%D 2020
%X   Commonsense knowledge (CSK) supports a variety of AI applications, from<br>visual understanding to chatbots. Prior works on acquiring CSK, such as<br>ConceptNet, have compiled statements that associate concepts, like everyday<br>objects or activities, with properties that hold for most or some instances of<br>the concept. Each concept is treated in isolation from other concepts, and the<br>only quantitative measure (or ranking) of properties is a confidence score that<br>the statement is valid. This paper aims to overcome these limitations by<br>introducing a multi-faceted model of CSK statements and methods for joint<br>reasoning over sets of inter-related statements. Our model captures four<br>different dimensions of CSK statements: plausibility, typicality, remarkability<br>and salience, with scoring and ranking along each dimension. For example,<br>hyenas drinking water is typical but not salient, whereas hyenas eating<br>carcasses is salient. For reasoning and ranking, we develop a method with soft<br>constraints, to couple the inference over concepts that are related in a<br>taxonomic hierarchy. The reasoning is cast into an integer linear programming<br>(ILP), and we leverage the theory of reduction costs of a relaxed LP to compute<br>informative rankings. This methodology is applied to several large CSK<br>collections. Our evaluation shows that we can consolidate these inputs into<br>much cleaner and more expressive knowledge. Results are available at<br>https://dice.mpi-inf.mpg.de.<br>
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR

Conference paper

Y. Chalier, S. Razniewski, and G. Weikum

“Joint Reasoning for Multi-Faceted Commonsense Knowledge,” in Automated Knowledge Base Construction (AKBC 2020), Virtual Conference, 2020.

BibTeX

@inproceedings{Chalier_AKBC2020,
TITLE = {Joint Reasoning for Multi-Faceted Commonsense Knowledge},
AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.24432/C58G6G},
PUBLISHER = {OpenReview},
YEAR = {2020},
BOOKTITLE = {Automated Knowledge Base Construction (AKBC 2020)},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Chalier, Yohan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Joint Reasoning for Multi-Faceted Commonsense Knowledge : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EBCF-8
%R 10.24432/C58G6G
%D 2020
%B 2nd Conference on Automated Knowledge Base Construction
%Z date of event: 2020-06-22 - 2020-06-24
%C Virtual Conference
%B Automated Knowledge Base Construction
%I OpenReview
%U https://openreview.net/forum?id=QnPV72SZVt

Conference paper

Y. Chalier, S. Razniewski, and G. Weikum

“Dice: A Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge,” in ISWC 2020 Posters, Demos, and Industry Tracks, Globally Online, 2020.

BibTeX

@inproceedings{Chalier_ISCW20,
TITLE = {Dice: {A} Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge},
AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2721/paper482.pdf; urn:nbn:de:0074-2721-6},
PUBLISHER = {ceur-ws.org},
YEAR = {2020},
BOOKTITLE = {ISWC 2020 Posters, Demos, and Industry Tracks},
EDITOR = {Taylor, Kerry and Goncalves, Rafael and Lecue, Freddy and Yan, Jun},
PAGES = {16--20},
EID = {482},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2721},
ADDRESS = {Globally Online},
}

Endnote

%0 Conference Proceedings
%A Chalier, Yohan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Dice: A Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F132-0
%U http://ceur-ws.org/Vol-2721/paper482.pdf
%D 2020
%B 19th International Semantic Web Conference
%Z date of event: 2020-11-01 - 2020-11-06
%C Globally Online
%B ISWC 2020 Posters, Demos, and Industry Tracks
%E Taylor, Kerry; Goncalves, Rafael; Lecue, Freddy; Yan, Jun
%P 16 - 20
%Z sequence number: 482
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2721
%@ 1613-0073

Conference paper

E. Chang, J. Caplinger, A. Marin, X. Shen, and V. Demberg

“DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool,” in The 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), 2020.

BibTeX

@inproceedings{chang2020dart,
TITLE = {{DART}: {A} Lightweight Quality-Suggestive Data-to-Text Annotation Tool},
AUTHOR = {Chang, Ernie and Caplinger, Jeriah and Marin, Alex and Shen, Xiaoyu and Demberg, Vera},
LANGUAGE = {eng},
ISBN = {978-1-952148-28-6},
URL = {https://www.aclweb.org/anthology/2020.coling-demos.3},
DOI = {10.18653/v1/2020.coling-demos.3},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 28th International Conference on Computational Linguistics (COLING 2020)},
EDITOR = {Ptaszynski, Michal and Ziolko, Bartosz},
PAGES = {12--17},
ADDRESS = {Barcelona, Spain (Online)},
}

Endnote

%0 Conference Proceedings
%A Chang, Ernie
%A Caplinger, Jeriah
%A Marin, Alex
%A Shen, Xiaoyu
%A Demberg, Vera
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-149C-2
%U https://www.aclweb.org/anthology/2020.coling-demos.3
%R 10.18653/v1/2020.coling-demos.3
%D 2020
%B The 28th International Conference on Computational Linguistics
%Z date of event: 2020-12-08 - 2020-12-13
%C Barcelona, Spain (Online)
%B The 28th International Conference on Computational Linguistics
%E Ptaszynski, Michal; Ziolko, Bartosz
%P 12 - 17
%I ACL
%@ 978-1-952148-28-6

Conference paper

C. X. Chu, S. Razniewski, and G. Weikum

“ENTYFI: A System for Fine-grained Entity Typing in Fictional Texts,” in The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 2020.

BibTeX

@inproceedings{Chu_EMNLP20,
TITLE = {{ENTYFI}: {A} System for Fine-grained Entity Typing in Fictional Texts},
AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-952148-62-0},
URL = {https://www.aclweb.org/anthology/2020.emnlp-demos.14/},
DOI = {10.18653/v1/2020.emnlp-demos.14},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
EDITOR = {Liu, Qun and Schlangen, David},
PAGES = {100--106},
ADDRESS = {Online},
}

Endnote

%0 Conference Proceedings
%A Chu, Cuong Xuan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ENTYFI: A System for Fine-grained Entity Typing in Fictional Texts : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EED5-D
%U https://www.aclweb.org/anthology/2020.emnlp-demos.14/
%R 10.18653/v1/2020.emnlp-demos.14
%D 2020
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2020-11-16 - 2020-11-20
%C Online
%B The 2020 Conference on Empirical Methods in Natural Language Processing
%E Liu, Qun; Schlangen, David
%P 100 - 106
%I ACL
%@ 978-1-952148-62-0
%U https://www.aclweb.org/anthology/2020.emnlp-demos.14.pdf

Conference paper

C. X. Chu, S. Razniewski, and G. Weikum

“ENTYFI: Entity Typing in Fictional Texts,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.

BibTeX

@inproceedings{ChuWSDM2020,
TITLE = {{ENTYFI}: {E}ntity Typing in Fictional Texts},
AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6822-3},
DOI = {10.1145/3336191.3371808},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining},
EDITOR = {Caverlee, James and Hu, Xia Ben},
PAGES = {124--132},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Chu, Cuong Xuan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ENTYFI: Entity Typing in Fictional Texts  : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-A27E-6
%R 10.1145/3336191.3371808
%D 2020
%B 13th International Conference on Web Search and Data Mining
%Z date of event: 2020-02-03 - 2020-02-07
%C Houston, TX, USA
%B WSDM '20
%E Caverlee, James; Hu, Xia Ben
%P 124 - 132
%I ACM
%@ 978-1-4503-6822-3

Conference paper

S. Dalleiger and J. Vreeken

“Explainable Data Decompositions,” in AAAI Technical Track: Machine Learning, New York, NY, USA, 2020.

BibTeX

@inproceedings{dalleiger:20:disc,
TITLE = {Explainable Data Decompositions},
AUTHOR = {Dalleiger, Sebastian and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-57735-835-0},
DOI = {10.1609/aaai.v34i04.5780},
PUBLISHER = {AAAI},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {AAAI Technical Track: Machine Learning},
PAGES = {3709--3716},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Dalleiger, Sebastian
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Explainable Data Decompositions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-2559-B
%R 10.1609/aaai.v34i04.5780
%D 2020
%B Thirty-Fourth AAAI Conference on Artificial Intelligence
%Z date of event: 2020-02-07 - 2020-02-12
%C New York, NY, USA
%B AAAI Technical Track: Machine Learning
%P 3709 - 3716
%I AAAI
%@ 978-1-57735-835-0

Conference paper

S. Dalleiger and J. Vreeken

“The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery,” in 20th IEEE International Conference on Data Mining (ICDM 2020), Virtual Conference, 2020.

BibTeX

@inproceedings{dalleiger:20:reaper,
TITLE = {The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery},
AUTHOR = {Dalleiger, Sebastian and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-7281-8316-9},
DOI = {10.1109/ICDM50108.2020.00112},
PUBLISHER = {IEEE},
YEAR = {2020},
BOOKTITLE = {20th IEEE International Conference on Data Mining (ICDM 2020)},
EDITOR = {Plant, Claudia and Wang, Haixun and Cuzzocrea, Alfredo and Zaniolo, Carlo and Wu, Xidong},
PAGES = {978--983},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Dalleiger, Sebastian
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-254E-8
%R 10.1109/ICDM50108.2020.00112
%D 2020
%B 20th IEEE International Conference on Data Mining 
%Z date of event: 2020-11-17 - 2020-11-20
%C Virtual Conference
%B 20th IEEE International Conference on Data Mining 
%E Plant, Claudia; Wang, Haixun; Cuzzocrea, Alfredo; Zaniolo, Carlo; Wu, Xidong
%P 978 - 983
%I IEEE
%@ 978-1-7281-8316-9

Article

F. Darari, W. Nutt, S. Razniewski, and S. Rudolph

“Completeness and soundness guarantees for conjunctive SPARQL queries over RDF data sources with completeness statements,” Semantic Web, vol. 11, no. 1, 2020.

BibTeX

@article{Darari2020,
TITLE = {Completeness and soundness guarantees for conjunctive {SPARQL} queries over {RDF} data sources with completeness statements},
AUTHOR = {Darari, Fariza and Nutt, Werner and Razniewski, Simon and Rudolph, Sebastian},
LANGUAGE = {eng},
ISSN = {1570-0844},
DOI = {10.3233/SW-190344},
PUBLISHER = {IOS Press},
ADDRESS = {Amsterdam},
YEAR = {2020},
DATE = {2020},
JOURNAL = {Semantic Web},
VOLUME = {11},
NUMBER = {1},
PAGES = {441--482},
}

Endnote

%0 Journal Article
%A Darari, Fariza
%A Nutt, Werner
%A Razniewski, Simon
%A Rudolph, Sebastian
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Completeness and soundness guarantees for conjunctive SPARQL queries over RDF data sources with completeness statements : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-9A06-6
%R 10.3233/SW-190344
%7 2020
%D 2020
%J Semantic Web
%V 11
%N 1
%& 441
%P 441 - 482
%I IOS Press
%C Amsterdam
%@ 1570-0844

Conference paper

J. Fischer and J. Vreeken

“Sets of Robust Rules, and How to Find Them,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019), Würzburg, Germany, 2020.

BibTeX

@inproceedings{fischer:19:grab,
TITLE = {Sets of Robust Rules, and How to Find Them},
AUTHOR = {Fischer, Jonas and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-030-46150-8},
DOI = {10.1007/978-3-030-46150-8_3},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2020},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)},
PAGES = {38--54},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {11906},
ADDRESS = {W{\"u}rzburg, Germany},
}

Endnote

%0 Conference Proceedings
%A Fischer, Jonas
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Sets of Robust Rules, and How to Find Them : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEAE-8
%R 10.1007/978-3-030-46150-8_3
%D 2020
%B European Conference on Machine Learning and Knowledge Discovery in Databases
%Z date of event: 2019-09-19 - 2019-09-20
%C Würzburg, Germany
%B Machine Learning and Knowledge Discovery in Databases
%P 38 - 54
%I Springer
%@ 978-3-030-46150-8
%B Lecture Notes in Artificial Intelligence
%N 11906

Conference paper

J. Fischer and J. Vreeken

“Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity,” in KDD ’20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, USA, 2020.

BibTeX

@inproceedings{fischer:20:mexican,
TITLE = {Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity},
AUTHOR = {Fischer, Jonas and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-7998-4},
DOI = {10.1145/3394486.3403124},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {KDD '20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
EDITOR = {Gupta, Rajesh and Liu, Yan and Tang, Jiliang and Prakash, B. Aditya},
PAGES = {813--823},
ADDRESS = {Virtual Event, USA},
}

Endnote

%0 Conference Proceedings
%A Fischer, Jonas
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEA5-1
%R 10.1145/3394486.3403124
%D 2020
%B 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2020-08-23 - 2020-08-27
%C Virtual Event, USA
%B KDD '20
%E Gupta, Rajesh; Liu, Yan; Tang, Jiliang; Prakash, B. Aditya
%P 813 - 823
%I ACM
%@ 978-1-4503-7998-4

Conference paper

M. H. Gad-Elrab, D. Stepanova, T.-K. Tran, H. Adel, and G. Weikum

“ExCut: Explainable Embedding-Based Clustering over Knowledge Graphs,” in The Semantic Web – ISWC 2020, Athens, Greece (Virtual Conference), 2020.

BibTeX

@inproceedings{Gad_Elrab_ISWC2020,
TITLE = {{ExCut}: {E}xplainable Embedding-Based Clustering over Knowledge Graphs},
AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Tran, Trung-Kien and Adel, Heike and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-62418-7},
DOI = {10.1007/978-3-030-62419-4_13},
PUBLISHER = {Springer},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {The Semantic Web -- ISWC 2020},
EDITOR = {Pan, Jeff Z. and Tamma, Valentina and D'Amato, Claudia and Janowicz, Krzysztof and Fu, Bo and Polleres, Axel and Seneviratne, Oshani and Kagal, Lalana},
PAGES = {218--237},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12506},
ADDRESS = {Athens, Greece (Virtual Conference)},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed Hassan
%A Stepanova, Daria
%A Tran, Trung-Kien
%A Adel, Heike
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ExCut: Explainable Embedding-Based Clustering over Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-830F-5
%R 10.1007/978-3-030-62419-4_13
%D 2020
%B 19th International Semantic Web Conference
%Z date of event: 2020-11-02 - 2020-11-06
%C Athens, Greece (Virtual Conference)
%B The Semantic Web -- ISWC 2020
%E Pan, Jeff Z.; Tamma, Valentina; D'Amato, Claudia; Janowicz, Krzysztof; Fu, Bo; Polleres, Axel; Seneviratne, Oshani; Kagal, Lalana
%P 218 - 237
%I Springer
%@ 978-3-030-62418-7
%B Lecture Notes in Computer Science
%N 12506

Conference paper

M. H. Gad-Elrab, V. T. Ho, E. Levinkov, T.-K. Tran, and D. Stepanova

“Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering,” in ISWC 2020 Posters, Demos, and Industry Tracks, Globally Online, 2020.

BibTeX

@inproceedings{Gad-Elrab_ISCW20,
TITLE = {Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering},
AUTHOR = {Gad-Elrab, Mohamed Hassan and Ho, Vinh Thinh and Levinkov, Evgeny and Tran, Trung-Kien and Stepanova, Daria},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2721/paper572.pdf; urn:nbn:de:0074-2721-6},
PUBLISHER = {ceur-ws.org},
YEAR = {2020},
BOOKTITLE = {ISWC 2020 Posters, Demos, and Industry Tracks},
EDITOR = {Taylor, Kerry and Goncalves, Rafael and Lecue, Freddy and Yan, Jun},
PAGES = {281--286},
EID = {572},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2721},
ADDRESS = {Globally Online},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed Hassan
%A Ho, Vinh Thinh
%A Levinkov, Evgeny
%A Tran, Trung-Kien
%A Stepanova, Daria
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F86B-A
%U http://ceur-ws.org/Vol-2721/paper572.pdf
%D 2020
%B 19th International Semantic Web Conference
%Z date of event: 2020-11-01 - 2020-11-06
%C Globally Online
%B ISWC 2020 Posters, Demos, and Industry Tracks
%E Taylor, Kerry; Goncalves, Rafael; Lecue, Freddy; Yan, Jun
%P 281 - 286
%Z sequence number: 572
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2721
%@ 1613-0073

Conference paper

A. Ghazimatin, O. Balalau, R. Saha Roy, and G. Weikum

“PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.

BibTeX

@inproceedings{GhazimatinWSDM2020,
TITLE = {{PRINCE}: {P}rovider-side Interpretability with Counterfactual Explanations in Recommender Systems},
AUTHOR = {Ghazimatin, Azin and Balalau, Oana and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6822-3},
DOI = {10.1145/3336191.3371824},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining},
EDITOR = {Caverlee, James and Hu, Xia Ben},
PAGES = {196--204},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Ghazimatin, Azin
%A Balalau, Oana
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F173-7
%R 10.1145/3336191.3371824
%D 2020
%B 13th International Conference on Web Search and Data Mining
%Z date of event: 2020-02-03 - 2020-02-07
%C Houston, TX, USA
%B WSDM '20
%E Caverlee, James; Hu, Xia Ben
%P 196 - 204
%I ACM
%@ 978-1-4503-6822-3

Article

S. Ghosh, S. Razniewski, and G. Weikum

“Uncovering Hidden Semantics of Set Information in Knowledge Bases,” Journal of Web Semantics, vol. 64, 2020.

BibTeX

@article{Ghosh_2020,
TITLE = {Uncovering Hidden Semantics of Set Information in Knowledge Bases},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1570-8268},
DOI = {10.1016/j.websem.2020.100588},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2020},
DATE = {2020},
JOURNAL = {Journal of Web Semantics},
VOLUME = {64},
EID = {100588},
}

Endnote

%0 Journal Article
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Uncovering Hidden Semantics of Set Information in Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-066D-9
%R 10.1016/j.websem.2020.100588
%7 2020
%D 2020
%J Journal of Web Semantics
%V 64
%Z sequence number: 100588
%I Elsevier
%C Amsterdam
%@ 1570-8268

Conference paper

S. Ghosh, S. Razniewski, and G. Weikum

“CounQER: A System for Discovering and Linking Count Information in Knowledge Bases,” in The Semantic Web: ESWC 2020 Satellite Events, Heraklion, Greece, 2020.

BibTeX

@inproceedings{Ghosh_ESWC20,
TITLE = {{CounQER}: {A} System for Discovering and Linking Count Information in Knowledge Bases},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-62326-5},
DOI = {10.1007/978-3-030-62327-2_15},
PUBLISHER = {Springer},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {The Semantic Web: ESWC 2020 Satellite Events},
EDITOR = {Harth, Andreas and Presutti, Valentina and Troncy, Rapha{\"e}l and Acosta, Maribel and Polleres, Axel and Fern{\'a}ndez, Javier D. and Xavier Parreira, Josiane and Hartig, Olaf and Hose, Katja and Cochez, Michael},
PAGES = {84--90},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12124},
ADDRESS = {Heraklion, Greece},
}

Endnote

%0 Conference Proceedings
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CounQER: A System for Discovering and Linking Count Information in Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EFB9-C
%R 10.1007/978-3-030-62327-2_15
%D 2020
%B 17th Extended Semantic Web Conference
%Z date of event: 2020-05-31 - 2020-06-04
%C Heraklion, Greece 
%B The Semantic Web: ESWC 2020 Satellite Events
%E Harth, Andreas; Presutti, Valentina; Troncy, Raphaël; Acosta, Maribel; Polleres, Axel; Fernández, Javier D.; Xavier Parreira, Josiane; Hartig, Olaf; Hose, Katja; Cochez, Michael
%P 84 - 90
%I Springer
%@ 978-3-030-62326-5
%B Lecture Notes in Computer Science
%N 12124

Paper

S. Ghosh, S. Razniewski, and G. Weikum

“CounQER: A System for Discovering and Linking Count Information in Knowledge Bases,” 2020. [Online]. Available: https://arxiv.org/abs/2005.03529.

Abstract

Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata,
DBpedia and Freebase are often limited to subproperty, domain and range
constraints. In this demo we showcase CounQER, a system that illustrates the
alignment of counting predicates, like staffSize, and enumerating predicates,
like workInstitution^{-1}. In the demonstration session, attendees can inspect
these alignments, and will learn about the importance of these alignments for
KB question answering and curation. CounQER is available at
counqer.mpi-inf.mpg.de/spo.

BibTeX

@online{Ghosh_2005.03529,
TITLE = {{CounQER}: {A} System for Discovering and Linking Count Information in Knowledge Bases},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2005.03529},
EPRINT = {2005.03529},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignments, and will learn about the importance of these alignments for KB question answering and curation. CounQER is available at https://counqer.mpi-inf.mpg.de/spo.},
}

Endnote

%0 Report
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CounQER: A System for Discovering and Linking Count Information in Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F187-0
%U https://arxiv.org/abs/2005.03529
%D 2020
%X Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignments, and will learn about the importance of these alignments for KB question answering and curation. CounQER is available at https://counqer.mpi-inf.mpg.de/spo.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Paper

S. Ghosh, S. Razniewski, and G. Weikum

“Uncovering Hidden Semantics of Set Information in Knowledge Bases,” 2020. [Online]. Available: http://arxiv.org/abs/2003.03155.

Abstract

Knowledge Bases (KBs) contain a wealth of structured information about
entities and predicates. This paper focuses on set-valued predicates, i.e., the
relationship between an entity and a set of entities. In KBs, this information
is often represented in two formats: (i) via counting predicates such as
numberOfChildren and staffSize, that store aggregated integers, and (ii) via
enumerating predicates such as parentOf and worksFor, that store individual set
memberships. Both formats are typically complementary: unlike enumerating
predicates, counting predicates do not give away individuals, but are more
likely informative towards the true set size, thus this coexistence could
enable interesting applications in question answering and KB curation.
In this paper we aim at uncovering this hidden knowledge. We proceed in two
steps. (i) We identify set-valued predicates from a given KB predicates via
statistical and embedding-based features. (ii) We link counting predicates and
enumerating predicates by a combination of co-occurrence, correlation and
textual relatedness metrics. We analyze the prevalence of count information in
four prominent knowledge bases, and show that our linking method achieves up to
0.55 F1 score in set predicate identification versus 0.40 F1 score of a random
selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75
at position 3 in relevant predicate alignments. Our predicate alignments are
showcased in a demonstration system available at
counqer.mpi-inf.mpg.de/spo.

BibTeX

@online{Ghosh_arXiv2003.03155,
TITLE = {Uncovering Hidden Semantics of Set Information in Knowledge Bases},
AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/2003.03155},
EPRINT = {2003.03155},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, that store individual set memberships. Both formats are typically complementary: unlike enumerating predicates, counting predicates do not give away individuals, but are more likely informative towards the true set size, thus this coexistence could enable interesting applications in question answering and KB curation. In this paper we aim at uncovering this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates from a given KB predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves up to 0.55 F1 score in set predicate identification versus 0.40 F1 score of a random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.},
}

Endnote

%0 Report
%A Ghosh, Shrestha
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Uncovering Hidden Semantics of Set Information in Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-0662-4
%U http://arxiv.org/abs/2003.03155
%D 2020
%X Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, that store individual set memberships. Both formats are typically complementary: unlike enumerating predicates, counting predicates do not give away individuals, but are more likely informative towards the true set size, thus this coexistence could enable interesting applications in question answering and KB curation. In this paper we aim at uncovering this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates from a given KB predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves up to 0.55 F1 score in set predicate identification versus 0.40 F1 score of a random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.
%K Computer Science, Databases, cs.DB,Computer Science, Information Retrieval, cs.IR

Conference paper

D. Gupta and K. Berberich

“Weaving Text into Tables,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.

BibTeX

@inproceedings{DBLP:conf/cikm/0001B20,
TITLE = {Weaving Text into Tables},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-6859-9},
DOI = {10.1145/3340531.3417442},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management},
EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan},
PAGES = {3401--3404},
ADDRESS = {Virtual Event, Ireland},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Weaving Text into Tables
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0313-F
%R 10.1145/3340531.3417442
%D 2020
%B 29th ACM International Conference on Information & Knowledge Management
%Z date of event: 2020-10-19 - 2020-10-23
%C Virtual Event, Ireland
%B CIKM '20
%E d’Aquin, Mathieu; Dietze, Stefan
%P 3401 - 3404
%I ACM
%@ 978-1-4503-6859-9

Conference paper

D. Gupta and K. Berberich

“Optimizing Hyper-Phrase Queries,” in ICTIR ’20, ACM SIGIR International Conference on Theory of Information Retrieval, Virtual Event, Norway, 2020.

BibTeX

@inproceedings{DBLP:conf/ictir/0002B20,
TITLE = {Optimizing Hyper-Phrase Queries},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-8067-6},
DOI = {10.1145/3409256.3409827},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {ICTIR '20, ACM SIGIR International Conference on Theory of Information Retrieval},
EDITOR = {Balog, Krisztian and Setty, Vinay and Lioma, Christina and Liu, Yiqun and Zhang, Min and Berberich, Klaus},
PAGES = {41--48},
ADDRESS = {Virtual Event, Norway},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Optimizing Hyper-Phrase Queries
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0335-9
%R 10.1145/3409256.3409827
%D 2020
%B ACM SIGIR International Conference on Theory of Information Retrieval 
%Z date of event: 2020-09-14 - 2020-09-17
%C Virtual Event, Norway
%B ICTIR '20
%E Balog, Krisztian; Setty, Vinay; Lioma, Christina; Liu, Yiqun; Zhang, Min; Berberich, Klaus
%P 41 - 48
%I ACM
%@ 978-1-4503-8067-6

Thesis

E. Heiter

“Factoring Out Prior Knowledge from Low-dimensional Embeddings,” Universität des Saarlandes, Saarbrücken, 2020.

BibTeX

@mastersthesis{heiter:20:confetti,
TITLE = {Factoring Out Prior Knowledge from Low-dimensional Embeddings},
AUTHOR = {Heiter, Edith},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
}

Endnote

%0 Thesis
%A Heiter, Edith
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Factoring Out Prior Knowledge from Low-dimensional Embeddings
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEF8-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%V master
%9 master

Conference paper

V. T. Ho, K. Pal, N. Kleer, K. Berberich, and G. Weikum

“Entities with Quantities: Extraction, Search, and Ranking,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.

BibTeX

@inproceedings{HoWSDM2020,
TITLE = {Entities with Quantities: {E}xtraction, Search, and Ranking},
AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Kleer, Niko and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6822-3},
DOI = {10.1145/3336191.3371860},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining},
EDITOR = {Caverlee, James and Hu, Xia Ben},
PAGES = {833--836},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Pal, Koninika
%A Kleer, Niko
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Entities with Quantities: Extraction, Search, and Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-A284-D
%R 10.1145/3336191.3371860
%D 2020
%B 13th International Conference on Web Search and Data Mining
%Z date of event: 2020-02-03 - 2020-02-07
%C Houston, TX, USA
%B WSDM '20
%E Caverlee, James; Hu, Xia Ben
%P 833 - 836
%I ACM
%@ 978-1-4503-6822-3

Conference paper

M. Jain, P. Mirza, and R. Mutharaju

“Cardinality Extraction from Text for Ontology Learning,” in Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020), Hyderabad, India, 2020.

BibTeX

@inproceedings{Jain_CoDS2020,
TITLE = {Cardinality Extraction from Text for Ontology Learning},
AUTHOR = {Jain, Monika and Mirza, Paramita and Mutharaju, Raghava},
LANGUAGE = {eng},
ISBN = {9781450377386},
DOI = {10.1145/3371158.3371223},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)},
EDITOR = {Bhattacharya, Arnab and Natarajan, Sriraam and Saha Roy, Rishiraj},
PAGES = {354--354},
ADDRESS = {Hyderabad, India},
}

Endnote

%0 Conference Proceedings
%A Jain, Monika
%A Mirza, Paramita
%A Mutharaju, Raghava
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Cardinality Extraction from Text for Ontology Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-AB73-6
%R 10.1145/3371158.3371223
%D 2020
%B ACM India Joint International Conference on Data Science and Management of Data
%Z date of event: 2020-01-05 - 2020-01-07
%C Hyderabad, India
%B Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
%E Bhattacharya, Arnab; Natarajan, Sriraam; Saha Roy, Rishiraj
%P 354 - 354
%I ACM
%@ 9781450377386

Conference paper

M. Kaiser

“Incorporating User Feedback in Conversational Question Answering over Heterogeneous Web Sources,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.

BibTeX

@inproceedings{Kaiser_SIGIR20b,
TITLE = {Incorporating User Feedback in Conversational Question Answering over Heterogeneous {Web} Sources},
AUTHOR = {Kaiser, Magdalena},
LANGUAGE = {eng},
ISBN = {9781450380164},
DOI = {10.1145/3397271.3401454},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {2482--2482},
ADDRESS = {Virtual Event, China},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Incorporating User Feedback in Conversational Question Answering over Heterogeneous Web Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FCDA-8
%R 10.1145/3397271.3401454
%D 2020
%B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2020-07-25 - 2020-07-30
%C Virtual Event, China
%B SIGIR '20
%P 2482 - 2482
%I ACM
%@ 9781450380164

Paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Conversational Question Answering over Passages by Leveraging Word Proximity Networks,” 2020. [Online]. Available: https://arxiv.org/abs/2004.13117.

Abstract

Question answering (QA) over text passages is a problem of long-standing
interest in information retrieval. Recently, the conversational setting has
attracted attention, where a user asks a sequence of questions to satisfy her
information needs around a topic. While this setup is a natural one and similar
to humans conversing with each other, it introduces two key research
challenges: understanding the context left implicit by the user in follow-up
questions, and dealing with ad hoc question formulations. In this work, we
demonstrate CROWN (Conversational passage ranking by Reasoning Over Word
Networks): an unsupervised yet effective system for conversational QA with
passage responses, that supports several modes of context propagation over
multiple turns. To this end, CROWN first builds a word proximity network (WPN)
from large corpora to store statistically significant term co-occurrences. At
answering time, passages are ranked by a combination of their similarity to the
question, and coherence of query terms within: these factors are measured by
reading off node and edge weights from the WPN. CROWN provides an interface
that is both intuitive for end-users, and insightful for experts for
reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data,
where it achieved above-median performance in a pool of neural methods.

BibTeX

@online{Kaiser_2004.13117,
TITLE = {Conversational Question Answering over Passages by Leveraging Word Proximity Networks},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2004.13117},
EPRINT = {2004.13117},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces two key research challenges: understanding the context left implicit by the user in follow-up questions, and dealing with ad hoc question formulations. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.},
}

Endnote

%0 Report
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Question Answering over Passages by Leveraging Word Proximity Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F17D-D
%U https://arxiv.org/abs/2004.13117
%D 2020
%X Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces two key research challenges: understanding the context left implicit by the user in follow-up questions, and dealing with ad hoc question formulations. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

M. Kaiser, R. Saha Roy, and G. Weikum

“Conversational Question Answering over Passages by Leveraging Word Proximity Networks,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.

BibTeX

@inproceedings{Kaiser_SIGIR20,
TITLE = {Conversational Question Answering over Passages by Leveraging Word Proximity Networks},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450380164},
DOI = {10.1145/3397271.3401399},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {2129--2132},
ADDRESS = {Virtual Event, China},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Conversational Question Answering over Passages by Leveraging Word Proximity Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F152-C
%R 10.1145/3397271.3401399
%D 2020
%B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2020-07-25 - 2020-07-30
%C Virtual Event, China
%B SIGIR '20
%P 2129 - 2132
%I ACM
%@ 9781450380164

Conference paper

P. Lahoti, A. Beutel, J. Chen, K. Lee, F. Prost, N. Thain, X. Wang, and E. Chi

“Fairness without Demographics through Adversarially Reweighted Learning,” in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Virtual Event, 2020.

BibTeX

@inproceedings{DBLP:conf/nips/LahotiBCLPT0C20,
TITLE = {Fairness without Demographics through Adversarially Reweighted Learning},
AUTHOR = {Lahoti, Preethi and Beutel, Alex and Chen, Jilin and Lee, Kang and Prost, Flavien and Thain, Nithum and Wang, Xuezhi and Chi, Ed},
LANGUAGE = {eng},
PUBLISHER = {Curran Associates, Inc.},
YEAR = {2020},
BOOKTITLE = {Advances in Neural Information Processing Systems 33 (NeurIPS 2020)},
EDITOR = {Larochelle, Hugo and Ranzato, Marc Aurelio and Hadsell, Raia and Balcan, Maria-Florina and Lin, Hsuan-Tien},
ADDRESS = {Virtual Event},
}

Endnote

%0 Conference Proceedings
%A Lahoti, Preethi
%A Beutel, Alex
%A Chen, Jilin
%A Lee, Kang
%A Prost, Flavien
%A Thain, Nithum
%A Wang, Xuezhi
%A Chi, Ed
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Fairness without Demographics through Adversarially Reweighted Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FCC2-2
%D 2020
%B 34th Conference on Neural Information Processing Systems
%Z date of event: 2020-12-06 - 2020-12-12
%C Virtual Event
%B Advances in Neural Information Processing Systems 33
%E Larochelle, Hugo; Ranzato, Marc Aurelio; Hadsell, Raia; Balcan, Maria-Florina; Lin, Hsuan-Tien
%I Curran Associates, Inc.
%U https://proceedings.neurips.cc/paper/2020/hash/07fc15c9d169ee48573edd749d25945d-Abstract.html

Paper

C. Li, A. Yates, S. MacAvaney, B. He, and Y. Sun

“PARADE: Passage Representation Aggregation for Document Reranking,” 2020. [Online]. Available: https://arxiv.org/abs/2008.09093.

Abstract

We present PARADE, an end-to-end Transformer-based model that considers
document-level context for document reranking. PARADE leverages passage-level
relevance representations to predict a document relevance score, overcoming the
limitations of previous approaches that perform inference on passages
independently. Experiments on two ad-hoc retrieval benchmarks demonstrate
PARADE's effectiveness over such methods. We conduct extensive analyses on
PARADE's efficiency, highlighting several strategies for improving it. When
combined with knowledge distillation, a PARADE model with 72% fewer parameters
achieves effectiveness competitive with previous approaches using BERT-Base.
Our code is available at https://github.com/canjiali/PARADE.

BibTeX

@online{Li2008.09093,
TITLE = {{PARADE}: Passage Representation Aggregation for Document Reranking},
AUTHOR = {Li, Canjia and Yates, Andrew and MacAvaney, Sean and He, Ben and Sun, Yingfei},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2008.09093},
EPRINT = {2008.09093},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72\% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at \url{https://github.com/canjiali/PARADE}.},
}

Endnote

%0 Report
%A Li, Canjia
%A Yates, Andrew
%A MacAvaney, Sean
%A He, Ben
%A Sun, Yingfei
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T PARADE: Passage Representation Aggregation for Document Reranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-06CF-9
%U https://arxiv.org/abs/2008.09093
%D 2020
%X We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE.
%K Computer Science, Information Retrieval, cs.IR

Paper

J. Lin, R. Nogueira, and A. Yates

“Pretrained Transformers for Text Ranking: BERT and Beyond,” 2020. [Online]. Available: https://arxiv.org/abs/2010.06467.

Abstract

The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has, without exaggeration, revolutionized the fields of natural
language processing (NLP), information retrieval (IR), and beyond. In this
survey, we provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
ranking architectures and learned dense representations that attempt to perform
ranking directly. There are two themes that pervade our survey: techniques for
handling long documents, beyond the typical sentence-by-sentence processing
approaches used in NLP, and techniques for addressing the tradeoff between
effectiveness (result quality) and efficiency (query latency). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.

BibTeX

@online{Lin2010.06467,
TITLE = {Pretrained Transformers for Text Ranking: {BERT} and Beyond},
AUTHOR = {Lin, Jimmy and Nogueira, Rodrigo and Yates, Andrew},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2010.06467},
EPRINT = {2010.06467},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.},
}

Endnote

%0 Report
%A Lin, Jimmy
%A Nogueira, Rodrigo
%A Yates, Andrew
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Pretrained Transformers for Text Ranking: BERT and Beyond
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-06DA-C
%U https://arxiv.org/abs/2010.06467
%D 2020
%X The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Article

P. Mandros, M. Boley, and J. Vreeken

“Discovering Dependencies with Reliable Mutual Information,” Knowledge and Information Systems, vol. 62, 2020.

BibTeX

@article{Mandros2020,
TITLE = {Discovering Dependencies with Reliable Mutual Information},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {0219-3116},
DOI = {10.1007/s10115-020-01494-9},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2020},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {62},
PAGES = {4223--4253},
}

Endnote

%0 Journal Article
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Dependencies with Reliable Mutual Information
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-DC90-F
%R 10.1007/s10115-020-01494-9
%7 2020
%D 2020
%J Knowledge and Information Systems
%V 62
%& 4223
%P 4223 - 4253
%I Springer
%C New York, NY
%@ 0219-3116

Conference paper

S. Nag Chowdhury, W. Cheng, G. de Melo, S. Razniewski, and G. Weikum

“Illustrate Your Story: Enriching Text with Images,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.

BibTeX

@inproceedings{NagWSDM2020,
TITLE = {Illustrate Your Story: {Enriching} Text with Images},
AUTHOR = {Nag Chowdhury, Sreyasi and Cheng, William and de Melo, Gerard and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450368223},
DOI = {10.1145/3336191.3371866},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining},
EDITOR = {Caverlee, James and Hu, Xia Ben},
PAGES = {849--852},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Cheng, William
%A de Melo, Gerard
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Illustrate Your Story: Enriching Text with Images
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-A27C-8
%R 10.1145/3336191.3371866
%D 2020
%B 13th International Conference on Web Search and Data Mining
%Z date of event: 2020-02-03 - 2020-02-07
%C Houston, TX, USA
%B WSDM '20
%E Caverlee, James; Hu, Xia Ben
%P 849 - 852
%I ACM
%@ 9781450368223

Thesis

D5IMPR-CS

T.-P. Nguyen

“Advanced Semantics for Commonsense Knowledge Extraction,” Universität des Saarlandes, Saarbrücken, 2020.

Abstract

BibTeX

@mastersthesis{NguyenMSc2020,
TITLE = {Advanced Semantics for Commonsense Knowledge Extraction},
AUTHOR = {Nguyen, Tuan-Phong},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.},
}

Endnote

%0 Thesis
%A Nguyen, Tuan-Phong
%Y Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Advanced Semantics for Commonsense Knowledge Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FED0-0
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%P 67 p.
%V master
%9 master
%X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.

Paper

T.-P. Nguyen, S. Razniewski, and G. Weikum

“Advanced Semantics for Commonsense Knowledge Extraction,” WWW 2021, 2020. [Online]. Available: https://arxiv.org/abs/2011.00905.

Abstract

Commonsense knowledge (CSK) about concepts and their properties is useful for
AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB
and others compiled large CSK collections, but are restricted in their
expressiveness to subject-predicate-object (SPO) triples with simple concepts
for S and monolithic strings for P and O. Also, these projects have either
prioritized precision or recall, but hardly reconcile these complementary
goals. This paper presents a methodology, called Ascent, to automatically build
a large-scale knowledge base (KB) of CSK assertions, with advanced
expressiveness and both better precision and recall than prior works. Ascent
goes beyond triples by capturing composite concepts with subgroups and aspects,
and by refining assertions with semantic facets. The latter are important to
express temporal and spatial validity of assertions and further qualifiers.
Ascent combines open information extraction with judicious cleaning using
language models. Intrinsic evaluation shows the superior size and quality of
the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the
benefits of Ascent.

BibTeX

@online{Nguyen_2011.00905,
TITLE = {Advanced Semantics for Commonsense Knowledge Extraction},
AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2011.00905},
EPRINT = {2011.00905},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.},
JOURNAL = {WWW 2021},
}

Endnote

%0 Report
%A Nguyen, Tuan-Phong
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Advanced Semantics for Commonsense Knowledge Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEDA-6
%U https://arxiv.org/abs/2011.00905
%D 2020
%X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
%K Computer Science, Artificial Intelligence, cs.AI, Computer Science, Computation and Language, cs.CL
%J WWW 2021

Thesis

A. Oláh

“What’s in the Box? Explaining Neural Networks with Robust Rules,” Universität des Saarlandes, Saarbrücken, 2020.

BibTeX

@mastersthesis{olah:20:explainn,
TITLE = {What's in the Box? Explaining Neural Networks with Robust Rules},
AUTHOR = {Ol{\'a}h, Anna},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
}

Endnote

%0 Thesis
%A Oláh, Anna
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T What's in the Box? Explaining Neural Networks with Robust Rules
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEFA-2
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%V master
%9 master

Conference paper

K. Pal, V. T. Ho, and G. Weikum

“Co-Clustering Triples from Open Information Extraction,” in Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020), Hyderabad, India, 2020.

BibTeX

@inproceedings{Pal_CoDS2020,
TITLE = {Co-Clustering Triples from Open Information Extraction},
AUTHOR = {Pal, Koninika and Ho, Vinh Thinh and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450377386},
DOI = {10.1145/3371158.3371183},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)},
EDITOR = {Bhattacharya, Arnab and Natarajan, Sriraam and Saha Roy, Rishiraj},
PAGES = {190--194},
ADDRESS = {Hyderabad, India},
}

Endnote

%0 Conference Proceedings
%A Pal, Koninika
%A Ho, Vinh Thinh
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Co-Clustering Triples from Open Information Extraction
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EBFC-5
%R 10.1145/3371158.3371183
%D 2020
%B ACM India Joint International Conference on Data Science and Management of Data
%Z date of event: 2020-01-05 - 2020-01-07
%C Hyderabad, India
%B Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
%E Bhattacharya, Arnab; Natarajan, Sriraam; Saha Roy, Rishiraj
%P 190 - 194
%I ACM
%@ 9781450377386

Conference paper

T. Pellissier Tanon, G. Weikum, and F. Suchanek

“YAGO 4: A Reason-able Knowledge Base,” in The Semantic Web (ESWC 2020), Heraklion, Greece, 2020.

BibTeX

@inproceedings{Pellissier_ESCW2020,
TITLE = {{YAGO 4}: {A} Reason-able Knowledge Base},
AUTHOR = {Pellissier Tanon, Thomas and Weikum, Gerhard and Suchanek, Fabian},
LANGUAGE = {eng},
ISBN = {978-3-030-49460-5},
DOI = {10.1007/978-3-030-49461-2_34},
PUBLISHER = {Springer},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {The Semantic Web (ESWC 2020)},
EDITOR = {Harth, Andreas and Kirrane, Sabrina and Ngonga Ngomo, Axel-Cyrille and Paulheim, Heiko and Rula, Anisa and Gentile, Anna Lisa and Haase, Peter and Cochez, Michael},
PAGES = {583--596},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12123},
ADDRESS = {Heraklion, Greece},
}

Endnote

%0 Conference Proceedings
%A Pellissier Tanon, Thomas
%A Weikum, Gerhard
%A Suchanek, Fabian
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T YAGO 4: A Reason-able Knowledge Base
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EFC8-B
%R 10.1007/978-3-030-49461-2_34
%D 2020
%B 17th Extended Semantic Web Conference
%Z date of event: 2020-05-31 - 2020-06-04
%C Heraklion, Greece
%B The Semantic Web
%E Harth, Andreas; Kirrane, Sabrina; Ngonga Ngomo, Axel-Cyrille; Paulheim, Heiko; Rula, Anisa; Gentile, Anna Lisa; Haase, Peter; Cochez, Michael
%P 583 - 596
%I Springer
%@ 978-3-030-49460-5
%B Lecture Notes in Computer Science
%N 12123

Conference paper

F. Pennerath, P. Mandros, and J. Vreeken

“Discovering Approximate Functional Dependencies using Smoothed Mutual Information,” in KDD ’20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, USA, 2020.

BibTeX

@inproceedings{penerath:20:smooth,
TITLE = {Discovering Approximate Functional Dependencies using Smoothed Mutual Information},
AUTHOR = {Pennerath, Fr{\'e}d{\'e}ric and Mandros, Panagiotis and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-7998-4},
DOI = {10.1145/3394486.3403178},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {KDD '20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
EDITOR = {Gupta, Rajesh and Liu, Yan and Tang, Jiliang and Prakash, B. Aditya},
PAGES = {1254--1264},
ADDRESS = {Virtual Event, USA},
}

Endnote

%0 Conference Proceedings
%A Pennerath, Frédéric
%A Mandros, Panagiotis
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Discovering Approximate Functional Dependencies using Smoothed Mutual Information
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-2560-2
%R 10.1145/3394486.3403178
%D 2020
%B 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2020-08-23 - 2020-08-27
%C Virtual Event, USA
%B KDD '20
%E Gupta, Rajesh; Liu, Yan; Tang, Jiliang; Prakash, B. Aditya
%P 1254 - 1264
%I ACM
%@ 978-1-4503-7998-4

Conference paper

S. Qiu, B. Xu, J. Zhang, Y. Wang, X. Shen, G. de Melo, C. Long, and X. Li

“EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.

BibTeX

@inproceedings{qiu2020easyaug,
TITLE = {{EasyAug}: {An} Automatic Textual Data Augmentation Platform for Classification Tasks},
AUTHOR = {Qiu, Siyuan and Xu, Binxia and Zhang, Jie and Wang, Yafang and Shen, Xiaoyu and de Melo, Gerard and Long, Chong and Li, Xiaolong},
LANGUAGE = {eng},
ISBN = {978-1-4503-7024-0},
DOI = {10.1145/3366424.3383552},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)},
EDITOR = {El Fallah, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten},
PAGES = {249--252},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Qiu, Siyuan
%A Xu, Binxia
%A Zhang, Jie
%A Wang, Yafang
%A Shen, Xiaoyu
%A de Melo, Gerard
%A Long, Chong
%A Li, Xiaolong
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-143B-0
%R 10.1145/3366424.3383552
%D 2020
%B The World Wide Web Conference
%Z date of event: 2020-04-20 - 2020-04-24
%C Taipei, Taiwan
%B Companion of The World Wide Web Conference
%E El Fallah, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten
%P 249 - 252
%I ACM
%@ 978-1-4503-7024-0

Conference paper

N. H. Ramadhana, F. Darari, P. O. H. Putra, W. Nutt, S. Razniewski, and R. I. Akbar

“User-Centered Design for Knowledge Imbalance Analysis: A Case Study of ProWD,” in VOILA!2020, Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data, Virtual Conference, 2020.

BibTeX

@inproceedings{Ramadhana_VOILA2020,
TITLE = {User-Centered Design for Knowledge Imbalance Analysis: {A} Case Study of {ProWD}},
AUTHOR = {Ramadhana, Nadyah Hani and Darari, Fariz and Putra, Panca O. Hadi and Nutt, Werner and Razniewski, Simon and Akbar, Refo Ilmiya},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2778/paper2.pdf; urn:nbn:de:0074-2778-8},
PUBLISHER = {ceur-ws.org},
YEAR = {2020},
BOOKTITLE = {VOILA!2020, Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data},
EDITOR = {Ivanova, Valentina and Lambrix, Patrick and Pesquita, Catia and Wiens, Vitalis},
PAGES = {14--27},
EID = {2},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2778},
ADDRESS = {Virtual Conference},
}

Endnote

%0 Conference Proceedings
%A Ramadhana, Nadyah Hani
%A Darari, Fariz
%A Putra, Panca O. Hadi
%A Nutt, Werner
%A Razniewski, Simon
%A Akbar, Refo Ilmiya
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T User-Centered Design for Knowledge Imbalance Analysis: A Case Study of ProWD
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-063B-0
%U http://ceur-ws.org/Vol-2778/paper2.pdf
%D 2020
%B Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data
%Z date of event: 2020-11-02 - 2020-11-02
%C Virtual Conference
%B VOILA!2020
%E Ivanova, Valentina; Lambrix, Patrick; Pesquita, Catia; Wiens, Vitalis
%P 14 - 27
%Z sequence number: 2
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2778
%@ 1613-0073

Conference paper

S. Razniewski and P. Das

“Structured Knowledge: Have We Made Progress? An Extrinsic Study of KB Coverage over 19 Years,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.

Abstract

Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.

BibTeX

@inproceedings{razniewski2020structured,
TITLE = {Structured Knowledge: {H}ave We Made Progress? {A}n Extrinsic Study of {KB} Coverage over 19 Years},
AUTHOR = {Razniewski, Simon and Das, Priyanka},
LANGUAGE = {eng},
ISBN = {978-1-4503-6859-9},
DOI = {10.1145/3340531.3417447},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
ABSTRACT = {Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.},
BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management},
EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan},
PAGES = {3317--3320},
ADDRESS = {Virtual Event, Ireland},
}

Endnote

%0 Conference Proceedings
%A Razniewski, Simon
%A Das, Priyanka
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Structured Knowledge: Have We Made Progress? An Extrinsic Study of KB Coverage over 19 Years
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF42-0
%R 10.1145/3340531.3417447
%D 2020
%B 29th ACM International Conference on Information & Knowledge Management
%Z date of event: 2020-10-19 - 2020-10-23
%C Virtual Event, Ireland
%X Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.
%B CIKM '20
%E d’Aquin, Mathieu; Dietze, Stefan
%P 3317 - 3320
%I ACM
%@ 978-1-4503-6859-9

Conference paper

J. Romero and S. Razniewski

“Inside Quasimodo: Exploring Construction and Usage of Commonsense Knowledge,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.

BibTeX

@inproceedings{Romero_CIKM2020,
TITLE = {Inside {Quasimodo}: {E}xploring Construction and Usage of Commonsense Knowledge},
AUTHOR = {Romero, Julien and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-4503-6859-9},
DOI = {10.1145/3340531.3417416},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management},
EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan},
PAGES = {3445--3448},
ADDRESS = {Virtual Event, Ireland},
}

Endnote

%0 Conference Proceedings
%A Romero, Julien
%A Razniewski, Simon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Inside Quasimodo: Exploring Construction and Usage of Commonsense Knowledge
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-04C6-4
%R 10.1145/3340531.3417416
%D 2020
%B 29th ACM International Conference on Information & Knowledge Management
%Z date of event: 2020-10-19 - 2020-10-23
%C Virtual Event, Ireland
%B CIKM '20
%E d’Aquin, Mathieu; Dietze, Stefan
%P 3445 - 3448
%I ACM
%@ 978-1-4503-6859-9

Paper

R. Saha Roy and A. Anand

“Question Answering over Curated and Open Web Sources,” 2020. [Online]. Available: https://arxiv.org/abs/2004.11980.

Abstract

The last few years have seen an explosion of research on the topic of
automated question answering (QA), spanning the communities of information
retrieval, natural language processing, and artificial intelligence. This
tutorial would cover the highlights of this really active period of growth for
QA to give the audience a grasp over the families of algorithms that are
currently being used. We partition research contributions by the underlying
source from where answers are retrieved: curated knowledge graphs, unstructured
text, or hybrid corpora. We choose this dimension of partitioning as it is the
most discriminative when it comes to algorithm design. Other key dimensions are
covered within each sub-topic: like the complexity of questions addressed, and
degrees of explainability and interactivity introduced in the systems. We would
conclude the tutorial with the most promising emerging trends in the expanse of
QA, that would help new entrants into this field make the best decisions to
take the community forward. Much has changed in the community since the last
tutorial on QA in SIGIR 2016, and we believe that this timely overview will
indeed benefit a large number of conference participants.

BibTeX

@online{SahaRoy2004.11980,
TITLE = {Question Answering over Curated and Open Web Sources},
AUTHOR = {Saha Roy, Rishiraj and Anand, Avishek},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2004.11980},
EPRINT = {2004.11980},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.},
}

Endnote

%0 Report
%A Saha Roy, Rishiraj
%A Anand, Avishek
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Question Answering over Curated and Open Web Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-09CA-B
%U https://arxiv.org/abs/2004.11980
%D 2020
%X The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.
%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Conference paper

R. Saha Roy and A. Anand

“Question Answering over Curated and Open Web Sources,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.

BibTeX

@inproceedings{SahaRoy_SIGIR20,
TITLE = {Question Answering over Curated and Open Web Sources},
AUTHOR = {Saha Roy, Rishiraj and Anand, Avishek},
LANGUAGE = {eng},
ISBN = {9781450380164},
DOI = {10.1145/3397271.3401421},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {2432--2435},
ADDRESS = {Virtual Event, China},
}

Endnote

%0 Conference Proceedings
%A Saha Roy, Rishiraj
%A Anand, Avishek
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Question Answering over Curated and Open Web Sources
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-02F6-0
%R 10.1145/3397271.3401421
%D 2020
%B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2020-07-25 - 2020-07-30
%C Virtual Event, China
%B SIGIR '20
%P 2432 - 2435
%I ACM
%@ 9781450380164

Article

V. Sathya, S. Ghosh, A. Ramamurthy, and B. R. Tamma

“Small Cell Planning: Resource Management and Interference Mitigation Mechanisms in LTE HetNets,” Wireless Personal Communications, vol. 115, 2020.

BibTeX

@article{Sathya2020,
TITLE = {Small Cell Planning: {R}esource Management and Interference Mitigation Mechanisms in {LTE HetNets}},
AUTHOR = {Sathya, Vanlin and Ghosh, Shrestha and Ramamurthy, Arun and Tamma, Bheemarjuna Reddy},
LANGUAGE = {eng},
ISSN = {0929-6212},
DOI = {10.1007/s11277-020-07574-x},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2020},
JOURNAL = {Wireless Personal Communications},
VOLUME = {115},
PAGES = {335--361},
}

Endnote

%0 Journal Article
%A Sathya, Vanlin
%A Ghosh, Shrestha
%A Ramamurthy, Arun
%A Tamma, Bheemarjuna Reddy
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Small Cell Planning: Resource Management and Interference Mitigation Mechanisms in LTE HetNets
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-B963-A
%R 10.1007/s11277-020-07574-x
%7 2020
%D 2020
%J Wireless Personal Communications
%V 115
%& 335
%P 335 - 361
%I Springer
%C New York, NY
%@ 0929-6212

Conference paper

X. Shen, E. Chang, H. Su, C. Niu, and D. Klakow

“Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence,” in The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020.

BibTeX

@inproceedings{shen2020neural,
TITLE = {Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence},
AUTHOR = {Shen, Xiaoyu and Chang, Ernie and Su, Hui and Niu, Cheng and Klakow, Dietrich},
LANGUAGE = {eng},
ISBN = {978-1-952148-25-5},
URL = {https://www.aclweb.org/anthology/2020.acl-main.641},
DOI = {10.18653/v1/2020.acl-main.641},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)},
EDITOR = {Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel},
PAGES = {7155--7165},
}

Endnote

%0 Conference Proceedings
%A Shen, Xiaoyu
%A Chang, Ernie
%A Su, Hui
%A Niu, Cheng
%A Klakow, Dietrich
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-141B-4
%U https://www.aclweb.org/anthology/2020.acl-main.641
%R 10.18653/v1/2020.acl-main.641
%D 2020
%B 58th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2020-07-05 - 2020-07-10
%B The 58th Annual Meeting of the Association for Computational Linguistics
%E Jurafsky, Dan; Chai, Joyce; Schluter, Natalie; Tetreault, Joel
%P 7155 - 7165
%I ACL
%@ 978-1-952148-25-5

Conference paper

H. Su, X. Shen, S. Zhao, Z. Xiao, P. Hu, C. Niu, and J. Zhou

“Diversifying Dialogue Generation with Non-Conversational Text,” in The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020.

BibTeX

@inproceedings{su2020diversifying,
TITLE = {Diversifying Dialogue Generation with Non-Conversational Text},
AUTHOR = {Su, Hui and Shen, Xiaoyu and Zhao, Sanqiang and Xiao, Zhou and Hu, Pengwei and Niu, Cheng and Zhou, Jie},
LANGUAGE = {eng},
ISBN = {978-1-952148-25-5},
URL = {https://www.aclweb.org/anthology/2020.acl-main.634},
DOI = {10.18653/v1/2020.acl-main.634},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)},
EDITOR = {Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel},
PAGES = {7087--7097},
}

Endnote

%0 Conference Proceedings
%A Su, Hui
%A Shen, Xiaoyu
%A Zhao, Sanqiang
%A Xiao, Zhou
%A Hu, Pengwei
%A Niu, Cheng
%A Zhou, Jie
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Diversifying Dialogue Generation with Non-Conversational Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-14AF-D
%U https://www.aclweb.org/anthology/2020.acl-main.634
%R 10.18653/v1/2020.acl-main.634
%D 2020
%B 58th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2020-07-05 - 2020-07-10
%B The 58th Annual Meeting of the Association for Computational Linguistics
%E Jurafsky, Dan; Chai, Joyce; Schluter, Natalie; Tetreault, Joel
%P 7087 - 7097
%I ACL
%@ 978-1-952148-25-5

Thesis

S. Sukarieh

“SPRAP: Detecting Opinion Spam Campaigns in Online Rating Services,” Universität des Saarlandes, Saarbrücken, 2020.

BibTeX

@mastersthesis{sukarieh:20:sprap,
TITLE = {{SPRAP}: Detecting Opinion Spam Campaigns in Online Rating Services},
AUTHOR = {Sukarieh, Sandra},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2020},
DATE = {2020},
}

Endnote

%0 Thesis
%A Sukarieh, Sandra
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SPRAP: Detecting Opinion Spam Campaigns in Online Rating Services
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF00-A
%I Universität des Saarlandes
%C Saarbrücken
%D 2020
%V master
%9 master

Article

C. Sutton, M. Boley, L. Ghiringhelli, M. Rupp, J. Vreeken, and M. Scheffler

“Identifying Domains of Applicability of Machine Learning Models for Materials Science,” Nature Communications, vol. 11, 2020.

BibTeX

@article{sutton:20:natcomm,
TITLE = {Identifying Domains of Applicability of Machine Learning Models for Materials Science},
AUTHOR = {Sutton, Chris and Boley, Mario and Ghiringhelli, Luca and Rupp, Matthias and Vreeken, Jilles and Scheffler, Matthias},
LANGUAGE = {eng},
ISSN = {2041-1723},
DOI = {10.1038/s41467-020-17112-9},
PUBLISHER = {Nature Publishing Group},
ADDRESS = {London},
YEAR = {2020},
JOURNAL = {Nature Communications},
VOLUME = {11},
EID = {4428},
}

Endnote

%0 Journal Article
%A Sutton, Chris
%A Boley, Mario
%A Ghiringhelli, Luca
%A Rupp, Matthias
%A Vreeken, Jilles
%A Scheffler, Matthias
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Identifying Domains of Applicability of Machine Learning Models for Materials Science
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-26CF-5
%R 10.1038/s41467-020-17112-9
%7 2020
%D 2020
%J Nature Communications
%O Nat. Commun.
%V 11
%Z sequence number: 4428
%I Nature Publishing Group
%C London
%@ 2041-1723

Conference paper

E. Terolli, P. Ernst, and G. Weikum

“Focused Query Expansion with Entity Cores for Patient-Centric Health Search,” in The Semantic Web – ISWC 2020, Athens, Greece (Virtual Conference), 2020.

BibTeX

@inproceedings{Terolli_ISWC2020,
TITLE = {Focused Query Expansion with Entity Cores for Patient-Centric Health Search},
AUTHOR = {Terolli, Erisa and Ernst, Patrick and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-62418-7},
DOI = {10.1007/978-3-030-62419-4_31},
PUBLISHER = {Springer},
YEAR = {2020},
DATE = {2020},
BOOKTITLE = {The Semantic Web -- ISWC 2020},
EDITOR = {Pan, Jeff Z. and Tamma, Valentina and D'Amato, Claudia and Janowicz, Krzysztof and Fu, Bo and Polleres, Axel and Seneviratne, Oshani and Kagal, Lalana},
PAGES = {547--564},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {12506},
ADDRESS = {Athens, Greece (Virtual Conference)},
}

Endnote

%0 Conference Proceedings
%A Terolli, Erisa
%A Ernst, Patrick
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Focused Query Expansion with Entity Cores for Patient-Centric Health Search
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-78D7-0
%R 10.1007/978-3-030-62419-4_31
%D 2020
%B 19th International Semantic Web Conference
%Z date of event: 2020-11-02 - 2020-11-06
%C Athens, Greece (Virtual Conference)
%B The Semantic Web -- ISWC 2020
%E Pan, Jeff Z.; Tamma, Valentina; D'Amato, Claudia; Janowicz, Krzysztof; Fu, Bo; Polleres, Axel; Seneviratne, Oshani; Kagal, Lalana
%P 547 - 564
%I Springer
%@ 978-3-030-62418-7
%B Lecture Notes in Computer Science
%N 12506

Conference paper

A. Tigunova

“Extracting Personal Information from Conversations,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.

BibTeX

@inproceedings{tigunova2020extracting,
TITLE = {Extracting Personal Information from Conversations},
AUTHOR = {Tigunova, Anna},
LANGUAGE = {eng},
ISBN = {978-1-4503-7024-0},
DOI = {10.1145/3366424.3382089},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)},
EDITOR = {El Fallah, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten},
PAGES = {284--288},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extracting Personal Information from Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F845-4
%R 10.1145/3366424.3382089
%D 2020
%B The World Wide Web Conference
%Z date of event: 2020-04-20 - 2020-04-24
%C Taipei, Taiwan
%B Companion of The World Wide Web Conference
%E El Fallah, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten
%P 284 - 288
%I ACM
%@ 978-1-4503-7024-0

Conference paper

A. Tigunova, A. Yates, P. Mirza, and G. Weikum

“CHARM: Inferring Personal Attributes from Conversations,” in The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 2020.

BibTeX

@inproceedings{Tigunova_EMNLP20,
TITLE = {{CHARM}: {I}nferring Personal Attributes from Conversations},
AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-952148-60-6},
URL = {https://www.aclweb.org/anthology/2020.emnlp-main.434},
DOI = {10.18653/v1/2020.emnlp-main.434},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
EDITOR = {Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang},
PAGES = {5391--5404},
ADDRESS = {Online},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Yates, Andrew
%A Mirza, Paramita
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CHARM: Inferring Personal Attributes from Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EEDB-7
%U https://www.aclweb.org/anthology/2020.emnlp-main.434
%R 10.18653/v1/2020.emnlp-main.434
%D 2020
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2020-11-16 - 2020-11-20
%C Online
%B The 2020 Conference on Empirical Methods in Natural Language Processing
%E Webber, Bonnie; Cohn, Trevor; He, Yulan; Liu, Yang
%P 5391 - 5404
%I ACL
%@ 978-1-952148-60-6
%U https://www.aclweb.org/anthology/2020.emnlp-main.434.pdf

Conference paper

A. Tigunova, P. Mirza, A. Yates, and G. Weikum

“RedDust: a Large Reusable Dataset of Reddit User Traits,” in Twelfth Language Resources and Evaluation Conference (LREC 2020), Marseille, France, 2020.

BibTeX

@inproceedings{Tigunova_ELREC20,
TITLE = {{RedDust}: a Large Reusable Dataset of {Reddit} User Traits},
AUTHOR = {Tigunova, Anna and Mirza, Paramita and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {979-10-95546-34-4},
URL = {https://www.aclweb.org/anthology/2020.lrec-1.751},
PUBLISHER = {ELRA},
YEAR = {2020},
BOOKTITLE = {Twelfth Language Resources and Evaluation Conference (LREC 2020)},
EDITOR = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
PAGES = {6118--6126},
ADDRESS = {Marseille, France},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Mirza, Paramita
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T RedDust: a Large Reusable Dataset of Reddit User Traits
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F0A9-B
%U https://www.aclweb.org/anthology/2020.lrec-1.751
%D 2020
%B 12th Language Resources and Evaluation Conference
%Z date of event: 2020-05-11 - 2020-05-16
%C Marseille, France
%B Twelfth Language Resources and Evaluation Conference
%E Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios
%P 6118 - 6126
%I ELRA
%@ 979-10-95546-34-4
%U https://www.aclweb.org/anthology/2020.lrec-1.751.pdf

Conference paper

G. H. Torbati, A. Yates, and G. Weikum

“Personalized Entity Search by Sparse and Scrutable User Profiles,” in CHIIR ’20, Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval, Vancouver, BC, Canada, 2020.

BibTeX

@inproceedings{CHIIR2020Torbati,
TITLE = {Personalized Entity Search by Sparse and Scrutable User Profiles},
AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450368926},
DOI = {10.1145/3343413.3378011},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {CHIIR '20, Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval},
EDITOR = {O'Brien, Heather and Freund, Luanne},
PAGES = {427--431},
ADDRESS = {Vancouver, BC, Canada},
}

Endnote

%0 Conference Proceedings
%A Torbati, Ghazaleh Haratinezhad
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Personalized Entity Search by Sparse and Scrutable User Profiles
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EAD7-F
%R 10.1145/3343413.3378011
%D 2020
%B Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval
%Z date of event: 2020-03-14 - 2020-03-18
%C Vancouver, BC, Canada
%B CHIIR '20
%E O'Brien, Heather; Freund, Luanne
%P 427 - 431
%I ACM
%@ 9781450368926

Conference paper

T.-K. Tran, M. H. Gad-Elrab, D. Stepanova, E. Kharlamov, and J. Strötgen

“Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.

BibTeX

@inproceedings{DBLP:conf/www/TranG0KS20,
TITLE = {Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs},
AUTHOR = {Tran, Trung-Kien and Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Kharlamov, Evgeny and Str{\"o}tgen, Jannik},
LANGUAGE = {eng},
ISBN = {978-1-4503-7024-0},
DOI = {10.1145/3366423.3380014},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)},
EDITOR = {El Fallah, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten},
PAGES = {2613--2619},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Tran, Trung-Kien
%A Gad-Elrab, Mohamed Hassan
%A Stepanova, Daria
%A Kharlamov, Evgeny
%A Strötgen, Jannik
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F861-4
%R 10.1145/3366423.3380014
%D 2020
%B The World Wide Web Conference
%Z date of event: 2020-04-20 - 2020-04-24
%C Taipei, Taiwan
%B Companion of The World Wide Web Conference
%E El Fallah, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten
%P 2613 - 2619
%I ACM
%@ 978-1-4503-7024-0

Conference paper

L. Wang, X. Shen, G. de Melo, and G. Weikum

“Cross-Domain Learning for Classifying Propaganda in Online Contents,” in Proceedings of the 2020 Truth and Trust Online Conference (TTO 2020), Virtual, 2020.

BibTeX

@inproceedings{Wang_TTO2020,
TITLE = {Cross-Domain Learning for Classifying Propaganda in Online Contents},
AUTHOR = {Wang, Liqiang and Shen, Xiaoyu and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-7359904-0-8},
URL = {https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO03.pdf},
PUBLISHER = {Hacks Hackers},
YEAR = {2020},
BOOKTITLE = {Proceedings of the 2020 Truth and Trust Online Conference (TTO 2020)},
EDITOR = {De Cristofaro, Emiliano and Nakov, Preslav},
PAGES = {21--31},
ADDRESS = {Virtual},
}

Endnote

%0 Conference Proceedings
%A Wang, Liqiang
%A Shen, Xiaoyu
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cross-Domain Learning for Classifying Propaganda in Online Contents
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F169-3
%U https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO03.pdf
%D 2020
%B Truth and Trust Online Conference
%Z date of event: 2020-10-16 - 2020-10-17
%C Virtual
%B Proceedings of the 2020 Truth and Trust Online Conference
%E De Cristofaro, Emiliano; Nakov, Preslav
%P 21 - 31
%I Hacks Hackers
%@ 978-1-7359904-0-8

Paper

L. Wang, X. Shen, G. de Melo, and G. Weikum

“Cross-Domain Learning for Classifying Propaganda in Online Contents,” 2020. [Online]. Available: https://arxiv.org/abs/2011.06844.

Abstract

As news and social media exhibit an increasing amount of manipulative
polarized content, detecting such propaganda has received attention as a new
task for content analysis. Prior work has focused on supervised learning with
training data from the same domain. However, as propaganda can be subtle and
keeps evolving, manual identification and proper labeling are very demanding.
As a consequence, training data is a major bottleneck. In this paper, we tackle
this bottleneck and present an approach to leverage cross-domain learning,
based on labeled documents and sentences from news and tweets, as well as
political speeches with a clear difference in their degrees of being
propagandistic. We devise informative features and build various classifiers
for propaganda labeling, using cross-domain learning. Our experiments
demonstrate the usefulness of this approach, and identify difficulties and
limitations in various configurations of sources and targets for the transfer
step. We further analyze the influence of various features, and characterize
salient indicators of propaganda.

BibTeX

@online{Wang_2011.06844,
TITLE = {Cross-Domain Learning for Classifying Propaganda in Online Contents},
AUTHOR = {Wang, Liqiang and Shen, Xiaoyu and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2011.06844},
EPRINT = {2011.06844},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {As news and social media exhibit an increasing amount of manipulative polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, as propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic. We devise informative features and build various classifiers for propaganda labeling, using cross-domain learning. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda.},
}

Endnote

%0 Report
%A Wang, Liqiang
%A Shen, Xiaoyu
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cross-Domain Learning for Classifying Propaganda in Online Contents
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FEBF-5
%U https://arxiv.org/abs/2011.06844
%D 2020
%X As news and social media exhibit an increasing amount of manipulative polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, as propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic. We devise informative features and build various classifiers for propaganda labeling, using cross-domain learning. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda.
%K Computer Science, Computation and Language, cs.CL

Paper

G. Weikum, L. Dong, S. Razniewski, and F. Suchanek

“Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases,” 2020. [Online]. Available: https://arxiv.org/abs/2009.11564.

Abstract

Equipping machines with comprehensive knowledge of the world's entities and
their relationships has been a long-standing goal of AI. Over the last decade,
large-scale knowledge bases, also known as knowledge graphs, have been
automatically constructed from web contents and text sources, and have become a
key asset for search engines. This machine knowledge can be harnessed to
semantically interpret textual phrases in news, social media and web tables,
and contributes to question answering, natural language processing and data
analytics. This article surveys fundamental concepts and practical methods for
creating and curating large knowledge bases. It covers models and methods for
discovering and canonicalizing entities and their semantic types and organizing
them into clean taxonomies. On top of this, the article discusses the automatic
extraction of entity-centric properties. To support the long-term life-cycle
and the quality assurance of machine knowledge, the article presents methods
for constructing open schemas and for knowledge curation. Case studies on
academic projects and industrial knowledge graphs complement the survey of
concepts and methods.

BibTeX

@online{Weikum_2009.11564,
TITLE = {Machine Knowledge: {C}reation and Curation of Comprehensive Knowledge Bases},
AUTHOR = {Weikum, Gerhard and Dong, Luna and Razniewski, Simon and Suchanek, Fabian},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2009.11564},
EPRINT = {2009.11564},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.},
}

Endnote

%0 Report
%A Weikum, Gerhard
%A Dong, Luna
%A Razniewski, Simon
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-F1A6-D
%U https://arxiv.org/abs/2009.11564
%D 2020
%X Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.
%K Computer Science, Artificial Intelligence, cs.AI, Computer Science, Databases, cs.DB, Computer Science, General Literature, cs.GL

Article

G. Weikum

“Entities with Quantities,” Bulletin of the Technical Committee on Data Engineering, vol. 43, no. 1, 2020.

BibTeX

@article{Weikum_Entities2020,
TITLE = {Entities with Quantities},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://sites.computer.org/debull/A20mar/p4.pdf},
PUBLISHER = {IEEE Computer Society},
ADDRESS = {Los Alamitos, CA},
YEAR = {2020},
JOURNAL = {Bulletin of the Technical Committee on Data Engineering},
VOLUME = {43},
NUMBER = {1},
PAGES = {4--8},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Entities with Quantities
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-EBBB-E
%U http://sites.computer.org/debull/A20mar/p4.pdf
%7 2020
%D 2020
%J Bulletin of the Technical Committee on Data Engineering
%V 43
%N 1
%& 4
%P 4 - 8
%I IEEE Computer Society
%C Los Alamitos, CA

Conference paper

B. Xu, S. Qiu, J. Zhang, Y. Wang, X. Shen, and G. de Melo

“Data Augmentation for Multiclass Utterance Classification - A Systematic Study,” in The 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), 2020.

BibTeX

@inproceedings{xu2020data,
TITLE = {Data Augmentation for Multiclass Utterance Classification -- A Systematic Study},
AUTHOR = {Xu, Binxia and Qiu, Siyuan and Zhang, Jie and Wang, Yafang and Shen, Xiaoyu and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-952148-27-9},
URL = {https://www.aclweb.org/anthology/2020.coling-main.479},
DOI = {10.18653/v1/2020.coling-main.479},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {The 28th International Conference on Computational Linguistics (COLING 2020)},
EDITOR = {Scott, Donia and Bel, Nuria and Zong, Chengqing},
PAGES = {5494--5506},
ADDRESS = {Barcelona, Spain (Online)},
}

Endnote

%0 Conference Proceedings
%A Xu, Binxia
%A Qiu, Siyuan
%A Zhang, Jie
%A Wang, Yafang
%A Shen, Xiaoyu
%A de Melo, Gerard
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Data Augmentation for Multiclass Utterance Classification - A Systematic Study
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-1498-6
%U https://www.aclweb.org/anthology/2020.coling-main.479
%R 10.18653/v1/2020.coling-main.479
%D 2020
%B The 28th International Conference on Computational Linguistics
%Z date of event: 2020-12-08 - 2020-12-13
%C Barcelona, Spain (Online)
%B The 28th International Conference on Computational Linguistics
%E Scott, Donia; Bel, Nuria; Zong, Chengqing
%P 5494 - 5506
%I ACL
%@ 978-1-952148-27-9

Conference paper

A. Yates, K. M. Jose, X. Zhang, and J. Lin

“Flexible IR Pipelines with Capreolus,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.

BibTeX

@inproceedings{Yates_CIKM2020,
TITLE = {Flexible {IR} Pipelines with {Capreolus}},
AUTHOR = {Yates, Andrew and Jose, Kevin Martin and Zhang, Xinyu and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {978-1-4503-6859-9},
DOI = {10.1145/3340531.3412780},
PUBLISHER = {ACM},
YEAR = {2020},
DATE = {2020},
ABSTRACT = {Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.},
BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management},
EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan},
PAGES = {3181--3188},
ADDRESS = {Virtual Event, Ireland},
}

Endnote

%0 Conference Proceedings
%A Yates, Andrew
%A Jose, Kevin Martin
%A Zhang, Xinyu
%A Lin, Jimmy
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Flexible IR Pipelines with Capreolus
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-066A-B
%R 10.1145/3340531.3412780
%D 2020
%B 29th ACM International Conference on Information & Knowledge Management
%Z date of event: 2020-10-19 - 2020-10-23
%C Virtual Event, Ireland
%X Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.
%B CIKM '20
%E d’Aquin, Mathieu; Dietze, Stefan
%P 3181 - 3188
%I ACM
%@ 978-1-4503-6859-9

Conference paper

A. Yates, S. Arora, X. Zhang, W. Yang, K. M. Jose, and J. Lin

“Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.

BibTeX

@inproceedings{YatesWSDM2020,
TITLE = {Capreolus: {A} Toolkit for End-to-End Neural Ad Hoc Retrieval},
AUTHOR = {Yates, Andrew and Arora, Siddhant and Zhang, Xinyu and Yang, Wei and Jose, Kevin Martin and Lin, Jimmy},
LANGUAGE = {eng},
ISBN = {9781450368223},
DOI = {10.1145/3336191.3371868},
PUBLISHER = {ACM},
YEAR = {2020},
BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining},
EDITOR = {Caverlee, James and Hu, Xia Ben},
PAGES = {861--864},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Yates, Andrew
%A Arora, Siddhant
%A Zhang, Xinyu
%A Yang, Wei
%A Jose, Kevin Martin
%A Lin, Jimmy
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-A28E-3
%R 10.1145/3336191.3371868
%D 2020
%B 13th International Conference on Web Search and Data Mining
%Z date of event: 2020-02-03 - 2020-02-07
%C Houston, TX, USA
%B WSDM '20
%E Caverlee, James; Hu, Xia Ben
%P 861 - 864
%I ACM
%@ 9781450368223

Conference paper

Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates

“BERT-QE: Contextualized Query Expansion for Document Re-ranking,” in Findings of the ACL: EMNLP 2020, Online, 2020.

BibTeX

@inproceedings{Zheng_EMNLP20,
TITLE = {{BERT-QE}: {C}ontextualized Query Expansion for Document Re-ranking},
AUTHOR = {Zheng, Zhi and Hui, Kai and He, Ben and Han, Xianpei and Sun, Le and Yates, Andrew},
LANGUAGE = {eng},
ISBN = {978-1-952148-90-3},
URL = {https://www.aclweb.org/anthology/2020.findings-emnlp.424},
DOI = {10.18653/v1/2020.findings-emnlp.424},
PUBLISHER = {ACL},
YEAR = {2020},
BOOKTITLE = {Findings of the ACL: EMNLP 2020},
EDITOR = {Cohn, Trevor and He, Yulan and Liu, Yang},
PAGES = {4718--4728},
SERIES = {Findings of the Association for Computational Linguistics},
VOLUME = {1},
ADDRESS = {Online},
}

Endnote

%0 Conference Proceedings
%A Zheng, Zhi
%A Hui, Kai
%A He, Ben
%A Han, Xianpei
%A Sun, Le
%A Yates, Andrew
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T BERT-QE: Contextualized Query Expansion for Document Re-ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0687-9
%U https://www.aclweb.org/anthology/2020.findings-emnlp.424
%R 10.18653/v1/2020.findings-emnlp.424
%D 2020
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2020-11-16 - 2020-11-20
%C Online
%B Findings of the ACL: EMNLP 2020
%E Cohn, Trevor; He, Yulan; Liu, Yang
%P 4718 - 4728
%I ACL
%@ 978-1-952148-90-3
%B Findings of the Association for Computational Linguistics
%N 1
%U https://www.aclweb.org/anthology/2020.findings-emnlp.424.pdf

Paper

Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates

“BERT-QE: Contextualized Query Expansion for Document Re-ranking,” 2020. [Online]. Available: https://arxiv.org/abs/2009.07258.

Abstract

Query expansion aims to mitigate the mismatch between the language used in a
query and in a document. However, query expansion methods can suffer from
introducing non-relevant information when expanding the query. To bridge this
gap, inspired by recent advances in applying contextualized models like BERT to
the document retrieval task, this paper proposes a novel query expansion model
that leverages the strength of the BERT model to select relevant document
chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test
collections, the proposed BERT-QE model significantly outperforms BERT-Large
models.

BibTeX

@online{Zheng2009.07258,
TITLE = {{BERT}-{QE}: Contextualized Query Expansion for Document Re-ranking},
AUTHOR = {Zheng, Zhi and Hui, Kai and He, Ben and Han, Xianpei and Sun, Le and Yates, Andrew},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/2009.07258},
EPRINT = {2009.07258},
EPRINTTYPE = {arXiv},
YEAR = {2020},
ABSTRACT = {Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models.},
}

Endnote

%0 Report
%A Zheng, Zhi
%A Hui, Kai
%A He, Ben
%A Han, Xianpei
%A Sun, Le
%A Yates, Andrew
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T BERT-QE: Contextualized Query Expansion for Document Re-ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-06D5-1
%U https://arxiv.org/abs/2009.07258
%D 2020
%X Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

2019

Thesis

D5IMPR-CS

M. Abouhamra

“AligNarr: Aligning Narratives of Different Length for Movie Summarization,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Automatic text alignment is an important problem in natural language processing. It
can be used to create the data needed to train different language models. Most research
about automatic summarization revolves around summarizing news articles or scientific
papers, which are somewhat small texts with simple and clear structure. The bigger the
difference in size between the summary and the original text, the harder the problem will
be since important information will be sparser and identifying it can be more difficult.
Therefore, creating datasets from larger texts can help improve automatic summarization.
In this project, we try to develop an algorithm which can automatically create a
dataset for abstractive automatic summarization for bigger narrative text bodies such
as movie scripts. To this end, we chose sentences as summary text units and scenes
as script text units and developed an algorithm which uses some of the latest natural
language processing techniques to align scenes and sentences based on the similarity in
their meanings.
Solving this alignment problem can provide us with important information about how
to evaluate the meaning of a text, which can help us create better abstractive
summarization models. We developed a method which uses different similarity scoring
techniques (embedding similarity, word inclusion and entity inclusion) to align script
scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results
showed that the bigger the differences in the number of text units being aligned, the
more difficult the alignment problem is. We also critiqued our own similarity scoring
techniques and different alignment algorithms based on integer linear programming and
local optimization and showed their limitations and discussed ideas to improve them.

BibTeX

@mastersthesis{AbouhamraMSc2019,
TITLE = {{AligNarr}: Aligning Narratives of Different Length for Movie Summarization},
AUTHOR = {Abouhamra, Mostafa},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be since important information will be sparser and identifying it can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the differences in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization and showed their limitations and discussed ideas to improve them.},
}

Endnote

%0 Thesis
%A Abouhamra, Mostafa
%Y Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AligNarr: Aligning Narratives of Different Length for Movie Summarization
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-5836-D
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 54 p.
%V master
%9 master
%X Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be since important information will be sparser and identifying it can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the differences in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization and showed their limitations and discussed ideas to improve them.

Conference paper

A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum

“ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2019.

BibTeX

@inproceedings{abujabal19comqa,
TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters},
AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-950737-13-0},
URL = {https://www.aclweb.org/anthology/N19-1027},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019)},
EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar},
PAGES = {307--317},
ADDRESS = {Minneapolis, MN, USA},
}

Endnote

%0 Conference Proceedings
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Yahya, Mohamed
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-11A7-D
%U https://www.aclweb.org/anthology/N19-1027
%D 2019
%B Annual Conference of the North American Chapter of the Association for Computational Linguistics 
%Z date of event: 2019-06-02 - 2019-06-07
%C Minneapolis, MN, USA 
%B The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%E Burstein, Jill; Doran, Christy; Solorio, Thamar
%P 307 - 317
%I ACL
%@ 978-1-950737-13-0

Thesis

D5IMPR-CS

A. Abujabal

“Question Answering over Knowledge Bases with Continuous Learning,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs.
TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset.
For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.

BibTeX

@phdthesis{Abujabalphd2013,
TITLE = {Question Answering over Knowledge Bases with Continuous Learning},
AUTHOR = {Abujabal, Abdalghani},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-279688},
DOI = {10.22028/D291-27968},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs.
TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.},
}

Endnote

%0 Thesis
%A Abujabal, Abdalghani
%Y Weikum, Gerhard
%A referee: Lin, Jimmy
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Question Answering over Knowledge Bases with Continuous Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-AEC0-0
%R 10.22028/D291-27968
%U urn:nbn:de:bsz:291--ds-279688
%F OTHER: hdl:20.500.11880/27438
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 141 p.
%V phd
%9 phd
%X Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs.
TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27438

Conference paper

M. Alikhani, S. Nag Chowdhury, G. de Melo, and M. Stone

“CITE: A Corpus Of Text-Image Discourse Relations,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 2019.

BibTeX

@inproceedings{AlikhaniEtAl2019CITETextImageDiscourse,
TITLE = {{CITE}: {A} Corpus Of Text-Image Discourse Relations},
AUTHOR = {Alikhani, Malihe and Nag Chowdhury, Sreyasi and de Melo, Gerard and Stone, Matthew},
LANGUAGE = {eng},
ISBN = {978-1-950737-13-0},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)},
EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar},
PAGES = {570--575},
ADDRESS = {Minneapolis, MN, USA},
}

Endnote

%0 Conference Proceedings
%A Alikhani, Malihe
%A Nag Chowdhury, Sreyasi
%A de Melo, Gerard
%A Stone, Matthew
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T CITE: A Corpus Of Text-Image Discourse Relations
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-78D8-3
%D 2019
%B Annual Conference of the North American Chapter of the Association for Computational Linguistics 
%Z date of event: 2019-06-02 - 2019-06-07
%C Minneapolis, MN, USA 
%B The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%E Burstein, Jill; Doran, Christy; Solorio, Thamar
%P 570 - 575
%I ACL
%@ 978-1-950737-13-0
%U https://aclweb.org/anthology/papers/N/N19/N19-1056/

Paper

S. Arora and A. Yates

“Investigating Retrieval Method Selection with Axiomatic Features,” 2019. [Online]. Available: http://arxiv.org/abs/1904.05737.

Abstract

We consider algorithm selection in the context of ad-hoc information
retrieval. Given a query and a pair of retrieval methods, we propose a
meta-learner that predicts how to combine the methods' relevance scores into an
overall relevance score. Inspired by neural models' different properties with
regard to IR axioms, these predictions are based on features that quantify
axiom-related properties of the query and its top ranked documents. We conduct
an evaluation on TREC Web Track data and find that the meta-learner often
significantly improves over the individual methods. Finally, we conduct feature
and query weight analyses to investigate the meta-learner's behavior.

BibTeX

@online{Arora_arXiv1904.05737,
TITLE = {Investigating Retrieval Method Selection with Axiomatic Features},
AUTHOR = {Arora, Siddhant and Yates, Andrew},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1904.05737},
EPRINT = {1904.05737},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.},
}

Endnote

%0 Report
%A Arora, Siddhant
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Investigating Retrieval Method Selection with Axiomatic Features
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02BF-3
%U http://arxiv.org/abs/1904.05737
%D 2019
%X We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
%K Computer Science, Information Retrieval, cs.IR

Conference paper

S. Arora and A. Yates

“Investigating Retrieval Method Selection with Axiomatic Features,” in Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (AMIR 2019), Cologne, Germany, 2019.

BibTeX

@inproceedings{Arora_AMIR2019,
TITLE = {Investigating Retrieval Method Selection with Axiomatic Features},
AUTHOR = {Arora, Siddhant and Yates, Andrew},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-2360-3},
PUBLISHER = {CEUR-WS},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (AMIR 2019)},
EDITOR = {Beel, Joeran and Kolthoff, Lars},
PAGES = {18--31},
EID = {4},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2360},
ADDRESS = {Cologne, Germany},
}

Endnote

%0 Conference Proceedings
%A Arora, Siddhant
%A Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Investigating Retrieval Method Selection with Axiomatic Features
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-028E-A
%D 2019
%B The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval
%Z date of event: 2019-04-14 - 2019-04-14
%C Cologne, Germany
%B Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval
%E Beel, Joeran; Kolthoff, Lars
%P 18 - 31
%Z sequence number: 4
%I CEUR-WS
%B CEUR Workshop Proceedings
%N 2360
%@ 1613-0073
%U http://ceur-ws.org/Vol-2360/paper4Axiomatic.pdf

Thesis

IMPR-CSD5

J. A. Biega

“Enhancing Privacy and Fairness in Search Systems,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics.
We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.

BibTeX

@phdthesis{biegaphd2019,
TITLE = {Enhancing Privacy and Fairness in Search Systems},
AUTHOR = {Biega, Joanna Asia},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-278861},
DOI = {10.22028/D291-27886},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics.
We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.},
}

Endnote

%0 Thesis
%A Biega, Joanna Asia
%Y Weikum, Gerhard
%A referee: Gummadi, Krishna
%A referee: Nejdl, Wolfgang
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society
External Organizations
%T Enhancing Privacy and Fairness in Search Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-9AED-5
%R 10.22028/D291-27886 
%U urn:nbn:de:bsz:291--ds-278861
%F OTHER: hdl:20.500.11880/27389
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 111 p.
%V phd
%9 phd
%X Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics.
We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27389

Conference paper

A. Chakraborty, N. Mota, A. J. Biega, K. P. Gummadi, and H. Heidari

“On the Impact of Choice Architectures on Inequality in Online Donation Platforms,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{Chakraborty_WWW2019b,
TITLE = {On the Impact of Choice Architectures on Inequality in Online Donation Platforms},
AUTHOR = {Chakraborty, Abhijnan and Mota, Nuno and Biega, Asia J. and Gummadi, Krishna P. and Heidari, Hoda},
LANGUAGE = {eng},
ISBN = {978-1-4503-6674-8},
DOI = {10.1145/3308558.3313663},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {2623--2629},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Chakraborty, Abhijnan
%A Mota, Nuno
%A Biega, Asia J.
%A Gummadi, Krishna P.
%A Heidari, Hoda
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T On the Impact of Choice Architectures on Inequality in Online Donation Platforms
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-FC88-9
%R 10.1145/3308558.3313663
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Proceedings of The World Wide Web Conference
%E McAuley, Julian
%P 2623 - 2629
%I ACM
%@ 978-1-4503-6674-8

Article

F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli

“On the Distortion of Locality Sensitive Hashing,” SIAM Journal on Computing, vol. 48, no. 2, 2019.

BibTeX

@article{Chierichetti2019,
TITLE = {On the Distortion of Locality Sensitive Hashing},
AUTHOR = {Chierichetti, Flavio and Kumar, Ravi and Panconesi, Alessandro and Terolli, Erisa},
LANGUAGE = {eng},
ISSN = {0097-5397},
DOI = {10.1137/17M1127752},
PUBLISHER = {SIAM},
ADDRESS = {Philadelphia, PA},
YEAR = {2019},
DATE = {2019},
JOURNAL = {SIAM Journal on Computing},
VOLUME = {48},
NUMBER = {2},
PAGES = {350--372},
}

Endnote

%0 Journal Article
%A Chierichetti, Flavio
%A Kumar, Ravi
%A Panconesi, Alessandro
%A Terolli, Erisa
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On the Distortion of Locality Sensitive Hashing
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-A7E7-C
%R 10.1137/17M1127752
%7 2019
%D 2019
%J SIAM Journal on Computing
%V 48
%N 2
%& 350
%P 350 - 372
%I SIAM
%C Philadelphia, PA
%@ 0097-5397

Paper

P. Christmann, R. Saha Roy, A. Abujabal, J. Singh, and G. Weikum

“Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion,” 2019. [Online]. Available: http://arxiv.org/abs/1910.03262.

Abstract

Fact-centric information needs are rarely one-shot; users typically ask
follow-up questions to explore a topic. In such a conversational setting, the
user's inputs are often incomplete, with entities or predicates left out, and
ungrammatical phrases. This poses a huge challenge to question answering (QA)
systems that typically rely on cues in full-fledged interrogative sentences. As
a solution, we develop CONVEX: an unsupervised method that can answer
incomplete questions over a knowledge graph (KG) by maintaining conversation
context using entities and predicates seen so far and automatically inferring
missing or ambiguous pieces for follow-up questions. The core of our method is
a graph exploration algorithm that judiciously expands a frontier to find
candidate answers for the current question. To evaluate CONVEX, we release
ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from
five different domains. We show that CONVEX: (i) adds conversational support to
any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and
question completion strategies.

BibTeX

@online{Christmann_arXiv1910.03262,
TITLE = {Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Abujabal, Abdalghani and Singh, Jyotsna and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1910.03262},
EPRINT = {1910.03262},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.},
}

Endnote

%0 Report
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Abujabal, Abdalghani
%A Singh, Jyotsna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83DC-F
%U http://arxiv.org/abs/1910.03262
%D 2019
%X Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

P. Christmann, R. Saha Roy, A. Abujabal, J. Singh, and G. Weikum

“Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion,” in CIKM ’19, 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019.

BibTeX

@inproceedings{Christmann_CIKM2019,
TITLE = {Look before you Hop: {C}onversational Question Answering over Knowledge Graphs Using Judicious Context Expansion},
AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Abujabal, Abdalghani and Singh, Jyotsna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450369763},
DOI = {10.1145/3357384.3358016},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {CIKM '19, 28th ACM International Conference on Information and Knowledge Management},
EDITOR = {Zhu, Wenwu and Tao, Dacheng},
PAGES = {729--738},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Christmann, Philipp
%A Saha Roy, Rishiraj
%A Abujabal, Abdalghani
%A Singh, Jyotsna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8231-0
%R 10.1145/3357384.3358016
%D 2019
%B 28th ACM International Conference on Information and Knowledge Management
%Z date of event: 2019-11-03 - 2019-11-07
%C Beijing, China
%B CIKM '19
%E Zhu, Wenwu; Tao, Dacheng
%P 729 - 738
%I ACM
%@ 9781450369763

Conference paper

C. X. Chu, S. Razniewski, and G. Weikum

“TiFi: Taxonomy Induction for Fictional Domains,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{Chu_WWW2019,
TITLE = {{TiFi}: {T}axonomy Induction for Fictional Domains},
AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6674-8},
DOI = {10.1145/3308558.3313519},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {2673--2679},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Chu, Cuong Xuan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TiFi: Taxonomy Induction for Fictional Domains
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-6558-9
%R 10.1145/3308558.3313519
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Proceedings of The World Wide Web Conference
%E McAuley, Julian
%P 2673 - 2679
%I ACM
%@ 978-1-4503-6674-8

Paper

C. X. Chu, S. Razniewski, and G. Weikum

“TiFi: Taxonomy Induction for Fictional Domains [Extended version],” 2019. [Online]. Available: http://arxiv.org/abs/1901.10263.

Abstract

Taxonomies are important building blocks of structured knowledge bases, and
their construction from text sources and Wikipedia has received much attention.
In this paper we focus on the construction of taxonomies for fictional domains,
using noisy category systems from fan wikis or text extraction as input. Such
fictional domains are archetypes of entity universes that are poorly covered by
Wikipedia, such as also enterprise-specific knowledge bases or highly
specialized verticals. Our fiction-targeted approach, called TiFi, consists of
three phases: (i) category cleaning, by identifying candidate categories that
truly represent classes in the domain of interest, (ii) edge cleaning, by
selecting subcategory relationships that correspond to class subsumption, and
(iii) top-level construction, by mapping classes onto a subset of high-level
WordNet categories. A comprehensive evaluation shows that TiFi is able to
construct taxonomies for a diverse range of fictional domains such as Lord of
the Rings, The Simpsons or Greek Mythology with very high precision and that it
outperforms state-of-the-art baselines for taxonomy induction by a substantial
margin.

BibTeX

@online{Chu_arXIv1901.10263,
TITLE = {{TiFi}: Taxonomy Induction for Fictional Domains [Extended version]},
AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1901.10263},
EPRINT = {1901.10263},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.},
}

Endnote

%0 Report
%A Chu, Cuong Xuan
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TiFi: Taxonomy Induction for Fictional Domains [Extended version]
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FE67-C
%U http://arxiv.org/abs/1901.10263
%D 2019
%X Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR

Thesis

S. A. Cotop

“How to be Grim,” Universität des Saarlandes, Saarbrücken, 2019.

BibTeX

@mastersthesis{cotop:19:grim,
TITLE = {How to be Grim},
AUTHOR = {Cotop, Simina Ana},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
}

Endnote

%0 Thesis
%A Cotop, Simina Ana
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T How to be Grim : Explaining Data at Different Granularity Levels
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF05-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%V master
%9 master

Thesis

J. Cueppers

“How to Make Cake: Finding Causal Patterns for Marked Events in Sequences,” Universität des Saarlandes, Saarbrücken, 2019.

BibTeX

@mastersthesis{cuepper:19:cake,
TITLE = {How to Make Cake: Finding Causal Patterns for Marked Events in Sequences},
AUTHOR = {Cueppers, Joscha},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
}

Endnote

%0 Thesis
%A Cueppers, Joscha
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T How to Make Cake: Finding Causal Patterns for Marked Events in Sequences
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF09-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%V master
%9 master

Conference paper

I. Dikeoulias, J. Strötgen, and S. Razniewski

“Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties,” in Companion of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{Dikeoulias_WWW2019,
TITLE = {Epitaph or Breaking News? {A}nalyzing and Predicting the Stability of Knowledge Base Properties},
AUTHOR = {Dikeoulias, Ioannis and Str{\"o}tgen, Jannik and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-4503-6675-5},
DOI = {10.1145/3308560.3314998},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {1155--1158},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Dikeoulias, Ioannis
%A Strötgen, Jannik
%A Razniewski, Simon
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-0281-7
%R 10.1145/3308560.3314998
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Companion of The World Wide Web Conference
%E McAuley, Julian
%P 1155 - 1158
%I ACM
%@ 978-1-4503-6675-5

Conference paper

P. Ernst, E. Terolli, and G. Weikum

“LongLife: a Platform for Personalized Search for Health and Life Sciences,” in Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019 Satellites), Auckland, New Zealand, 2019.

BibTeX

@inproceedings{Ernst_ISWC2019,
TITLE = {{LongLife}: a Platform for Personalized Search for Health and Life Sciences},
AUTHOR = {Ernst, Patrick and Terolli, Erisa and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2456/paper62.pdf; urn:nbn:de:0074-2456-4},
PUBLISHER = {ceur-ws.org},
YEAR = {2019},
BOOKTITLE = {Proceedings of the ISWC 2019 Satellite Tracks (Posters \& Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019 Satellites)},
EDITOR = {Su{\'a}rez-Figueroa, Mari Carmen and Cheng, Gong and Gentile, Anna Lisa and Gu{\'e}ret, Christophe and Keet, Maria and Bernstein, Abraham},
PAGES = {237--240},
EID = {62},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2456},
ADDRESS = {Auckland, New Zealand},
}

Endnote

%0 Conference Proceedings
%A Ernst, Patrick
%A Terolli, Erisa
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T LongLife: a Platform for Personalized Search for Health and Life Sciences
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83A6-B
%U http://ceur-ws.org/Vol-2456/paper62.pdf
%D 2019
%B 18th Semantic Web Conference
%Z date of event: 2019-10-26 - 2019-10-30
%C Auckland, New Zealand
%B Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference
%E Suárez-Figueroa, Mari Carmen; Cheng, Gong; Gentile, Anna Lisa; Guéret, Christophe; Keet, Maria; Bernstein, Abraham
%P 237 - 240
%Z sequence number: 62
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2456

Conference paper

M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum

“Tracy: Tracing Facts over Knowledge Graphs and Text,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{Gad-Elrab_WWW2019,
TITLE = {Tracy: {T}racing Facts over Knowledge Graphs and Text},
AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6674-8},
DOI = {10.1145/3308558.3314126},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {3516--3520},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed Hassan
%A Stepanova, Daria
%A Urbani, Jacopo
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Tracy: Tracing Facts over Knowledge Graphs and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-08AA-5
%R 10.1145/3308558.3314126
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Proceedings of The World Wide Web Conference
%E McAuley, Julian
%P 3516 - 3520
%I ACM
%@ 978-1-4503-6674-8

Conference paper

M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum

“ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.

BibTeX

@inproceedings{Gad-Elrab_WSDM2019,
TITLE = {{ExFaKT}: {A} Framework for Explaining Facts over Knowledge Graphs and Text},
AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5940-5},
DOI = {10.1145/3289600.3290996},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining},
PAGES = {87--95},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed Hassan
%A Stepanova, Daria
%A Urbani, Jacopo
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9C44-2
%R 10.1145/3289600.3290996
%D 2019
%B 12th ACM International Conference on Web Search and Data Mining
%Z date of event: 2019-02-11 - 2019-02-15
%C Melbourne, Australia
%B WSDM '19
%P 87 - 95
%I ACM
%@ 978-1-4503-5940-5

Conference paper

A. Ghazimatin, R. Saha Roy, and G. Weikum

“FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.

BibTeX

@inproceedings{Ghazimatin_WSDM2019,
TITLE = {{FAIRY}: {A} Framework for Understanding Relationships between Users' Actions and their Social Feeds},
AUTHOR = {Ghazimatin, Azin and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5940-5},
DOI = {10.1145/3289600.3290990},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining},
PAGES = {240--248},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Ghazimatin, Azin
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9BD9-B
%R 10.1145/3289600.3290990
%D 2019
%B 12th ACM International Conference on Web Search and Data Mining
%Z date of event: 2019-02-11 - 2019-02-15
%C Melbourne, Australia
%B WSDM '19
%P 240 - 248
%I ACM
%@ 978-1-4503-5940-5

Paper

A. Ghazimatin, R. Saha Roy, and G. Weikum

“FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” 2019. [Online]. Available: http://arxiv.org/abs/1908.03109.

Abstract

Users increasingly rely on social media feeds for consuming daily
information. The items in a feed, such as news, questions, songs, etc., usually
result from the complex interplay of a user's social contacts, her interests
and her actions on the platform. The relationship of the user's own behavior
and the received feed is often puzzling, and many users would like to have a
clear explanation on why certain items were shown to them. Transparency and
explainability are key concerns in the modern world of cognitive overload,
filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a
framework that systematically discovers, ranks, and explains relationships
between users' actions and items in their social media feeds. We model the
user's local neighborhood on the platform as an interaction graph, a form of
heterogeneous information network constructed solely from information that is
easily accessible to the concerned user. We posit that paths in this
interaction graph connecting the user and her feed items can act as pertinent
explanations for the user. These paths are scored with a learning-to-rank model
that captures relevance and surprisal. User studies on two social platforms
demonstrate the practical viability and user benefits of the FAIRY method.

BibTeX

@online{Ghazimatin_arXiv1908.03109,
TITLE = {{FAIRY}: A Framework for Understanding Relationships between Users' Actions and their Social Feeds},
AUTHOR = {Ghazimatin, Azin and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1908.03109},
EPRINT = {1908.03109},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.},
}

Endnote

%0 Report
%A Ghazimatin, Azin
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83B9-6
%U http://arxiv.org/abs/1908.03109
%D 2019
%X Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.
%K cs.SI,Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML

Paper

A. Ghazimatin, O. Balalau, R. Saha Roy, and G. Weikum

“PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems,” 2019. [Online]. Available: http://arxiv.org/abs/1911.08378.

Abstract

Interpretable explanations for recommender systems and other machine learning
models are crucial to gain user trust. Prior works that have focused on paths
connecting users and items in a heterogeneous network have several limitations,
such as discovering relationships rather than true explanations, or
disregarding other users' privacy. In this work, we take a fresh perspective,
and present PRINCE: a provider-side mechanism to produce tangible explanations
for end-users, where an explanation is defined to be a set of minimal actions
performed by the user that, if removed, changes the recommendation to a
different item. Given a recommendation, PRINCE uses a polynomial-time optimal
algorithm for finding this minimal set of a user's actions from an exponential
search space, based on random walks over dynamic graphs. Experiments on two
real-world datasets show that PRINCE provides more compact explanations than
intuitive baselines, and insights from a crowdsourced user-study demonstrate
the viability of such action-based explanations. We thus posit that PRINCE
produces scrutable, actionable, and concise explanations, owing to its use of
counterfactual evidence, a user's own actions, and minimal sets, respectively.

BibTeX

@online{Ghazimatin_arXiv1911.08378,
TITLE = {{PRINCE}: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems},
AUTHOR = {Ghazimatin, Azin and Balalau, Oana and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1911.08378},
EPRINT = {1911.08378},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively.},
}

Endnote

%0 Report
%A Ghazimatin, Azin
%A Balalau, Oana
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8415-E
%U http://arxiv.org/abs/1911.08378
%D 2019
%X Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively.
%K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI,Statistics, Machine Learning, stat.ML

Conference paper

A. Guimarães, O. Balalau, E. Terolli, and G. Weikum

“Analyzing the Traits and Anomalies of Political Discussions on Reddit,” in Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 2019.

BibTeX

@inproceedings{Guimaraes_ICWSM2019,
TITLE = {Analyzing the Traits and Anomalies of Political Discussions on {R}eddit},
AUTHOR = {Guimar{\~a}es, Anna and Balalau, Oana and Terolli, Erisa and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2334-0770},
PUBLISHER = {AAAI},
YEAR = {2019},
BOOKTITLE = {Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019)},
PAGES = {205--213},
ADDRESS = {Munich, Germany},
}

Endnote

%0 Conference Proceedings
%A Guimarães, Anna
%A Balalau, Oana
%A Terolli, Erisa
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Analyzing the Traits and Anomalies of Political Discussions on Reddit
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-3649-F
%D 2019
%B 13th International Conference on Web and Social Media
%Z date of event: 2019-06-11 - 2019-06-14
%C Munich, Germany
%B Proceedings of the Thirteenth International Conference on Web and Social Media
%P 205 - 213
%I AAAI

Conference paper

D. Gupta and K. Berberich

“Structured Search in Annotated Document Collections,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.

BibTeX

@inproceedings{Gupta_WSDM2019Demo,
TITLE = {Structured Search in Annotated Document Collections},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-5940-5},
DOI = {10.1145/3289600.3290618},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining},
PAGES = {794--797},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Structured Search in Annotated Document Collections : Demo paper
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A8D6-F
%R 10.1145/3289600.3290618
%D 2019
%B 12th ACM International Conference on Web Search and Data Mining
%Z date of event: 2019-02-11 - 2019-02-15
%C Melbourne, Australia
%B WSDM '19
%P 794 - 797
%I ACM
%@ 978-1-4503-5940-5

Conference paper

D. Gupta and K. Berberich

“Efficient Retrieval of Knowledge Graph Fact Evidences,” in The Semantic Web: ESWC 2019 Satellite Events, Portorož, Slovenia, 2019.

BibTeX

@inproceedings{GuptaESWC2019a,
TITLE = {Efficient Retrieval of Knowledge Graph Fact Evidences},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-030-32326-4},
DOI = {10.1007/978-3-030-32327-1_18},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {The Semantic Web: ESWC 2019 Satellite Events},
EDITOR = {Hitzler, Pascal and Kirrane, Sabrina and Hartig, Olaf and de Boer, Victor and Vidal, Maria-Esther and Maleshova, Maria and Schlobach, Stefan and Hammar, Karl and Lasierra, Nelia and Stadtm{\"u}ller, Steffen and Hose, Katja and Verborgh, Ruben},
PAGES = {90--94},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11762},
ADDRESS = {Portoro{\v z}, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Retrieval of Knowledge Graph Fact Evidences
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8477-0
%R 10.1007/978-3-030-32327-1_18
%D 2019
%B 16th Extended Semantic Web Conference
%Z date of event: 2019-06-02 - 2019-06-06
%C Portorož, Slovenia
%B The Semantic Web: ESWC 2019 Satellite Events
%E Hitzler, Pascal; Kirrane, Sabrina; Hartig, Olaf; de Boer, Victor; Vidal, Maria-Esther; Maleshova, Maria; Schlobach, Stefan; Hammar, Karl; Lasierra, Nelia; Stadtmüller, Steffen; Hose, Katja; Verborgh, Ruben
%P 90 - 94
%I Springer
%@ 978-3-030-32326-4
%B Lecture Notes in Computer Science
%N 11762

Conference paper

D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti

“Generating Semantic Aspects for Queries,” in The Semantic Web (ESWC 2019), Portorož, Slovenia, 2019.

BibTeX

@inproceedings{GuptaESWC2019,
TITLE = {Generating Semantic Aspects for Queries},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-3-030-21347-3},
DOI = {10.1007/978-3-030-21348-0_11},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {The Semantic Web (ESWC 2019)},
EDITOR = {Hitzler, Pascal and Fern{\'a}ndez, Miriam and Janowicz, Krzysztof and Zaveri, Amrapali and Gray, Alasdair J. G. and Lopez, Vanessa and Haller, Armin and Hammar, Karl},
PAGES = {162--178},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11503},
ADDRESS = {Portoro{\v z}, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%A Strötgen, Jannik
%A Zeinalipour-Yazti, Demetrios
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Generating Semantic Aspects for Queries
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FF5F-5
%R 10.1007/978-3-030-21348-0_11
%D 2019
%B 16th Extended Semantic Web Conference
%Z date of event: 2019-06-02 - 2019-06-06
%C Portorož, Slovenia
%B The Semantic Web
%E Hitzler, Pascal; Fernández, Miriam; Janowicz, Krzysztof; Zaveri, Amrapali; Gray, Alasdair J. G.; Lopez, Vanessa; Haller, Armin; Hammar, Karl
%P 162 - 178
%I Springer
%@ 978-3-030-21347-3
%B Lecture Notes in Computer Science
%N 11503

Conference paper

D. Gupta and K. Berberich

“JIGSAW: Structuring Text into Tables,” in ICTIR ’19, ACM SIGIR International Conference on Theory of Information Retrieval, Santa Clara, CA, USA, 2019.

BibTeX

@inproceedings{Gupta_ICTIR2019,
TITLE = {{JIGSAW}: {S}tructuring Text into Tables},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-6881-0},
DOI = {10.1145/3341981.3344228},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {ICTIR '19, ACM SIGIR International Conference on Theory of Information Retrieval},
EDITOR = {Fang, Yi and Zhang, Yi},
PAGES = {237--244},
ADDRESS = {Santa Clara, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T JIGSAW: Structuring Text into Tables
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8479-E
%R 10.1145/3341981.3344228
%D 2019
%B ACM SIGIR International Conference on Theory of Information Retrieval
%Z date of event: 2019-10-02 - 2019-10-05
%C Santa Clara, CA, USA
%B ICTIR '19
%E Fang, Yi; Zhang, Yi
%P 237 - 244
%I ACM
%@ 978-1-4503-6881-0

Article

D5IMPR-CS

D. Gupta

“Search and Analytics Using Semantic Annotations,” ACM SIGIR Forum, vol. 53, no. 2, 2019.

Abstract

Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.

BibTeX

@article{Gupta_SIGIR19,
TITLE = {Search and Analytics Using Semantic Annotations},
AUTHOR = {Gupta, Dhruv},
LANGUAGE = {eng},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2019},
ABSTRACT = {Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.},
JOURNAL = {ACM SIGIR Forum},
VOLUME = {53},
NUMBER = {2},
PAGES = {100--101},
}

Endnote

%0 Journal Article
%A Gupta, Dhruv
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
%T Search and Analytics Using Semantic Annotations : Doctoral Abstract
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-A1C2-9
%7 2019
%D 2019
%X Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.
%J ACM SIGIR Forum
%V 53
%N 2
%& 100
%P 100 - 101
%I ACM
%C New York, NY
%U http://sigir.org/wp-content/uploads/2019/december/p100.pdf

Thesis

D5IMPR-CS

D. Gupta

“Search and Analytics Using Semantic Annotations,” Universität des Saarlandes, Saarbrücken, 2019.

BibTeX

@phdthesis{GUPTAphd2019,
TITLE = {Search and Analytics Using Semantic Annotations},
AUTHOR = {Gupta, Dhruv},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-300780},
DOI = {10.22028/D291-30078},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.},
}

Endnote

%0 Thesis
%A Gupta, Dhruv
%Y Berberich, Klaus
%A referee: Weikum, Gerhard
%A referee: Bedathur, Srikanta
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Search and Analytics Using Semantic Annotations
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-7695-E
%R 10.22028/D291-30078
%U urn:nbn:de:bsz:291--ds-300780
%F OTHER: hdl:20.500.11880/28516
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P xxviii, 211 p.
%V phd
%9 phd
%X Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28516

Conference paper

M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo

“Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” in Proceedings of the 13th International Conference on Computational Semantics - Long Papers (IWCS 2019), Gothenburg, Sweden, 2019.

BibTeX

@inproceedings{Hedderich_IWCS2019,
TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries},
AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-950737-19-2},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 13th International Conference on Computational Semantics -- Long Papers (IWCS 2019)},
EDITOR = {Dobnik, Simon and Chatzikyriakidis, Stergios and Demberg, Vera},
PAGES = {247--258},
ADDRESS = {Gothenburg, Sweden},
}

Endnote

%0 Conference Proceedings
%A Hedderich, Michael A.
%A Yates, Andrew
%A Klakow, Dietrich
%A de Melo, Gerard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Using Multi-Sense Vector Embeddings for Reverse Dictionaries
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02A4-0
%D 2019
%B 13th International Conference on Computational Semantics
%Z date of event: 2019-05-23 - 2019-05-27
%C Gothenburg, Sweden
%B Proceedings of the 13th International Conference on Computational Semantics - Long Papers
%E Dobnik, Simon; Chatzikyriakidis, Stergios; Demberg, Vera
%P 247 - 258
%I ACL
%@ 978-1-950737-19-2
%U https://www.aclweb.org/anthology/W19-0421

Paper

M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo

“Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” 2019. [Online]. Available: http://arxiv.org/abs/1904.01451.

Abstract

Popular word embedding methods such as word2vec and GloVe assign a single
vector representation to each word, even if a word has multiple distinct
meanings. Multi-sense embeddings instead provide different vectors for each
sense of a word. However, they typically cannot serve as a drop-in replacement
for conventional single-sense embeddings, because the correct sense vector
needs to be selected for each word. In this work, we study the effect of
multi-sense embeddings on the task of reverse dictionaries. We propose a
technique to easily integrate them into an existing neural network architecture
using an attention mechanism. Our experiments demonstrate that large
improvements can be obtained when employing multi-sense embeddings both in the
input sequence as well as for the target representation. An analysis of the
sense distributions and of the learned attention is provided as well.

BibTeX

@online{Hedderich_arXiv1904.01451,
TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries},
AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1904.01451},
EPRINT = {1904.01451},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.},
}

Endnote

%0 Report
%A Hedderich, Michael A.
%A Yates, Andrew
%A Klakow, Dietrich
%A de Melo, Gerard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Using Multi-Sense Vector Embeddings for Reverse Dictionaries
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02B4-E
%U http://arxiv.org/abs/1904.01451
%D 2019
%X Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
%K Computer Science, Computation and Language, cs.CL, Computer Science, Learning, cs.LG

Conference paper

V. T. Ho, Y. Ibrahim, K. Pal, K. Berberich, and G. Weikum

“Qsearch: Answering Quantity Queries from Text,” in The Semantic Web – ISWC 2019, Auckland, New Zealand, 2019.

BibTeX

@inproceedings{Ho_ISWC2019,
TITLE = {Qsearch: {A}nswering Quantity Queries from Text},
AUTHOR = {Ho, Vinh Thinh and Ibrahim, Yusra and Pal, Koninika and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {0302-9743},
ISBN = {978-3-030-30792-9},
DOI = {10.1007/978-3-030-30793-6_14},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {The Semantic Web -- ISWC 2019},
EDITOR = {Ghidini, Chiara and Hartig, Olaf and Maleshkova, Maria and Sv{\'a}tek, Vojt{\u e}ch and Cruz, Isabel and Hogan, Aidan and Song, Jie and Lefran{\c c}ois, Maxime and Gandon, Fabien},
PAGES = {237--257},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11778},
ADDRESS = {Auckland, New Zealand},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Ibrahim, Yusra
%A Pal, Koninika
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Qsearch: Answering Quantity Queries from Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83AB-6
%R 10.1007/978-3-030-30793-6_14
%D 2019
%B 18th Semantic Web Conference
%Z date of event: 2019-10-26 - 2019-10-30
%C Auckland, New Zealand
%B The Semantic Web – ISWC 2019
%E Ghidini, Chiara; Hartig, Olaf; Maleshkova, Maria; Svátek, Vojtĕch; Cruz, Isabel; Hogan, Aidan; Song, Jie; Lefrançois, Maxime; Gandon, Fabien
%P 237 - 257
%I Springer
%@ 978-3-030-30792-9
%B Lecture Notes in Computer Science
%N 11778
%@ 0302-9743

Thesis

D5IMPR-CSD1

Y. Ibrahim

“Understanding Quantities in Web Tables and Text,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges. First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. - We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.

BibTeX

@phdthesis{yusraphd2019,
TITLE = {Understanding Quantities in Web Tables and Text},
AUTHOR = {Ibrahim, Yusra},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-296575},
DOI = {10.22028/D291-29657},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges. First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: -- We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. -- We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. -- We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. -- We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.},
}

Endnote

%0 Thesis
%A Ibrahim, Yusra
%Y Weikum, Gerhard
%A referee: Riedewald, Mirek
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Algorithms and Complexity, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Understanding Quantities in Web Tables and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-4384-A
%R 10.22028/D291-29657
%U urn:nbn:de:bsz:291--ds-296575
%F OTHER: hdl:20.500.11880/28300
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 116 p.
%V phd
%9 phd
%X There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges. First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. - We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28300

Conference paper

D5D1

Y. Ibrahim, M. Riedewald, G. Weikum, and D. Zeinalipour-Yazti

“Bridging Quantities in Tables and Text,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.

BibTeX

@inproceedings{Ibrahim_ICDE2019,
TITLE = {Bridging Quantities in Tables and Text},
AUTHOR = {Ibrahim, Yusra and Riedewald, Mirek and Weikum, Gerhard and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-1-5386-7474-1},
DOI = {10.1109/ICDE.2019.00094},
PUBLISHER = {IEEE},
YEAR = {2019},
BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering},
PAGES = {1010--1021},
ADDRESS = {Macau, China},
}

Endnote

%0 Conference Proceedings
%A Ibrahim, Yusra
%A Riedewald, Mirek
%A Weikum, Gerhard
%A Zeinalipour-Yazti, Demetrios
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Algorithms and Complexity, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Bridging Quantities in Tables and Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-01AB-B
%R 10.1109/ICDE.2019.00094
%D 2019
%B 35th IEEE International Conference on Data Engineering
%Z date of event: 2019-04-08 - 2019-04-12
%C Macau, China
%B ICDE 2019
%P 1010 - 1021
%I IEEE
%@ 978-1-5386-7474-1

Conference paper

Y. Ibrahim and G. Weikum

“ExQuisiTe: Explaining Quantities in Text,” in Proceedings of the World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{Ibrahim_WWW2019,
TITLE = {{ExQuisiTe}: {E}xplaining Quantities in Text},
AUTHOR = {Ibrahim, Yusra and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6674-8},
DOI = {10.1145/3308558.3314134},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {3541--3544},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Ibrahim, Yusra
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ExQuisiTe: Explaining Quantities in Text
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-01B3-1
%R 10.1145/3308558.3314134
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Proceedings of the World Wide Web Conference
%E McAuley, Julian
%P 3541 - 3544
%I ACM
%@ 978-1-4503-6674-8

Conference paper

Y. Ismaeil, O. Balalau, and P. Mirza

“Discovering the Functions of Language in Online Forums,” in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{ismaeil-etal-2019-discovering,
TITLE = {Discovering the Functions of Language in Online Forums},
AUTHOR = {Ismaeil, Youmna and Balalau, Oana and Mirza, Paramita},
LANGUAGE = {eng},
ISBN = {978-1-950737-84-0},
DOI = {10.18653/v1/D19-5534},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)},
EDITOR = {Xu, Wei and Ritter, Alan and Baldwin, Tim and Rahimi, Afshin},
PAGES = {259--264},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Ismaeil, Youmna
%A Balalau, Oana
%A Mirza, Paramita
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering the Functions of Language in Online Forums
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0405-E
%R 10.18653/v1/D19-5534
%F OTHER: D19-5534
%D 2019
%B 5th Workshop on Noisy User-generated Text
%Z date of event: 2019-11-04 - 2019-11-04
%C Hong Kong, China
%B Proceedings of the 5th Workshop on Noisy User-generated Text
%E Xu, Wei; Ritter, Alan; Baldwin, Tim; Rahimi, Afshin
%P 259 - 264
%I ACL
%@ 978-1-950737-84-0
%U https://www.aclweb.org/anthology/D19-5534

Paper

Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum

“TEQUILA: Temporal Question Answering over Knowledge Bases,” 2019. [Online]. Available: http://arxiv.org/abs/1908.03650.

Abstract

Question answering over knowledge bases (KB-QA) poses challenges in handling
complex questions that need to be decomposed into sub-questions. An important
case, addressed here, is that of temporal questions, where cues for temporal
relations need to be discovered and handled. We present TEQUILA, an enabler
method for temporal QA that can run on top of any KB-QA engine. TEQUILA has
four stages. It detects if a question has temporal intent. It decomposes and
rewrites the question into non-temporal sub-questions and temporal constraints.
Answers to sub-questions are then retrieved from the underlying KB-QA engine.
Finally, TEQUILA uses constraint reasoning on temporal intervals to compute
final answers to the full question. Comparisons against state-of-the-art
baselines show the viability of our method.

BibTeX

@online{Jia_arXiv1908.03650,
TITLE = {{TEQUILA}: Temporal Question Answering over Knowledge Bases},
AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1908.03650},
EPRINT = {1908.03650},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We present TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.},
}

Endnote

%0 Report
%A Jia, Zhen
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TEQUILA: Temporal Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83BE-1
%U http://arxiv.org/abs/1908.03650
%D 2019
%X Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We present TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.
%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Conference paper

M. Kaiser, R. Saha Roy, and G. Weikum

“CROWN: Conversational Passage Ranking by Reasoning over Word Networks,” in Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019), Gaithersburg, MD, USA, 2019.

BibTeX

@inproceedings{KaiserTrec19,
TITLE = {{CROWN}: {C}onversational Passage Ranking by Reasoning over Word Networks},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {NIST},
YEAR = {2019},
BOOKTITLE = {Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019)},
EDITOR = {Voorhees, Ellen M. and Ellis, Angela},
SERIES = {NIST Special Publication},
VOLUME = {1250},
ADDRESS = {Gaithersburg, MD, USA},
}

Endnote

%0 Conference Proceedings
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CROWN: Conversational Passage Ranking by Reasoning over Word Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-03C3-8
%D 2019
%B Twenty-Eighth Text REtrieval Conference
%Z date of event: 2019-11-13 - 2019-11-15
%C Gaithersburg, MD, USA
%B Proceedings of the Twenty-Eighth Text REtrieval Conference
%E Voorhees, Ellen M.; Ellis, Angela
%I NIST
%B NIST Special Publication
%N 1250

Paper

M. Kaiser, R. Saha Roy, and G. Weikum

“CROWN: Conversational Passage Ranking by Reasoning over Word Networks,” 2019. [Online]. Available: http://arxiv.org/abs/1911.02850.

Abstract

Information needs around a topic cannot be satisfied in a single turn; users
typically ask follow-up questions referring to the same theme and a system must
be capable of understanding the conversational context of a request to retrieve
correct answers. In this paper, we present our submission to the TREC
Conversational Assistance Track 2019, in which such a conversational setting is
explored. We propose a simple unsupervised method for conversational passage
ranking by formulating the passage score for a query as a combination of
similarity and coherence. To be specific, passages are preferred that contain
words semantically similar to the words used in the question, and where such
words appear close by. We built a word-proximity network (WPN) from a large
corpus, where words are nodes and there is an edge between two nodes if they
co-occur in the same passages in a statistically significant way, within a
context window. Our approach, named CROWN, improved nDCG scores over a provided
Indri baseline on the CAsT training data. On the evaluation data for CAsT, our
best run submission achieved above-average performance with respect to AP@5 and
nDCG@1000.

BibTeX

@online{Kaiser_arXiv1911.02850,
TITLE = {{CROWN}: Conversational Passage Ranking by Reasoning over Word Networks},
AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1911.02850},
EPRINT = {1911.02850},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this paper, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, our best run submission achieved above-average performance with respect to AP@5 and nDCG@1000.},
}

Endnote

%0 Report
%A Kaiser, Magdalena
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CROWN: Conversational Passage Ranking by Reasoning over Word Networks : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83ED-C
%U http://arxiv.org/abs/1911.02850
%D 2019
%X Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this paper, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, our best run submission achieved above-average performance with respect to AP@5 and nDCG@1000.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

J. Kalofolias, M. Boley, and J. Vreeken

“Discovering Robustly Connected Subgraphs with Simple Descriptions,” in 19th IEEE International Conference on Data Mining (ICDM 2019), Beijing, China, 2019.

BibTeX

@inproceedings{kalofolias:19:rosi,
TITLE = {Discovering Robustly Connected Subgraphs with Simple Descriptions},
AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-7281-4604-1},
DOI = {10.1109/ICDM.2019.00139},
PUBLISHER = {IEEE},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {19th IEEE International Conference on Data Mining (ICDM 2019)},
PAGES = {1150--1155},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Kalofolias, Janis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Robustly Connected Subgraphs with Simple Descriptions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-26D3-F
%R 10.1109/ICDM.2019.00139
%D 2019
%B 19th IEEE International Conference on Data Mining 
%Z date of event: 2019-11-08 - 2019-11-11
%C Beijing, China
%B 19th IEEE International Conference on Data Mining 
%P 1150 - 1155
%I IEEE
%@  978-1-7281-4604-1

Conference paper

D. Kaltenpoth and J. Vreeken

“We Are Not Your Real Parents: Telling Causal from Confounded by MDL,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.

BibTeX

@inproceedings{Kaltenpoth_SDM2019,
TITLE = {We Are Not Your Real Parents: {T}elling Causal from Confounded by {MDL}},
AUTHOR = {Kaltenpoth, David and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-567-3},
DOI = {10.1137/1.9781611975673.23},
PUBLISHER = {SIAM},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)},
EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh},
PAGES = {199--207},
ADDRESS = {Calgary, Canada},
}

Endnote

%0 Conference Proceedings
%A Kaltenpoth, David
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T We Are Not Your Real Parents: Telling Causal from Confounded by MDL : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0D37-2
%R 10.1137/1.9781611975673.23
%D 2019
%B SIAM International Conference on Data Mining
%Z date of event: 2019-05-02 - 2019-05-04
%C Calgary, Canada
%B Proceedings of the 2019 SIAM International Conference on Data Mining
%E Berger-Wolf, Tanya; Chawla, Nitesh
%P 199 - 207
%I SIAM
%@ 978-1-61197-567-3

Paper

D. Kaltenpoth and J. Vreeken

“We Are Not Your Real Parents: Telling Causal from Confounded using MDL,” 2019. [Online]. Available: http://arxiv.org/abs/1901.06950.

Abstract

Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of
finding out whether $X$ jointly causes $Y$ or whether they are all confounded
by an unobserved latent variable $Z$. To do so, we take an
information-theoretic approach based on Kolmogorov complexity. In a nutshell,
we follow the postulate that first encoding the true cause, and then the
effects given that cause, results in a shorter description than any other
encoding of the observed variables.
The ideal score is not computable, and hence we have to approximate it. We
propose to do so using the Minimum Description Length (MDL) principle. We
compare the MDL scores under the models where $X$ causes $Y$ and where there
exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores
are consistent. To find potential confounders we propose using latent factor
modeling, in particular, probabilistic PCA (PPCA).
Empirical evaluation on both synthetic and real-world data shows that our
method, CoCa, performs very well -- even when the true generating process of
the data is far from the assumptions made by the models we use. Moreover, it is
robust as its accuracy goes hand in hand with its confidence.

BibTeX

@online{Kaltenpoth_arXiv1901.06950,
TITLE = {We Are Not Your Real Parents: Telling Causal from Confounded using {MDL}},
AUTHOR = {Kaltenpoth, David and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1901.06950},
EPRINT = {1901.06950},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.},
}

Endnote

%0 Report
%A Kaltenpoth, David
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T We Are Not Your Real Parents: Telling Causal from Confounded using MDL : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FFEE-3
%U http://arxiv.org/abs/1901.06950
%D 2019
%X Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
%K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML

Article

S. Karaev and P. Miettinen

“Algorithms for Approximate Subtropical Matrix Factorization,” Data Mining and Knowledge Discovery, vol. 33, no. 2, 2019.

BibTeX

@article{Karaev_DMKD2018,
TITLE = {Algorithms for Approximate Subtropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Miettinen, Pauli},
LANGUAGE = {eng},
DOI = {10.1007/s10618-018-0599-1},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {33},
NUMBER = {2},
PAGES = {526--576},
}

Endnote

%0 Journal Article
%A Karaev, Sanjar
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Algorithms for Approximate Subtropical Matrix Factorization : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9FD5-B
%R 10.1007/s10618-018-0599-1
%7 2018
%D 2019
%J Data Mining and Knowledge Discovery
%O DMKD
%V 33
%N 2
%& 526
%P 526 - 576
%I Springer
%C New York, NY

Thesis

D5IMPR-CS

S. Karaev

“Matrix Factorization over Dioids and its Applications in Data Mining,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often make it possible to separate structure from noise and to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), have proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverses (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it helps to deal with the above-mentioned pattern-breaking problem. We consider different types of dioids that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high-quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used: the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.
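
The idempotency property named in the abstract can be illustrated with a small sketch. It assumes the usual definition of the subtropical algebra (nonnegative reals with max as addition and ordinary multiplication); the function names and the toy matrices are hypothetical and are not taken from the thesis.

```python
from typing import List

Matrix = List[List[float]]


def ordinary_product(b: Matrix, c: Matrix) -> Matrix:
    """Standard (plus-times) matrix product, the algebra NMF works in."""
    return [
        [sum(b[i][k] * c[k][j] for k in range(len(c))) for j in range(len(c[0]))]
        for i in range(len(b))
    ]


def max_times_product(b: Matrix, c: Matrix) -> Matrix:
    """Subtropical (max-times) product: the sum over k is replaced by max.

    Assumes non-empty matrices with nonnegative entries.
    """
    return [
        [max(b[i][k] * c[k][j] for k in range(len(c))) for j in range(len(c[0]))]
        for i in range(len(b))
    ]


# Toy factors: two rank-1 patterns that overlap in the middle cell.
B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
C = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

print(ordinary_product(B, C))   # overlapping cell accumulates both patterns
print(max_times_product(B, C))  # overlapping cell keeps the larger contribution
```

In this toy case the two patterns share one cell. Under ordinary addition their contributions add up there, whereas under max they do not, because max(a, a) = a.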

BibTeX

@phdthesis{Karaevphd2019,
TITLE = {Matrix Factorization over Dioids and its Applications in Data Mining},
AUTHOR = {Karaev, Sanjar},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-286619},
DOI = {10.22028/D291-28661},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often make it possible to separate structure from noise and to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), have proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverses ({\textquotedblleft}negative numbers{\textquotedblright}) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it helps to deal with the above-mentioned pattern-breaking problem. We consider different types of dioids that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high-quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used: the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis.
We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.},
}

Endnote

%0 Thesis
%A Karaev, Sanjar
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%A referee: van Leeuwen, Matthijs
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Matrix Factorization over Dioids and its Applications in Data Mining : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-4369-A
%R 10.22028/D291-28661
%U urn:nbn:de:bsz:291--ds-286619
%F OTHER: hdl:20.500.11880/27903
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 113 p.
%V phd
%9 phd
%X Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often make it possible to separate structure from noise and to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), have proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverses (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it helps to deal with the above-mentioned pattern-breaking problem. We consider different types of dioids that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high-quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used: the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27903

Article

A. Konstantinidis, P. Irakleous, Z. Georgiou, D. Zeinalipour-Yazti, and P. K. Chrysanthis

“IoT Data Prefetching in Indoor Navigation SOAs,” ACM Transactions on Internet Technology, vol. 19, no. 1, 2019.

BibTeX

@article{Konstantinidis:2018:IDP:3283809.3177777,
TITLE = {{IoT} Data Prefetching in Indoor Navigation {SOAs}},
AUTHOR = {Konstantinidis, Andreas and Irakleous, Panagiotis and Georgiou, Zacharias and Zeinalipour-Yazti, Demetrios and Chrysanthis, Panos K.},
LANGUAGE = {eng},
ISSN = {1533-5399},
DOI = {10.1145/3177777},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2019},
DATE = {2019},
JOURNAL = {ACM Transactions on Internet Technology},
VOLUME = {19},
NUMBER = {1},
EID = {10},
}

Endnote

%0 Journal Article
%A Konstantinidis, Andreas
%A Irakleous, Panagiotis
%A Georgiou, Zacharias
%A Zeinalipour-Yazti, Demetrios
%A Chrysanthis, Panos K.
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T IoT Data Prefetching in Indoor Navigation SOAs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-CA09-1
%R 10.1145/3177777
%7 2019
%D 2019
%J ACM Transactions on Internet Technology
%O TOIT
%V 19
%N 1
%Z sequence number: 10
%I ACM
%C New York, NY
%@ 1533-5399

Conference paper

P. Lahoti, K. Gummadi, and G. Weikum

“iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.

BibTeX

@inproceedings{Lahoti_ICDE2019,
TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-5386-7474-1},
DOI = {10.1109/ICDE.2019.00121},
PUBLISHER = {IEEE},
YEAR = {2019},
BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering},
PAGES = {1334--1345},
ADDRESS = {Macau, China},
}

Endnote

%0 Conference Proceedings
%A Lahoti, Preethi
%A Gummadi, Krishna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-F395-2
%R 10.1109/ICDE.2019.00121
%D 2019
%B 35th IEEE International Conference on Data Engineering
%Z date of event: 2019-04-08 - 2019-04-12
%C Macau, China
%B ICDE 2019
%P 1334 - 1345
%I IEEE
%@ 978-1-5386-7474-1

Article

P. Lahoti, K. Gummadi, and G. Weikum

“Operationalizing Individual Fairness with Pairwise Fair Representations,” Proceedings of the VLDB Endowment (Proc. VLDB 2019), vol. 13, no. 4, 2019.

BibTeX

@article{Lahoti2019_PVLDB,
TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.14778/3372716.3372723},
PUBLISHER = {VLDB Endowment Inc.},
YEAR = {2019},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {13},
NUMBER = {4},
PAGES = {506--518},
BOOKTITLE = {Proceedings of the 45th International Conference on Very Large Data Bases (VLDB 2019)},
EDITOR = {Balazinska, Magdalena and Zhou, Xiaofang},
}

Endnote

%0 Journal Article
%A Lahoti, Preethi
%A Gummadi, Krishna
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Operationalizing Individual Fairness with Pairwise Fair Representations : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8168-4
%R 10.14778/3372716.3372723
%7 2019
%D 2019
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 13
%N 4
%& 506
%P 506 - 518
%I VLDB Endowment Inc.
%B Proceedings of the 45th International Conference on Very Large Data Bases
%O VLDB 2019 Los Angeles, CA, USA, 26-30 August 2019

Paper

P. Lahoti, K. P. Gummadi, and G. Weikum

“Operationalizing Individual Fairness with Pairwise Fair Representations,” 2019. [Online]. Available: http://arxiv.org/abs/1907.01439.

Abstract

We revisit the notion of individual fairness proposed by Dwork et al. A
central challenge in operationalizing their approach is the difficulty in
eliciting a human specification of a similarity metric. In this paper, we
propose an operationalization of individual fairness that does not rely on a
human specification of a distance metric. Instead, we propose novel approaches
to elicit and leverage side-information on equally deserving individuals to
counter subordination between social groups. We model this knowledge as a
fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the
data that captures both data-driven similarity between individuals and the
pairwise side-information in the fairness graph. We elicit fairness judgments from
a variety of sources, including human judgments for two real-world datasets on
recidivism prediction (COMPAS) and violent neighborhood prediction (Crime &
Communities). Our experiments show that the PFR model for operationalizing
individual fairness is practically viable.

BibTeX

@online{Lahoti_arXiv1907.01439,
TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations},
AUTHOR = {Lahoti, Preethi and Gummadi, Krishna P. and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1907.01439},
EPRINT = {1907.01439},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime \& Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.},
}

Endnote

%0 Report
%A Lahoti, Preethi
%A Gummadi, Krishna P.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Operationalizing Individual Fairness with Pairwise Fair Representations : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FF17-5
%U http://arxiv.org/abs/1907.01439
%D 2019
%X We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.
%K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML

Conference paper

X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum

“Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.

BibTeX

@inproceedings{lu19answering,
TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs},
AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6172-9},
DOI = {10.1145/3331184.3331252},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric},
PAGES = {105--114},
ADDRESS = {Paris, France},
}

Endnote

%0 Conference Proceedings
%A Lu, Xiaolu
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Abujabal, Abdalghani
%A Wang, Yafang
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-7085-8
%R 10.1145/3331184.3331252
%D 2019
%B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2019-07-21 - 2019-07-25
%C Paris, France
%B SIGIR '19
%E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric
%P 105 - 114
%I ACM
%@ 978-1-4503-6172-9

Paper

X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum

“Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” 2019. [Online]. Available: http://arxiv.org/abs/1908.00469.

Abstract

Direct answering of questions that involve multiple entities and relations is
a challenge for text-based QA. This problem is most pronounced when answers can
be found only by joining evidence from multiple documents. Curated knowledge
graphs (KGs) may yield good answers, but are limited by their inherent
incompleteness and potential staleness. This paper presents QUEST, a method
that can answer complex questions directly from textual sources on-the-fly, by
computing similarity joins over partial results from different documents. Our
method is completely unsupervised, avoiding training-data bottlenecks and being
able to cope with rapidly evolving ad hoc topics and formulation style in user
questions. QUEST builds a noisy quasi KG with node and edge weights, consisting
of dynamically retrieved entity names and relational phrases. It augments this
graph with types and semantic alignments, and computes the best answers by an
algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex
questions, and show that it substantially outperforms state-of-the-art
baselines.

BibTeX

@online{Lu_arXiv1908.00469,
TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs},
AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1908.00469},
EPRINT = {1908.00469},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.},
}

Endnote

%0 Report
%A Lu, Xiaolu
%A Pramanik, Soumajit
%A Saha Roy, Rishiraj
%A Abujabal, Abdalghani
%A Wang, Yafang
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83B3-C
%U http://arxiv.org/abs/1908.00469
%D 2019
%X Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
%K Computer Science, Information Retrieval, cs.IR

Conference paper

S. MacAvaney, A. Yates, K. Hui, and O. Frieder

“Content-Based Weak Supervision for Ad-Hoc Re-Ranking,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.

BibTeX

@inproceedings{MacAvaney_SIGIR2019b,
TITLE = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
LANGUAGE = {eng},
ISBN = {9781450361729},
DOI = {10.1145/3331184.3331316},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric},
PAGES = {993--996},
ADDRESS = {Paris, France},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Yates, Andrew
%A Hui, Kai
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Content-Based Weak Supervision for Ad-Hoc Re-Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-6B55-4
%R 10.1145/3331184.3331316
%D 2019
%B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2019-07-21 - 2019-07-25
%C Paris, France
%B SIGIR '19
%E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric
%P 993 - 996
%I ACM
%@ 9781450361729

Paper

S. MacAvaney, A. Yates, K. Hui, and O. Frieder

“Content-Based Weak Supervision for Ad-Hoc Re-Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1707.00189.

Abstract

One challenge with neural ranking is the need for a large amount of
manually-labeled relevance judgments for training. In contrast with prior work,
we examine the use of weak supervision sources for training that yield pseudo
query-document pairs that already exhibit relevance (e.g., newswire
headline-content pairs and encyclopedic heading-paragraph pairs). We also
propose filtering techniques to eliminate training samples that are too far out
of domain using two techniques: a heuristic-based approach and novel supervised
filter that re-purposes a neural ranker. Using several leading neural ranking
architectures and multiple weak supervision datasets, we show that these
sources of training pairs are effective on their own (outperforming prior weak
supervision techniques), and that filtering can further improve performance.

BibTeX

@online{MacAvaney_arXiv1707.00189,
TITLE = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1707.00189},
EPRINT = {1707.00189},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Yates, Andrew
%A Hui, Kai
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Content-Based Weak Supervision for Ad-Hoc Re-Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-6B59-0
%U http://arxiv.org/abs/1707.00189
%D 2019
%X One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Article

S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder

“Overcoming Low-Utility Facets for Complex Answer Retrieval,” Information Retrieval Journal, vol. 22, no. 3–4, 2019.

BibTeX

@article{MacAvaney2019,
TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir},
LANGUAGE = {eng},
ISSN = {1386-4564},
DOI = {10.1007/s10791-018-9343-0},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Information Retrieval Journal},
VOLUME = {22},
NUMBER = {3-4},
PAGES = {395--418},
}

Endnote

%0 Journal Article
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Soldaini, Luca
%A Hui, Kai
%A Goharian, Nazli
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Overcoming Low-Utility Facets for Complex Answer Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-C4A1-9
%R 10.1007/s10791-018-9343-0
%7 2019
%D 2019
%J Information Retrieval Journal
%V 22
%N 3-4
%& 395
%P 395 - 418
%I Springer
%C New York, NY
%@ 1386-4564

Conference paper

S. MacAvaney, A. Yates, A. Cohan, and N. Goharian

“CEDR: Contextualized Embeddings for Document Ranking,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.

BibTeX

@inproceedings{MacAvaney_SIGIR2019,
TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {9781450361729},
DOI = {10.1145/3331184.3331317},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric},
PAGES = {1101--1104},
ADDRESS = {Paris, France},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T CEDR: Contextualized Embeddings for Document Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02D3-B
%R 10.1145/3331184.3331317
%D 2019
%B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2019-07-21 - 2019-07-25
%C Paris, France
%B SIGIR '19
%E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric
%P 1101 - 1104
%I ACM
%@ 9781450361729

Paper

S. MacAvaney, A. Yates, A. Cohan, and N. Goharian

“CEDR: Contextualized Embeddings for Document Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1904.07094.

Abstract

Although considerable attention has been given to neural ranking
architectures recently, far less attention has been paid to the term
representations that are used as input to these models. In this work, we
investigate how two pretrained contextualized language models (ELMo and BERT)
can be utilized for ad-hoc document ranking. Through experiments on TREC
benchmarks, we find that several existing neural ranking architectures can
benefit from the additional context provided by contextualized language models.
Furthermore, we propose a joint approach that incorporates BERT's
classification vector into existing neural models and show that it outperforms
state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR
(Contextualized Embeddings for Document Ranking). We also address practical
challenges in using these models for ranking, including the maximum input
length imposed by BERT and runtime performance impacts of contextualized
language models.

BibTeX

@online{MacAvaney_arXiv1904.07094,
TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1904.07094},
EPRINT = {1904.07094},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T CEDR: Contextualized Embeddings for Document Ranking
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02C7-9
%U http://arxiv.org/abs/1904.07094
%D 2019
%X Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Conference paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019), Macao, 2019.

BibTeX

@inproceedings{mandros_IJCAI2019,
TITLE = {Discovering Reliable Dependencies from Data: {H}ardness and Improved Algorithms},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-0-9992411-4-1},
DOI = {10.24963/ijcai.2019/864},
PUBLISHER = {IJCAI},
YEAR = {2019},
BOOKTITLE = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019)},
EDITOR = {Kraus, Sarit},
PAGES = {6206--6210},
ADDRESS = {Macao},
}

Endnote

%0 Conference Proceedings
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-848A-A
%R 10.24963/ijcai.2019/864 
%D 2019
%B Twenty-Eighth International Joint Conference on Artificial Intelligence
%Z date of event: 2019-08-10 - 2019-08-16
%C Macao
%B Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
%E Kraus, Sarit
%P 6206 - 6210
%I IJCAI
%@ 978-0-9992411-4-1 
%U https://www.ijcai.org/Proceedings/2019/0864.pdf

Conference paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Correlations in Categorical Data,” in 19th IEEE International Conference on Data Mining (ICDM 2019), Beijing, China, 2019.

BibTeX

@inproceedings{Mandros_ICDM2019,
TITLE = {Discovering Reliable Correlations in Categorical Data},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-7281-4604-1},
DOI = {10.1109/ICDM.2019.00156},
PUBLISHER = {IEEE},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {19th IEEE International Conference on Data Mining (ICDM 2019)},
PAGES = {1252--1257},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Discovering Reliable Correlations in Categorical Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0006-F27B-F
%R 10.1109/ICDM.2019.00156
%D 2019
%B 19th IEEE International Conference on Data Mining 
%Z date of event: 2019-11-08 - 2019-11-11
%C Beijing, China
%B 19th IEEE International Conference on Data Mining 
%P 1252 - 1257
%I IEEE
%@ 978-1-7281-4604-1

Paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Correlations in Categorical Data,” 2019. [Online]. Available: http://arxiv.org/abs/1908.11682.

Abstract

In many scientific tasks we are interested in discovering whether there exist
any correlations in our data. This raises many questions, such as how to
reliably and interpretably measure correlation between a multivariate set of
attributes, how to do so without having to make assumptions on distribution of
the data or the type of correlation, and, how to efficiently discover the
top-most reliably correlated attribute sets from data. In this paper we answer
these questions for discovery tasks in categorical data.
In particular, we propose a corrected-for-chance, consistent, and efficient
estimator for normalized total correlation, by which we obtain a reliable,
naturally interpretable, non-parametric measure for correlation over
multivariate sets. For the discovery of the top-k correlated sets, we derive an
effective algorithmic framework based on a tight bounding function. This
framework offers exact, approximate, and heuristic search. Empirical evaluation
shows that already for small sample sizes the estimator leads to low-regret
optimization outcomes, while the algorithms are shown to be highly effective
for both large and high-dimensional data. Through two case studies we confirm
that our discovery framework identifies interesting and meaningful
correlations.

BibTeX

@online{Mandros_arXiv1908.11682,
TITLE = {Discovering Reliable Correlations in Categorical Data},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1908.11682},
EPRINT = {1908.11682},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.},
}

Endnote

%0 Report
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Discovering Reliable Correlations in Categorical Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8491-1
%U http://arxiv.org/abs/1908.11682
%D 2019
%X In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.
%K Computer Science, Learning, cs.LG,Computer Science, Databases, cs.DB,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT,Statistics, Machine Learning, stat.ML

Paper

A. Marx and J. Vreeken

“Testing Conditional Independence on Discrete Data using Stochastic Complexity,” 2019. [Online]. Available: http://arxiv.org/abs/1903.04829.

Abstract

Testing for conditional independence is a core aspect of constraint-based
causal discovery. Although commonly used tests are perfect in theory, they
often fail to reject independence in practice, especially when conditioning on
multiple variables.
We focus on discrete data and propose a new test based on the notion of
algorithmic independence that we instantiate using stochastic complexity.
Amongst others, we show that our proposed test, SCI, is an asymptotically
unbiased as well as $L_2$ consistent estimator for conditional mutual
information (CMI). Further, we show that SCI can be reformulated to find a
sensible threshold for CMI that works well on limited samples. Empirical
evaluation shows that SCI has a lower type II error than commonly used tests.
As a result, we obtain a higher recall when we use SCI in causal discovery
algorithms, without compromising the precision.

BibTeX

@online{Marx_arXiv1903.04829,
TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1903.04829},
EPRINT = {1903.04829},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Testing Conditional Independence on Discrete Data using Stochastic Complexity
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-027A-1
%U http://arxiv.org/abs/1903.04829
%D 2019
%X Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
%K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG

Article

A. Marx and J. Vreeken

“Telling Cause from Effect by Local and Global Regression,” Knowledge and Information Systems, vol. 60, no. 3, 2019.

BibTeX

@article{marx:19:crack,
TITLE = {Telling Cause from Effect by Local and Global Regression},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-018-1286-7},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {60},
NUMBER = {3},
PAGES = {1277--1305},
}

Endnote

%0 Journal Article
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Telling Cause from Effect by Local and Global Regression
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9EAD-A
%R 10.1007/s10115-018-1286-7
%7 2018-12-07
%D 2019
%J Knowledge and Information Systems
%V 60
%N 3
%& 1277
%P 1277 - 1305
%I Springer
%C New York, NY
%@ 0219-1377

Conference paper

A. Marx and J. Vreeken

“Testing Conditional Independence on Discrete Data using Stochastic Complexity,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), Naha, Okinawa, Japan, 2019.

BibTeX

@inproceedings{Marx_AISTATS2019,
TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
PUBLISHER = {PMLR},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)},
EDITOR = {Chaudhuri, Kamalika and Sugiyama, Masashi},
PAGES = {496--505},
SERIES = {Proceedings of Machine Learning Research},
VOLUME = {89},
ADDRESS = {Naha, Okinawa, Japan},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Testing Conditional Independence on Discrete Data using Stochastic Complexity
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0D3C-D
%D 2019
%B 22nd International Conference on Artificial Intelligence and Statistics
%Z date of event: 2019-04-16 - 2019-04-18
%C Naha, Okinawa, Japan
%B Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics
%E Chaudhuri, Kamalika; Sugiyama, Masashi
%P 496 - 505
%I PMLR
%B Proceedings of Machine Learning Research
%N 89
%U http://proceedings.mlr.press/v89/marx19a/marx19a.pdf

Conference paper

A. Marx and J. Vreeken

“Identifiability of Cause and Effect using Regularized Regression,” in KDD ’19, 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019.

BibTeX

@inproceedings{Marx_KDD2019,
TITLE = {Identifiability of Cause and Effect using Regularized Regression},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-6201-6},
DOI = {10.1145/3292500.3330854},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {KDD '19, 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
PAGES = {852--861},
ADDRESS = {Anchorage, AK, USA},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Identifiability of Cause and Effect using Regularized Regression
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-858C-8
%R 10.1145/3292500.3330854
%D 2019
%B 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
%Z date of event: 2019-08-04 - 2019-08-08
%C Anchorage, AK, USA
%B KDD '19
%P 852 - 861
%I ACM
%@ 978-1-4503-6201-6

Conference paper

A. Marx and J. Vreeken

“Approximating Algorithmic Conditional Independence for Discrete Data,” in Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI, Stanford, CA, USA, 2019.

BibTeX

@inproceedings{Marx_AAAISpringSymp2019,
TITLE = {Approximating Algorithmic Conditional Independence for Discrete Data},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
YEAR = {2019},
PUBLREMARK = {Accepted},
BOOKTITLE = {Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI},
ADDRESS = {Stanford, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Approximating Algorithmic Conditional Independence for Discrete Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0D4C-B
%D 2019
%B First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI
%Z date of event: 2019-05-25 - 2019-05-27
%C Stanford, CA, USA
%B Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI

Conference paper

A. Marx and J. Vreeken

“Causal Inference on Multivariate and Mixed-Type Data,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018), Dublin, Ireland, 2019.

BibTeX

@inproceedings{marx:18:crack,
TITLE = {Causal Inference on Multivariate and Mixed-Type Data},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-030-10927-1},
DOI = {10.1007/978-3-030-10928-8_39},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018)},
EDITOR = {Berlingerio, Michele and Bonchi, Francesco and G{\"a}rtner, Thomas and Hurley, Neil and Ifrim, Georgiana},
PAGES = {655--671},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {11052},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference on Multivariate and Mixed-Type Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9E86-5
%R 10.1007/978-3-030-10928-8_39
%D 2019
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2018-09-10 - 2018-09-14
%C Dublin, Ireland
%B Machine Learning and Knowledge Discovery in Databases
%E Berlingerio, Michele; Bonchi, Francesco; Gärtner, Thomas; Hurley, Neil; Ifrim, Georgiana
%P 655 - 671
%I Springer
%@ 978-3-030-10927-1
%B Lecture Notes in Artificial Intelligence
%N 11052

Conference paper

F. Mesquita, M. Cannaviccio, J. Schmidek, P. Mirza, and D. Barbosa

“KnowledgeNet: A Benchmark Dataset for Knowledge Base Population,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{mesquita-etal-2019-knowledgenet,
TITLE = {{KnowledgeNet}: {A} Benchmark Dataset for Knowledge Base Population},
AUTHOR = {Mesquita, Filipe and Cannaviccio, Matteo and Schmidek, Jordan and Mirza, Paramita and Barbosa, Denilson},
LANGUAGE = {eng},
ISBN = {978-1-950737-90-1},
DOI = {10.18653/v1/D19-1069},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)},
EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun},
PAGES = {749--758},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Mesquita, Filipe
%A Cannaviccio, Matteo
%A Schmidek, Jordan
%A Mirza, Paramita
%A Barbosa, Denilson
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-0410-1
%R 10.18653/v1/D19-1069
%F OTHER: D19-1069
%D 2019
%B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing 
%Z date of event: 2019-11-03 - 2019-11-07
%C Hong Kong, China
%B Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing 
%E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun
%P 749 - 758
%I ACL
%@ 978-1-950737-90-1
%U https://www.aclweb.org/anthology/D19-1069

Article

S. Metzler and P. Miettinen

“HyGen: Generating Random Graphs with Hyperbolic Communities,” Applied Network Science, vol. 4, 2019.

BibTeX

@article{Metzler_Miettienen19,
TITLE = {{HyGen}: {G}enerating Random Graphs with Hyperbolic Communities},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {2364-8228},
DOI = {10.1007/s41109-019-0166-8},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2019},
JOURNAL = {Applied Network Science},
VOLUME = {4},
EID = {53},
}

Endnote

%0 Journal Article
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T HyGen: Generating Random Graphs with Hyperbolic Communities
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8E5E-3
%R 10.1007/s41109-019-0166-8
%7 2019
%D 2019
%J Applied Network Science
%O ANS Appl Netw Sci
%V 4
%Z sequence number: 53
%I Springer
%C New York, NY
%@ 2364-8228

Article

S. Metzler, S. Günnemann, and P. Miettinen

“Stability and Dynamics of Communities on Online Question-Answer Sites,” Social Networks, vol. 58, 2019.

BibTeX

@article{Metzler2019,
TITLE = {Stability and Dynamics of Communities on Online Question-Answer Sites},
AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {0378-8733},
DOI = {10.1016/j.socnet.2018.12.004},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Social Networks},
VOLUME = {58},
PAGES = {50--58},
}

Endnote

%0 Journal Article
%A Metzler, Saskia
%A Günnemann, Stephan
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Stability and Dynamics of Communities on Online Question-Answer Sites
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-BCC1-0
%R 10.1016/j.socnet.2018.12.004
%7 2019
%D 2019
%J Social Networks
%V 58
%& 50
%P 50 - 58
%I Elsevier
%C Amsterdam
%@ 0378-8733

Thesis

O. A. Mian

“Causal Discovery using MDL-based Regression,” Universität des Saarlandes, Saarbrücken, 2019.

BibTeX

@mastersthesis{mian:19:cdregression,
TITLE = {Causal Discovery using {MDL}-based Regression},
AUTHOR = {Mian, Osman Ali},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
}

Endnote

%0 Thesis
%A Mian, Osman Ali
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Discovery using MDL-based Regression
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF0D-D
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%V master
%9 master

Conference paper

M. Mohanty, M. Ramanath, M. Yahya, and G. Weikum

“Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs,” in Advances in Database Technology (EDBT 2019), Lisbon, Portugal, 2019.

BibTeX

@inproceedings{Mohanty:EDBT2019,
TITLE = {{Spec-QP}: {S}peculative Query Planning for Joins over Knowledge Graphs},
AUTHOR = {Mohanty, Madhulika and Ramanath, Maya and Yahya, Mohamed and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-89318-081-3},
DOI = {10.5441/002/edbt.2019.07},
PUBLISHER = {OpenProceedings.org},
YEAR = {2019},
BOOKTITLE = {Advances in Database Technology (EDBT 2019)},
EDITOR = {Herschel, Melanie and Galhardas, Helena and Reinwald, Berthold and Fundulaki, Irini and Binnig, Carsten and Kaoudi, Zoi},
PAGES = {61--72},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Mohanty, Madhulika
%A Ramanath, Maya
%A Yahya, Mohamed
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-3A7D-1
%R 10.5441/002/edbt.2019.07
%D 2019
%B 22nd International Conference on Extending Database Technology
%Z date of event: 2019-03-26 - 2019-03-29
%C Lisbon, Portugal
%B Advances in Database Technology
%E Herschel, Melanie; Galhardas, Helena; Reinwald, Berthold; Fundulaki, Irini; Binnig, Carsten; Kaoudi, Zoi
%P 61 - 72
%I OpenProceedings.org
%@ 978-3-89318-081-3

Paper

S. Nag Chowdhury, S. Razniewski, and G. Weikum

“Story-oriented Image Selection and Placement,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00692.

Abstract

Multimodal contents have become commonplace on the Internet today, manifested
as news articles, social media posts, and personal or business blog posts.
Among the various kinds of media (images, videos, graphics, icons, audio) used
in such multimodal stories, images are the most popular. The selection of
images from a collection - either author's personal photo album, or web
repositories - and their meticulous placement within a text, builds a succinct
multimodal commentary for digital consumption. In this paper we present a
system that automates the process of selecting relevant images for a story and
placing them at contextual paragraphs within the story for a multimodal
narration. We leverage automatic object recognition, user-provided tags, and
commonsense knowledge, and use an unsupervised combinatorial optimization to
solve the selection and placement problems seamlessly as a single unit.

BibTeX

@online{Nag_arXiv1909.00692,
TITLE = {Story-oriented Image Selection and Placement},
AUTHOR = {Nag Chowdhury, Sreyasi and Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1909.00692},
EPRINT = {1909.00692},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection -- either author's personal photo album, or web repositories -- and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit.},
}

Endnote

%0 Report
%A Nag Chowdhury, Sreyasi
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Story-oriented Image Selection and Placement
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83C9-4
%U http://arxiv.org/abs/1909.00692
%D 2019
%X Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection - either author's personal photo album, or web repositories - and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit.
%K Computer Science, Computation and Language, cs.CL

Paper

S. Nag Chowdhury, N. Tandon, and G. Weikum

“Know2Look: Commonsense Knowledge for Visual Search,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00749.

Abstract

With the rise in popularity of social media, images accompanied by contextual
text form a huge section of the web. However, search and retrieval of documents
are still largely dependent on solely textual cues. Although visual cues have
started to gain focus, the imperfection in object/scene detection do not lead
to significantly improved results. We hypothesize that the use of background
commonsense knowledge on query terms can significantly aid in retrieval of
documents with associated images. To this end we deploy three different
modalities - text, visual cues, and commonsense knowledge pertaining to the
query - as a recipe for efficient search and retrieval.

BibTeX

@online{Nag_arXiv1909.00749,
TITLE = {{Know2Look}: Commonsense Knowledge for Visual Search},
AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1909.00749},
EPRINT = {1909.00749},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities -- text, visual cues, and commonsense knowledge pertaining to the query -- as a recipe for efficient search and retrieval.},
}

Endnote

%0 Report
%A Nag Chowdhury, Sreyasi
%A Tandon, Niket
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Know2Look: Commonsense Knowledge for Visual Search
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83D2-9
%U http://arxiv.org/abs/1909.00749
%D 2019
%X With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval.
%K Computer Science, Information Retrieval, cs.IR

Paper

S. Nag Chowdhury, N. Tandon, H. Ferhatosmanoglu, and G. Weikum

“VISIR: Visual and Semantic Image Label Refinement,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00741.

Abstract

The social media explosion has populated the Internet with a wealth of
images. There are two existing paradigms for image retrieval: 1) content-based
image retrieval (CBIR), which has traditionally used visual features for
similarity search (e.g., SIFT features), and 2) tag-based image retrieval
(TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains
semantic expressiveness by advances in deep-learning-based detection of visual
labels. TBIR benefits from query-and-click logs to automatically infer more
informative labels. However, learning-based tagging still yields noisy labels
and is restricted to concrete objects, missing out on generalizations and
abstractions. Click-based tagging is limited to terms that appear in the
textual context of an image or in queries that lead to a click. This paper
addresses the above limitations by semantically refining and expanding the
labels suggested by learning-based object detection. We consider the semantic
coherence between the labels for different objects, leverage lexical and
commonsense knowledge, and cast the label assignment into a constrained
optimization problem solved by an integer linear program. Experiments show that
our method, called VISIR, improves the quality of the state-of-the-art visual
labeling tools like LSDA and YOLO.

BibTeX

@online{Nag_arXiv1909.00741,
TITLE = {{VISIR}: Visual and Semantic Image Label Refinement},
AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Ferhatosmanoglu, Hakan and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1909.00741},
EPRINT = {1909.00741},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO.},
}

Endnote

%0 Report
%A Nag Chowdhury, Sreyasi
%A Tandon, Niket
%A Ferhatosmanoglu, Hakan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T VISIR: Visual and Semantic Image Label Refinement
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83CE-F
%U http://arxiv.org/abs/1909.00741
%D 2019
%X The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO.
%K Computer Science, Multimedia, cs.MM,Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Information Retrieval, cs.IR

Article

S. Paramonov, D. Stepanova, and P. Miettinen

“Hybrid ASP-based Approach to Pattern Mining,” Theory and Practice of Logic Programming, vol. 19, no. 4, 2019.

BibTeX

@article{ParamonovTPLP,
TITLE = {Hybrid {ASP}-based Approach to Pattern Mining},
AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {1471-0684},
DOI = {10.1017/S1471068418000467},
PUBLISHER = {Cambridge University Press},
ADDRESS = {Cambridge},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Theory and Practice of Logic Programming},
VOLUME = {19},
NUMBER = {4},
PAGES = {505--535},
}

Endnote

%0 Journal Article
%A Paramonov, Sergey
%A Stepanova, Daria
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Hybrid ASP-based Approach to Pattern Mining
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0CC4-3
%R 10.1017/S1471068418000467
%7 2019
%D 2019
%J Theory and Practice of Logic Programming
%O TPLP
%V 19
%N 4
%& 505
%P 505 - 535
%I Cambridge University Press
%C Cambridge
%@ 1471-0684

Thesis

D5IMPR-CS

K. Popat

“Credibility Analysis of Textual Claims with Explainable Evidence,” Universität des Saarlandes, Saarbrücken, 2019.

Abstract

Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. 
In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.

BibTeX

@phdthesis{Popatphd2019,
TITLE = {Credibility Analysis of Textual Claims with Explainable Evidence},
AUTHOR = {Popat, Kashyap},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291--ds-300050},
DOI = {10.22028/D291-30005},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
ABSTRACT = {Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. 
In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.},
}

Endnote

%0 Thesis
%A Popat, Kashyap
%Y Weikum, Gerhard
%A referee: Naumann, Felix
%A referee: Yates, Andrew
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Credibility Analysis of Textual Claims with Explainable Evidence
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-654D-4
%R 10.22028/D291-30005
%U urn:nbn:de:bsz:291--ds-300050
%F OTHER: hdl:20.500.11880/28481
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%P 134 p.
%V phd
%9 phd
%X Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. 
In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28481

Paper

K. Popat, S. Mukherjee, A. Yates, and G. Weikum

“STANCY: Stance Classification Based on Consistency Cues,” 2019. [Online]. Available: http://arxiv.org/abs/1910.06048.

Abstract

Controversial claims are abundant in online media and discussion forums. A
better understanding of such claims requires analyzing them from different
perspectives. Stance classification is a necessary step for inferring these
perspectives in terms of supporting or opposing the claim. In this work, we
present a neural network model for stance classification leveraging BERT
representations and augmenting them with a novel consistency constraint.
Experiments on the Perspectrum dataset, consisting of claims and users'
perspectives from various debate websites, demonstrate the effectiveness of our
approach over state-of-the-art baselines.

BibTeX

@online{Popat_arXiv1910.06048,
TITLE = {{STANCY}: Stance Classification Based on Consistency Cues},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1910.06048},
EPRINT = {1910.06048},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. Experiments on the Perspectrum dataset, consisting of claims and users' perspectives from various debate websites, demonstrate the effectiveness of our approach over state-of-the-art baselines.},
}

Endnote

%0 Report
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T STANCY: Stance Classification Based on Consistency Cues
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83E2-7
%U http://arxiv.org/abs/1910.06048
%D 2019
%X Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. Experiments on the Perspectrum dataset, consisting of claims and users' perspectives from various debate websites, demonstrate the effectiveness of our approach over state-of-the-art baselines.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Learning, cs.LG

Conference paper

K. Popat, S. Mukherjee, A. Yates, and G. Weikum

“STANCY: Stance Classification Based on Consistency Cues,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{D19-1675,
TITLE = {{STANCY}: {S}tance Classification Based on Consistency Cues},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-950737-90-1},
URL = {https://www.aclweb.org/anthology/D19-1675/},
DOI = {10.18653/v1/D19-1675},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)},
EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun},
PAGES = {6412--6417},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T STANCY: Stance Classification Based on Consistency Cues
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-827A-F
%U https://www.aclweb.org/anthology/D19-1675/
%R 10.18653/v1/D19-1675
%D 2019
%B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%Z date of event: 2019-11-03 - 2019-11-07
%C Hong Kong, China
%B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun
%P 6412 - 6417
%I ACL
%@ 978-1-950737-90-1

Conference paper

S. Razniewski, N. Jain, P. Mirza, and G. Weikum

“Coverage of Information Extraction from Sentences and Paragraphs,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{D19-1000,
TITLE = {Coverage of Information Extraction from Sentences and Paragraphs},
AUTHOR = {Razniewski, Simon and Jain, Nitisha and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-950737-90-1},
URL = {https://www.aclweb.org/anthology/D19-1000},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)},
EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun},
PAGES = {5770--5775},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Razniewski, Simon
%A Jain, Nitisha
%A Mirza, Paramita
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Coverage of Information Extraction from Sentences and Paragraphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8265-6
%U https://www.aclweb.org/anthology/D19-1000
%D 2019
%B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%Z date of event: 2019-11-03 - 2019-11-07
%C Hong Kong, China
%B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun
%P 5770 - 5775
%I ACL
%@ 978-1-950737-90-1

Paper

J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum

“Commonsense Properties from Query Logs and Question Answering Forums,” 2019. [Online]. Available: http://arxiv.org/abs/1905.10989.

Abstract

Commonsense knowledge about object properties, human behavior and general
concepts is crucial for robust AI applications. However, automatic acquisition
of this knowledge is challenging because of sparseness and bias in online
sources. This paper presents Quasimodo, a methodology and tool suite for
distilling commonsense properties from non-standard web sources. We devise
novel ways of tapping into search-engine query logs and QA forums, and
combining the resulting candidate assertions with statistical cues from
encyclopedias, books and image tags in a corroboration step. Unlike prior work
on commonsense knowledge bases, Quasimodo focuses on salient properties that
are typically associated with certain objects or concepts. Extensive
evaluations, including extrinsic use-case studies, show that Quasimodo provides
better coverage than state-of-the-art baselines with comparable quality.

BibTeX

@online{Romero_arXiv1905.10989,
TITLE = {Commonsense Properties from Query Logs and Question Answering Forums},
AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1905.10989},
EPRINT = {1905.10989},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.},
}

Endnote

%0 Report
%A Romero, Julien
%A Razniewski, Simon
%A Pal, Koninika
%A Pan, Jeff Z.
%A Sakhadeo, Archit
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Commonsense Properties from Query Logs and Question Answering Forums
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FEEE-4
%U http://arxiv.org/abs/1905.10989
%D 2019
%X Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Conference paper

J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum

“Commonsense Properties from Query Logs and Question Answering Forums,” in CIKM ’19, 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019.

BibTeX

@inproceedings{Romero_CIKM2019,
TITLE = {Commonsense Properties from Query Logs and Question Answering Forums},
AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {9781450369763},
DOI = {10.1145/3357384.3357955},
PUBLISHER = {ACM},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {CIKM '19, 28th ACM International Conference on Information and Knowledge Management},
EDITOR = {Zhu, Wenwu and Tao, Dacheng},
PAGES = {1411--1420},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Romero, Julien
%A Razniewski, Simon
%A Pal, Koninika
%A Pan, Jeff Z.
%A Sakhadeo, Archit
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Commonsense Properties from Query Logs and Question Answering Forums
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8255-8
%R 10.1145/3357384.3357955
%D 2019
%B 28th ACM International Conference on Information and Knowledge Management
%Z date of event: 2019-11-03 - 2019-11-07
%C Beijing, China
%B CIKM '19
%E Zhu, Wenwu; Tao, Dacheng
%P 1411 - 1420
%I ACM
%@ 9781450369763

Thesis

D. Saran

“Summarizing Dynamic Graphs using MDL,” Universität des Saarlandes, Saarbrücken, 2019.

BibTeX

@mastersthesis{saran:19:dyngraphs,
TITLE = {Summarizing Dynamic Graphs using {MDL}},
AUTHOR = {Saran, Divyam},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2019},
DATE = {2019},
}

Endnote

%0 Thesis
%A Saran, Divyam
%Y Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Summarizing Dynamic Graphs using MDL
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FF10-8
%I Universität des Saarlandes
%C Saarbrücken
%D 2019
%V master
%9 master

Conference paper

X. Shen, J. Suzuki, K. Inui, H. Su, D. Klakow, and S. Sekine

“Select and Attend: Towards Controllable Content Selection in Text Generation,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{shen2019select,
TITLE = {Select and Attend: {T}owards Controllable Content Selection in Text Generation},
AUTHOR = {Shen, Xiaoyu and Suzuki, Jun and Inui, Kentaro and Su, Hui and Klakow, Dietrich and Sekine, Satoshi},
LANGUAGE = {eng},
ISBN = {978-1-950737-90-1},
URL = {https://www.aclweb.org/anthology/D19-1054},
DOI = {10.18653/v1/D19-1054},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)},
EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun},
PAGES = {579--590},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Shen, Xiaoyu
%A Suzuki, Jun
%A Inui, Kentaro
%A Su, Hui
%A Klakow, Dietrich
%A Sekine, Satoshi
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Select and Attend: Towards Controllable Content Selection in Text Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-13BD-E
%U https://www.aclweb.org/anthology/D19-1054
%R 10.18653/v1/D19-1054
%D 2019
%B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%Z date of event: 2019-11-03 - 2019-11-07
%C Hong Kong, China
%B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun
%P 579 - 590
%I ACL
%@ 978-1-950737-90-1

Conference paper

X. Shen, Y. Zhao, H. Su, and D. Klakow

“Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.

BibTeX

@inproceedings{shen2019improving,
TITLE = {Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator},
AUTHOR = {Shen, Xiaoyu and Zhao, Yang and Su, Hui and Klakow, Dietrich},
LANGUAGE = {eng},
ISBN = {978-1-950737-90-1},
URL = {https://www.aclweb.org/anthology/D19-1390},
DOI = {10.18653/v1/D19-1390},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)},
EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun},
PAGES = {3762--3773},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Shen, Xiaoyu
%A Zhao, Yang
%A Su, Hui
%A Klakow, Dietrich
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-13C2-7
%U https://www.aclweb.org/anthology/D19-1390
%R 10.18653/v1/D19-1390
%D 2019
%B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%Z date of event: 2019-11-03 - 2019-11-07
%C Hong Kong, China
%B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing
%E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun
%P 3762 - 3773
%I ACL
%@ 978-1-950737-90-1

Book chapter / section

F. M. Suchanek, J. Lajus, A. Boschin, and G. Weikum

“Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases,” in Reasoning Web -- Explainable Artificial Intelligence, Berlin: Springer, 2019.

BibTeX

@incollection{Suchanek_LNCS11810,
TITLE = {Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases},
AUTHOR = {Suchanek, Fabian M. and Lajus, Jonathan and Boschin, Armand and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-31422-4},
DOI = {10.1007/978-3-030-31423-1_4},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2019},
BOOKTITLE = {Reasoning Web -- Explainable Artificial Intelligence},
EDITOR = {Kr{\"o}tzsch, Markus and Stepanova, Daria},
PAGES = {110--152},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11810},
}

Endnote

%0 Book Section
%A Suchanek, Fabian M.
%A Lajus, Jonathan
%A Boschin, Armand
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-8298-C
%R 10.1007/978-3-030-31423-1_4
%D 2019
%B Reasoning Web -- Explainable Artificial Intelligence
%E Krötzsch, Markus; Stepanova, Daria
%P 110 - 152
%I Springer
%C Berlin
%@ 978-3-030-31422-4
%S Lecture Notes in Computer Science
%N 11810

Conference paper

H. Su, X. Shen, R. Zhang, F. Sun, P. Hu, C. Niu, and J. Zhou

“Improving Multi-turn Dialogue Modelling with Utterance ReWriter,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.

BibTeX

@inproceedings{Su_2019,
TITLE = {Improving Multi-turn Dialogue Modelling with Utterance {ReWriter}},
AUTHOR = {Su, Hui and Shen, Xiaoyu and Zhang, Rongzhi and Sun, Fei and Hu, Pengwei and Niu, Cheng and Zhou, Jie},
LANGUAGE = {eng},
URL = {https://www.aclweb.org/anthology/P19-1003},
DOI = {10.18653/v1/P19-1003},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s},
PAGES = {22--31},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Su, Hui
%A Shen, Xiaoyu
%A Zhang, Rongzhi
%A Sun, Fei
%A Hu, Pengwei
%A Niu, Cheng
%A Zhou, Jie
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Improving Multi-turn Dialogue Modelling with Utterance ReWriter
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-6982-2
%U https://www.aclweb.org/anthology/P19-1003
%R 10.18653/v1/P19-1003
%D 2019
%B 57th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2019-07-28 - 2019-08-02
%C Florence, Italy
%B Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%E Korhonen, Anna; Traum, David; Màrquez, Lluís
%P 22 - 31
%I ACL

Paper

N. Tatti and P. Miettinen

“Boolean Matrix Factorization Meets Consecutive Ones Property,” 2019. [Online]. Available: http://arxiv.org/abs/1901.05797.

Abstract

Boolean matrix factorization is a natural and a popular technique for
summarizing binary matrices. In this paper, we study a problem of Boolean
matrix factorization where we additionally require that the factor matrices
have consecutive ones property (OBMF). A major application of this optimization
problem comes from graph visualization: standard techniques for visualizing
graphs are circular or linear layout, where nodes are ordered in circle or on a
line. A common problem with visualizing graphs is clutter due to too many
edges. The standard approach to deal with this is to bundle edges together and
represent them as ribbon. We also show that we can use OBMF for edge bundling
combined with circular or linear layout techniques.
We demonstrate that not only this problem is NP-hard but we cannot have a
polynomial-time algorithm that yields a multiplicative approximation guarantee
(unless P = NP). On the positive side, we develop a greedy algorithm where at
each step we look for the best 1-rank factorization. Since even obtaining
1-rank factorization is NP-hard, we propose an iterative algorithm where we fix
one side and find the other, reverse the roles, and repeat. We show that
this step can be done in linear time using pq-trees. We also extend the problem
to cyclic ones property and symmetric factorizations. Our experiments show that
our algorithms find high-quality factorizations and scale well.

BibTeX

@online{Tatti_arXiv1901.05797,
TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property},
AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli},
URL = {http://arxiv.org/abs/1901.05797},
EPRINT = {1901.05797},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.},
}

Endnote

%0 Report
%A Tatti, Nikolaj
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Boolean Matrix Factorization Meets Consecutive Ones Property
%U http://hdl.handle.net/21.11116/0000-0004-02F0-A
%U http://arxiv.org/abs/1901.05797
%D 2019
%X Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
%K Computer Science, Data Structures and Algorithms, cs.DS,Computer Science, Discrete Mathematics, cs.DM,Computer Science, Learning, cs.LG

Conference paper

N. Tatti and P. Miettinen

“Boolean Matrix Factorization Meets Consecutive Ones Property,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.

BibTeX

@inproceedings{Tatti_SDM2019,
TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property},
AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-61197-567-3},
DOI = {10.1137/1.9781611975673.82},
PUBLISHER = {SIAM},
YEAR = {2019},
BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)},
EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh},
PAGES = {729--737},
ADDRESS = {Calgary, Canada},
}

Endnote

%0 Conference Proceedings
%A Tatti, Nikolaj
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Boolean Matrix Factorization Meets Consecutive Ones Property
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-030A-E
%R 10.1137/1.9781611975673.82
%D 2019
%B SIAM International Conference on Data Mining
%Z date of event: 2019-05-02 - 2019-05-04
%C Calgary, Canada
%B Proceedings of the 2019 SIAM International Conference on Data Mining
%E Berger-Wolf, Tanya; Chawla, Nitesh
%P 729 - 737
%I SIAM
%@ 978-1-61197-567-3

Paper

A. Tigunova, A. Yates, P. Mirza, and G. Weikum

“Listening between the Lines: Learning Personal Attributes from Conversations,” 2019. [Online]. Available: http://arxiv.org/abs/1904.10887.

Abstract

Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc.). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.

BibTeX

@online{Tigunova_arXiv1904.10887,
TITLE = {Listening between the Lines: Learning Personal Attributes from Conversations},
AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1904.10887},
EPRINT = {1904.10887},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc.). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.},
}

Endnote

%0 Report
%A Tigunova, Anna
%A Yates, Andrew
%A Mirza, Paramita
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Listening between the Lines: Learning Personal Attributes from Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FE7F-2
%U http://arxiv.org/abs/1904.10887
%D 2019
%X Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc.). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
%K Computer Science, Computation and Language, cs.CL

Conference paper

A. Tigunova, A. Yates, P. Mirza, and G. Weikum

“Listening between the Lines: Learning Personal Attributes from Conversations,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.

BibTeX

@inproceedings{tigunova2019listening,
TITLE = {Listening between the Lines: {L}earning Personal Attributes from Conversations},
AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6674-8},
DOI = {10.1145/3308558.3313498},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)},
EDITOR = {McAuley, Julian},
PAGES = {1818--1828},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Tigunova, Anna
%A Yates, Andrew
%A Mirza, Paramita
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Listening between the Lines: Learning Personal Attributes from Conversations
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-1460-A
%R 10.1145/3308558.3313498
%D 2019
%B The Web Conference
%Z date of event: 2019-05-13 - 2019-05-17
%C San Francisco, CA, USA
%B Proceedings of The World Wide Web Conference
%E McAuley, Julian
%P 1818 - 1828
%I ACM
%@ 978-1-4503-6674-8

Conference paper

B. D. Trisedya, G. Weikum, J. Qi, and R. Zhang

“Neural Relation Extraction for Knowledge Base Enrichment,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.

BibTeX

@inproceedings{Trisedya_ACL2019,
TITLE = {Neural Relation Extraction for Knowledge Base Enrichment},
AUTHOR = {Trisedya, Bayu Distiawan and Weikum, Gerhard and Qi, Jianzhong and Zhang, Rui},
LANGUAGE = {eng},
URL = {https://www.aclweb.org/anthology/P19-1023},
DOI = {10.18653/v1/P19-1023},
PUBLISHER = {ACL},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s},
PAGES = {229--240},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Trisedya, Bayu Distiawan
%A Weikum, Gerhard
%A Qi, Jianzhong
%A Zhang, Rui
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Neural Relation Extraction for Knowledge Base Enrichment
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-6B08-B
%U https://www.aclweb.org/anthology/P19-1023
%R 10.18653/v1/P19-1023
%D 2019
%B 57th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2019-07-28 - 2019-08-02
%C Florence, Italy
%B Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
%E Korhonen, Anna; Traum, David; Màrquez, Lluís
%P 229 - 240
%I ACL

Conference paper

M. Unterkalmsteiner and A. Yates

“Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation,” in Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019), Essen, Germany, 2019.

BibTeX

@inproceedings{Unterkalmsteiner_NLP4RE2019,
TITLE = {Expert-sourcing Domain-specific Knowledge: {The} Case of Synonym Validation},
AUTHOR = {Unterkalmsteiner, Michael and Yates, Andrew},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-2376-8},
PUBLISHER = {CEUR-WS},
YEAR = {2019},
BOOKTITLE = {Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019)},
EDITOR = {Dalpiaz, Fabiano and Ferrari, Alessio and Franch, Xavier and Gregory, Sarah and Houdek, Frank and Palomares, Cristina},
EID = {8},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2376},
ADDRESS = {Essen, Germany},
}

Endnote

%0 Conference Proceedings
%A Unterkalmsteiner, Michael
%A Yates, Andrew
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-02AE-6
%D 2019
%B 2nd Workshop on Natural Language Processing for Requirements Engineering and NLP Tool Showcase
%Z date of event: 2019-03-18 - 2019-03-18
%C Essen, Germany
%B Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019)
%E Dalpiaz, Fabiano; Ferrari, Alessio; Franch, Xavier; Gregory, Sarah; Houdek, Frank; Palomares, Cristina
%Z sequence number: 8
%I CEUR-WS
%B CEUR Workshop Proceedings
%N 2376
%@ 1613-0073
%U http://ceur-ws.org/Vol-2376/NLP4RE19_paper08.pdf

Article

M. van Leeuwen, P. Chau, J. Vreeken, D. Shahaf, and C. Faloutsos

“Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors,” ACM Transactions on Knowledge Discovery from Data, vol. 13, no. 1, 2019.

BibTeX

@article{vanLeeuwen2019,
TITLE = {Addendum to the Special Issue on Interactive Data Exploration and Analytics ({TKDD}, Vol. 12, Iss. 1): Introduction by the Guest Editors},
AUTHOR = {van Leeuwen, Matthijs and Chau, Polo and Vreeken, Jilles and Shahaf, Dafna and Faloutsos, Christos},
LANGUAGE = {eng},
ISSN = {1556-4681},
DOI = {10.1145/3298786},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2019},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {13},
NUMBER = {1},
EID = {13},
}

Endnote

%0 Journal Article
%A van Leeuwen, Matthijs
%A Chau, Polo
%A Vreeken, Jilles
%A Shahaf, Dafna
%A Faloutsos, Christos
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-FFD5-E
%R 10.1145/3298786
%7 2019
%D 2019
%J ACM Transactions on Knowledge Discovery from Data
%V 13
%N 1
%Z sequence number: 13
%I ACM
%C New York, NY
%@ 1556-4681

Paper

H. Wang, N. Grgic-Hlaca, P. Lahoti, K. P. Gummadi, and A. Weller

“An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision,” 2019. [Online]. Available: https://arxiv.org/abs/1910.10255.

Abstract

The notion of individual fairness requires that similar people receive
similar treatment. However, this is hard to achieve in practice since it is
difficult to specify the appropriate similarity metric. In this work, we
attempt to learn such similarity metric from human annotated data. We gather a
new dataset of human judgments on a criminal recidivism prediction (COMPAS)
task. By assuming the human supervision obeys the principle of individual
fairness, we leverage prior work on metric learning, evaluate the performance
of several metric learning methods on our dataset, and show that the learned
metrics outperform the Euclidean and Precision metric under various criteria.
We do not provide a way to directly learn a similarity metric satisfying the
individual fairness, but to provide an empirical study on how to derive the
similarity metric from human supervisors, then future work can use this as a
tool to understand human supervision.

BibTeX

@online{DBLP:journals/corr/abs-1910-10255,
TITLE = {An Empirical Study on Learning Fairness Metrics for {COMPAS} Data with Human Supervision},
AUTHOR = {Wang, Hanchen and Grgic-Hlaca, Nina and Lahoti, Preethi and Gummadi, Krishna P. and Weller, Adrian},
LANGUAGE = {eng},
URL = {https://arxiv.org/abs/1910.10255},
EPRINT = {1910.10255},
EPRINTTYPE = {arXiv},
YEAR = {2019},
ABSTRACT = {The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such similarity metric from human annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metric under various criteria. We do not provide a way to directly learn a similarity metric satisfying the individual fairness, but to provide an empirical study on how to derive the similarity metric from human supervisors, then future work can use this as a tool to understand human supervision.},
}

Endnote

%0 Report
%A Wang, Hanchen
%A Grgic-Hlaca, Nina
%A Lahoti, Preethi
%A Gummadi, Krishna P.
%A Weller, Adrian
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0007-FCD3-F
%U https://arxiv.org/abs/1910.10255
%D 2019
%X The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such similarity metric from human annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metric under various criteria. We do not provide a way to directly learn a similarity metric satisfying the individual fairness, but to provide an empirical study on how to derive the similarity metric from human supervisors, then future work can use this as a tool to understand human supervision.
%K Computer Science, Computers and Society, cs.CY; Computer Science, Artificial Intelligence, cs.AI; Computer Science, Learning, cs.LG

Article

L. Wang, Y. Wang, G. de Melo, and G. Weikum

“Understanding Archetypes of Fake News via Fine-grained Classification,” Social Network Analysis and Mining, vol. 9, no. 1, 2019.

BibTeX

@article{Wang2019_Understanding,
TITLE = {Understanding Archetypes of Fake News via Fine-grained Classification},
AUTHOR = {Wang, Liqiang and Wang, Yafang and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1869-5450},
DOI = {10.1007/s13278-019-0580-z},
PUBLISHER = {Springer},
ADDRESS = {Cham},
YEAR = {2019},
DATE = {2019},
JOURNAL = {Social Network Analysis and Mining},
VOLUME = {9},
NUMBER = {1},
EID = {37},
}

Endnote

%0 Journal Article
%A Wang, Liqiang
%A Wang, Yafang
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Understanding Archetypes of Fake News via Fine-grained Classification : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-789A-7
%R 10.1007/s13278-019-0580-z
%7 2019
%D 2019
%J Social Network Analysis and Mining
%V 9
%N 1
%Z sequence number: 37
%I Springer
%C Cham
%@ 1869-5450
%U https://rdcu.be/cFg6h

Book chapter / section

G. Weikum, J. Hoffart, and F. Suchanek

“Knowledge Harvesting: Achievements and Challenges,” in Computing and Software Science, Berlin: Springer, 2019.

BibTeX

@incollection{Weikum_KnowHarv2019,
TITLE = {Knowledge Harvesting: Achievements and Challenges},
AUTHOR = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian},
LANGUAGE = {eng},
ISBN = {978-3-319-91907-2},
DOI = {10.1007/978-3-319-91908-9_13},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Computing and Software Science},
EDITOR = {Steffen, Bernhard and Woeginger, Gerhard},
PAGES = {217--235},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10000},
}

Endnote

%0 Book Section
%A Weikum, Gerhard
%A Hoffart, Johannes
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Knowledge Harvesting: Achievements and Challenges : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-83B1-E
%R 10.1007/978-3-319-91908-9_13
%D 2019
%B Computing and Software Science
%E Steffen, Bernhard; Woeginger, Gerhard
%P 217 - 235
%I Springer
%C Berlin
%@ 978-3-319-91907-2
%S Lecture Notes in Computer Science
%N 10000

Conference paper

A. Wisesa, F. Darari, A. Krisnadhi, W. Nutt, and S. Razniewski

“Wikidata Completeness Profiling Using ProWD,” in K-CAP’ 19, 10th International Conference on Knowledge Capture, Marina del Rey, CA, USA, 2019.

BibTeX

@inproceedings{Wisesa_K-CAP2019,
TITLE = {Wikidata Completeness Profiling Using {ProWD}},
AUTHOR = {Wisesa, Avicenna and Darari, Fariz and Krisnadhi, Adila and Nutt, Werner and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-4503-7008-0},
DOI = {10.1145/3360901.3364425},
PUBLISHER = {ACM},
YEAR = {2019},
BOOKTITLE = {K-CAP' 19, 10th International Conference on Knowledge Capture},
EDITOR = {Kejriwal, Mayank and Szekely, Pedro},
PAGES = {123--130},
ADDRESS = {Marina del Rey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Wisesa, Avicenna
%A Darari, Fariz
%A Krisnadhi, Adila
%A Nutt, Werner
%A Razniewski, Simon
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Wikidata Completeness Profiling Using ProWD : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0005-849B-7
%R 10.1145/3360901.3364425
%D 2019
%B 10th International Conference on Knowledge Capture
%Z date of event: 2019-11-19 - 2019-11-21
%C Marina del Rey, CA, USA
%B K-CAP' 19
%E Kejriwal, Mayank; Szekely, Pedro
%P 123 - 130
%I ACM
%@ 978-1-4503-7008-0

Conference paper

A. Yates and M. Unterkalmsteiner

“Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain,” in Advances in Information Retrieval (ECIR 2019), Cologne, Germany, 2019.

BibTeX

@inproceedings{Yates_ECIR2019,
TITLE = {Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain},
AUTHOR = {Yates, Andrew and Unterkalmsteiner, Michael},
LANGUAGE = {eng},
ISBN = {978-3-030-15711-1},
DOI = {10.1007/978-3-030-15712-8_28},
PUBLISHER = {Springer},
YEAR = {2019},
DATE = {2019},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2019)},
EDITOR = {Azzopardi, Leif and Stein, Benno and Fuhr, Norbert and Mayr, Philipp and Hauff, Claudia and Hiemstra, Djoerd},
PAGES = {429--442},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11437},
ADDRESS = {Cologne, Germany},
}

Endnote

%0 Conference Proceedings
%A Yates, Andrew
%A Unterkalmsteiner, Michael
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-029B-B
%R 10.1007/978-3-030-15712-8_28
%D 2019
%B 41st European Conference on IR Research
%Z date of event: 2019-04-14 - 2019-04-18
%C Cologne, Germany
%B Advances in Information Retrieval
%E Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd
%P 429 - 442
%I Springer
%@ 978-3-030-15711-1
%B Lecture Notes in Computer Science
%N 11437

Conference paper

Y. Zhao, X. Shen, W. Bi, and A. Aizawa

“Unsupervised Rewriter for Multi-Sentence Compression,” in The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.

BibTeX

@inproceedings{zhao2019unsupervised,
TITLE = {Unsupervised Rewriter for Multi-Sentence Compression},
AUTHOR = {Zhao, Yang and Shen, Xiaoyu and Bi, Wei and Aizawa, Akiko},
LANGUAGE = {eng},
ISBN = {978-1-950737-48-2},
URL = {https://www.aclweb.org/anthology/P19-1216},
DOI = {10.18653/v1/P19-1216},
PUBLISHER = {ACL},
YEAR = {2019},
BOOKTITLE = {The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s},
PAGES = {2235--2240},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Zhao, Yang
%A Shen, Xiaoyu
%A Bi, Wei
%A Aizawa, Akiko
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Unsupervised Rewriter for Multi-Sentence Compression : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0008-14AB-1
%R 10.18653/v1/P19-1216
%U https://www.aclweb.org/anthology/P19-1216
%D 2019
%B The 57th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2019-07-28 - 2019-08-02
%C Florence, Italy
%B The 57th Annual Meeting of the Association for Computational Linguistics
%E Korhonen, Anna; Traum, David; Màrquez, Lluís
%P 2235 - 2240
%I ACL
%@ 978-1-950737-48-2

2018

Conference paper

A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum

“Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{AbujabalWWW_2018,
TITLE = {Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases},
AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5639-8},
DOI = {10.1145/3178876.3186004},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.},
PAGES = {1053--1062},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Yahya, Mohamed
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3C91-8
%R 10.1145/3178876.3186004
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Proceedings of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G.
%P 1053 - 1062
%I ACM
%@ 978-1-4503-5639-8

Paper

A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum

“ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” 2018. [Online]. Available: http://arxiv.org/abs/1809.09528.

Abstract

To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what real users ask, we need large datasets of real
user questions that capture the various question phenomena users are interested
in, and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as temporal reasoning, compositionality, etc. ComQA
questions come from the WikiAnswers community QA platform. Through a large
crowdsourcing effort, we clean the question dataset, group questions into
paraphrase clusters, and annotate clusters with their answers. ComQA contains
11,214 questions grouped into 4,834 paraphrase clusters. We detail the process
of constructing ComQA, including the measures taken to ensure its high quality
while making effective use of crowdsourcing. We also present an extensive
analysis of the dataset and the results achieved by state-of-the-art systems on
ComQA, demonstrating that our dataset can be a driver of future research on QA.

BibTeX

@online{Abujabal_arXiv1809.09528,
TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters},
AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1809.09528},
EPRINT = {1809.09528},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.},
}

Endnote

%0 Report
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Yahya, Mohamed
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A0FE-B
%U http://arxiv.org/abs/1809.09528
%D 2018
%X To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
%K Computer Science, Computation and Language, cs.CL

Conference paper

P. Agarwal, J. Strötgen, L. Del Corro, J. Hoffart, and G. Weikum

“diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora,” in The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.

BibTeX

@inproceedings{AgrawalACL2018a,
TITLE = {{diaNED}: {T}ime-Aware Named Entity Disambiguation for Diachronic Corpora},
AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-948087-34-6},
URL = {https://aclanthology.coli.uni-saarland.de/volumes/proceedings-of-the-56th-annual-meeting-of-the-association-for-computational-linguistics-volume-2-short-papers},
PUBLISHER = {ACL},
YEAR = {2018},
BOOKTITLE = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)},
EDITOR = {Gurevych, Iryna and Miyao, Yusuke},
PAGES = {686--693},
EID = {602},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Agarwal, Prabal
%A Strötgen, Jannik
%A Del Corro, Luciano
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-9055-C
%D 2018
%B The 56th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2018-07-15 - 2018-07-20
%C Melbourne, Australia
%B The 56th Annual Meeting of the Association for Computational Linguistics
%E Gurevych, Iryna; Miyao, Yusuke
%P 686 - 693
%Z sequence number: 602
%I ACL
%@ 978-1-948087-34-6
%U http://aclweb.org/anthology/P18-2109

Conference paper

M. Antenore, G. Leone, A. Panconesi, and E. Terolli

“Together We Buy, Alone I Quit: Some Experimental Studies of Online Persuaders,” in DTUC’18 Digital Tools & Uses Congres, Paris, France, 2018.

BibTeX

@inproceedings{Antenore:2018:TWB:3240117.3240119,
TITLE = {Together We Buy, Alone {I} Quit: {S}ome Experimental Studies of Online Persuaders},
AUTHOR = {Antenore, Marzia and Leone, Giovanna and Panconesi, Alessandro and Terolli, Erisa},
LANGUAGE = {eng},
ISBN = {978-1-4503-6451-5},
DOI = {10.1145/3240117.3240119},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {DTUC'18 Digital Tools \& Uses Congres},
EDITOR = {Reyes, E. and Szoniecky, S. and Mkadmi, A. and Kembellec, G. and Fournier-S'niehotta, R. and Siala-Kallel, F. and Ammi, M. and Labelle, S.},
EID = {2},
ADDRESS = {Paris, France},
}

Endnote

%0 Conference Proceedings
%A Antenore, Marzia
%A Leone, Giovanna
%A Panconesi, Alessandro
%A Terolli, Erisa
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Together We Buy, Alone I Quit: Some Experimental Studies of Online Persuaders : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A89D-0
%R 10.1145/3240117.3240119
%D 2018
%B First International Digital Tools & Uses Congress 
%Z date of event: 2018-10-03 - 2018-10-05
%C Paris, France
%B DTUC'18 Digital Tools & Uses Congres
%E Reyes, E.; Szoniecky, S.; Mkadmi, A.; Kembellec, G.; Fournier-S'niehotta, R.; Siala-Kallel, F.; Ammi, M.; Labelle, S.
%Z sequence number: 2
%I ACM
%@ 978-1-4503-6451-5

Conference paper

O. Balalau, C. Castillo, and M. Sozio

“EviDense: A Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions,” in Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018), Stanford, CA, USA, 2018.

BibTeX

@inproceedings{Balalau_ICWSM2018,
TITLE = {{EviDense}: {A} Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions},
AUTHOR = {Balalau, Oana and Castillo, Carlos and Sozio, Mauro},
LANGUAGE = {eng},
ISBN = {978-1-57735-798-8},
PUBLISHER = {AAAI},
YEAR = {2018},
BOOKTITLE = {Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018)},
PAGES = {560--563},
ADDRESS = {Stanford, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Balalau, Oana
%A Castillo, Carlos
%A Sozio, Mauro
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T EviDense: A Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9CE8-9
%D 2018
%B 12th International AAAI Conference on Web and Social Media
%Z date of event: 2018-06-25 - 2018-06-28
%C Stanford, CA, USA
%B Proceedings of the Twelfth International AAAI Conference on Web and Social Media
%P 560 - 563
%I AAAI
%@ 978-1-57735-798-8
%U https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17889

Conference paper

V. Balaraman, S. Razniewski, and W. Nutt

“Recoin: Relative Completeness in Wikidata,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{BalaramanWWW2017,
TITLE = {Recoin: {R}elative Completeness in {W}ikidata},
AUTHOR = {Balaraman, Vevake and Razniewski, Simon and Nutt, Werner},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3191641},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {1787--1792},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Balaraman, Vevake
%A Razniewski, Simon
%A Nutt, Werner
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Recoin: Relative Completeness in Wikidata : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-414A-3
%R 10.1145/3184558.3191641
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 1787 - 1792
%I ACM
%@ 978-1-4503-5640-4

Conference paper

A. J. Biega, K. P. Gummadi, and G. Weikum

“Equity of Attention: Amortizing Individual Fairness in Rankings,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.

BibTeX

@inproceedings{BiegaSIGIR2018,
TITLE = {Equity of Attention: {A}mortizing Individual Fairness in Rankings},
AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5022-8},
DOI = {10.1145/3209978.3210063},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {405--414},
ADDRESS = {Ann Arbor, MI, USA},
}

Endnote

%0 Conference Proceedings
%A Biega, Asia J.
%A Gummadi, Krishna P.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Equity of Attention: Amortizing Individual Fairness in Rankings : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-0D8A-5
%R 10.1145/3209978.3210063
%D 2018
%B 41st International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2018-07-08 - 2018-07-12
%C Ann Arbor, MI, USA
%B SIGIR'18
%P 405 - 414
%I ACM
%@ 978-1-4503-5022-8

Paper

A. J. Biega, K. P. Gummadi, and G. Weikum

“Equity of Attention: Amortizing Individual Fairness in Rankings,” 2018. [Online]. Available: http://arxiv.org/abs/1805.01788.

Abstract

Rankings of people and items are at the heart of selection-making,
match-making, and recommender systems, ranging from employment sites to sharing
economy platforms. As ranking positions influence the amount of attention the
ranked subjects receive, biases in rankings can lead to unfair distribution of
opportunities and resources, such as jobs or income.

This paper proposes new measures and mechanisms to quantify and mitigate
unfairness from a bias inherent to all rankings, namely, the position bias,
which leads to disproportionately less attention being paid to low-ranked
subjects. Our approach differs from recent fair ranking approaches in two
important ways. First, existing works measure unfairness at the level of
subject groups while our measures capture unfairness at the level of individual
subjects, and as such subsume group unfairness. Second, as no single ranking
can achieve individual attention fairness, we propose a novel mechanism that
achieves amortized fairness, where attention accumulated across a series of
rankings is proportional to accumulated relevance.

We formulate the challenge of achieving amortized individual fairness subject
to constraints on ranking quality as an online optimization problem and show
that it can be solved as an integer linear program. Our experimental evaluation
reveals that unfair attention distribution in rankings can be substantial, and
demonstrates that our method can improve individual fairness while retaining
high ranking quality.

BibTeX

@online{Biega_arXiv1805.01788,
TITLE = {Equity of Attention: Amortizing Individual Fairness in Rankings},
AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1805.01788},
EPRINT = {1805.01788},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.},
}

Endnote

%0 Report
%A Biega, Asia J.
%A Gummadi, Krishna P.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Equity of Attention: Amortizing Individual Fairness in Rankings : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-1563-7
%U http://arxiv.org/abs/1805.01788
%D 2018
%X   Rankings of people and items are at the heart of selection-making,
match-making, and recommender systems, ranging from employment sites to sharing
economy platforms. As ranking positions influence the amount of attention the
ranked subjects receive, biases in rankings can lead to unfair distribution of
opportunities and resources, such as jobs or income.
  This paper proposes new measures and mechanisms to quantify and mitigate
unfairness from a bias inherent to all rankings, namely, the position bias,
which leads to disproportionately less attention being paid to low-ranked
subjects. Our approach differs from recent fair ranking approaches in two
important ways. First, existing works measure unfairness at the level of
subject groups while our measures capture unfairness at the level of individual
subjects, and as such subsume group unfairness. Second, as no single ranking
can achieve individual attention fairness, we propose a novel mechanism that
achieves amortized fairness, where attention accumulated across a series of
rankings is proportional to accumulated relevance.
  We formulate the challenge of achieving amortized individual fairness subject
to constraints on ranking quality as an online optimization problem and show
that it can be solved as an integer linear program. Our experimental evaluation
reveals that unfair attention distribution in rankings can be substantial, and
demonstrates that our method can improve individual fairness while retaining
high ranking quality.

%K Computer Science, Information Retrieval, cs.IR; Computer Science, Computers and Society, cs.CY

Article

N. Boldyrev, M. Spaniol, and G. Weikum

“Multi-Cultural Interlinking of Web Taxonomies with ACROSS,” The Journal of Web Science, vol. 4, no. 2, 2018.

BibTeX

@article{Boldyrev2018,
TITLE = {Multi-Cultural Interlinking of Web Taxonomies with {ACROSS}},
AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1561/106.00000012},
PUBLISHER = {Now Publishers},
ADDRESS = {Boston},
YEAR = {2018},
JOURNAL = {The Journal of Web Science},
VOLUME = {4},
NUMBER = {2},
PAGES = {20--33},
}

Endnote

%0 Journal Article
%A Boldyrev, Natalia
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Multi-Cultural Interlinking of Web Taxonomies with ACROSS : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3CA4-3
%R 10.1561/106.00000012 
%7 2018
%D 2018
%J The Journal of Web Science
%O Web Science
%V 4
%N 2
%& 20
%P 20 - 33
%I Now Publishers
%C Boston

Conference paper

K. Budhathoki, M. Boley, and J. Vreeken

“Rule Discovery for Exploratory Causal Reasoning,” in Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018), Montréal, Canada, 2018.

BibTeX

@inproceedings{budhathoki:18:dice,
TITLE = {Rule Discovery for Exploratory Causal Reasoning},
AUTHOR = {Budhathoki, Kailash and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {https://drive.google.com/file/d/1r-KTsok3VLIz-wUh0YtsK5YaEu53DcTf/view},
YEAR = {2018},
BOOKTITLE = {Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018)},
EID = {14},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Rule Discovery for Exploratory Causal Reasoning : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9EBC-9
%U https://drive.google.com/file/d/1r-KTsok3VLIz-wUh0YtsK5YaEu53DcTf/view
%D 2018
%B NeurIPS 2018 Workshop on Causal Learning
%Z date of event: 2018-12-07 - 2018-12-07
%C Montréal, Canada
%B Proceedings of the NeurIPS 2018 workshop on Causal Learning 
%Z sequence number: 14

Article

K. Budhathoki and J. Vreeken

“Origo: Causal Inference by Compression,” Knowledge and Information Systems, vol. 56, no. 2, 2018.

BibTeX

@article{Budhathoki2018,
TITLE = {Origo: {C}ausal Inference by Compression},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-017-1130-5},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2018},
DATE = {2018},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {56},
NUMBER = {2},
PAGES = {285--307},
}

Endnote

%0 Journal Article
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Origo: Causal Inference by Compression : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-AF2B-B
%R 10.1007/s10115-017-1130-5
%7 2018
%D 2018
%J Knowledge and Information Systems
%V 56
%N 2
%& 285
%P 285 - 307
%I Springer
%C New York, NY
%@ false

Conference paper

K. Budhathoki and J. Vreeken

“Causal Inference on Event Sequences,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.

BibTeX

@inproceedings{budhathoki_SDM2018,
TITLE = {Causal Inference on Event Sequences},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-532-1},
DOI = {10.1137/1.9781611975321.7},
PUBLISHER = {SIAM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)},
EDITOR = {Ester, Martin and Pedreschi, Dino},
PAGES = {55--63},
ADDRESS = {San Diego, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference on Event Sequences : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5F34-A
%R 10.1137/1.9781611975321.7
%D 2018
%B SIAM International Conference on Data Mining
%Z date of event: 2018-05-03 - 2018-05-05
%C San Diego, CA, USA
%B Proceedings of the 2018 SIAM International Conference on Data Mining
%E Ester, Martin; Pedreschi, Dino
%P 55 - 63
%I SIAM
%@ 978-1-61197-532-1

Conference paper

K. Budhathoki and J. Vreeken

“Accurate Causal Inference on Discrete Data,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.

BibTeX

@inproceedings{budhathoki:18:acid,
TITLE = {Accurate Causal Inference on Discrete Data},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5386-9159-5},
DOI = {10.1109/ICDM.2018.00105},
PUBLISHER = {IEEE},
YEAR = {2018},
BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)},
PAGES = {881--886},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Accurate Causal Inference on Discrete Data : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9E96-3
%R 10.1109/ICDM.2018.00105
%D 2018
%B IEEE International Conference on Data Mining
%Z date of event: 2018-11-17 - 2018-11-20
%C Singapore, Singapore
%B IEEE International Conference on Data Mining 
%P 881 - 886
%I IEEE
%@ 978-1-5386-9159-5

Paper

A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian

“SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” 2018. [Online]. Available: http://arxiv.org/abs/1806.05258.

Abstract

Mental health is a significant and growing public health concern. As language
usage can be leveraged to obtain crucial insights into mental health
conditions, there is a need for large-scale, labeled, mental health-related
datasets of users who have been diagnosed with one or more of such conditions.
In this paper, we investigate the creation of high-precision patterns to
identify self-reported diagnoses of nine different mental health conditions,
and obtain high-quality labeled data without the need for manual labelling. We
introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it
available. SMHD is a novel large dataset of social media posts from users with
one or multiple mental health conditions along with matched control users. We
examine distinctions in users' language, as measured by linguistic and
psychological variables. We further explore text classification methods to
identify individuals with mental conditions through their language.

BibTeX

@online{cohan_arXiv1806.05258,
TITLE = {{SMHD}: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions},
AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1806.05258},
EPRINT = {1806.05258},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.},
}

Endnote

%0 Report
%A Cohan, Arman
%A Desmet, Bart
%A Yates, Andrew
%A Soldaini, Luca
%A MacAvaney, Sean
%A Goharian, Nazli
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5ED4-6
%U http://arxiv.org/abs/1806.05258
%D 2018
%X Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.
%K Computer Science, Computation and Language, cs.CL

Conference paper

A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian

“SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” in The 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, NM, USA, 2018.

BibTeX

@inproceedings{Cohan_COLING2018,
TITLE = {{SMHD}: {A} Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions},
AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {978-1-948087-50-6},
URL = {http://aclweb.org/anthology/C18-1126},
PUBLISHER = {ACL},
YEAR = {2018},
BOOKTITLE = {The 27th International Conference on Computational Linguistics (COLING 2018)},
EDITOR = {Bender, Emily M. and Derczynski, Leon and Isabelle, Pierre},
PAGES = {1485--1497},
ADDRESS = {Santa Fe, NM, USA},
}

Endnote

%0 Conference Proceedings
%A Cohan, Arman
%A Desmet, Bart
%A Yates, Andrew
%A Soldaini, Luca
%A MacAvaney, Sean
%A Goharian, Nazli
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E91-1
%U http://aclweb.org/anthology/C18-1126
%D 2018
%B 27th International Conference on Computational Linguistics 
%Z date of event: 2018-08-20 - 2018-08-26
%C Santa Fe, NM, USA
%B The 27th International Conference on Computational Linguistics
%E Bender, Emily M.; Derczynski, Leon; Isabelle, Pierre
%P 1485 - 1497
%I ACL
%@ 978-1-948087-50-6

Conference paper

M. Danisch, O. Balalau, and M. Sozio

“Listing k-cliques in Sparse Real-World Graphs,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{Danisch_WWW2018,
TITLE = {Listing k-cliques in Sparse Real-World Graphs},
AUTHOR = {Danisch, Maximilien and Balalau, Oana and Sozio, Mauro},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3178876.3186125},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {589--598},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Danisch, Maximilien
%A Balalau, Oana
%A Sozio, Mauro
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Listing k-cliques in Sparse Real-World Graphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9CDE-5
%R 10.1145/3178876.3186125
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 589 - 598
%I ACM
%@ 978-1-4503-5640-4

Article

F. Darari, W. Nutt, G. Pirrò, and S. Razniewski

“Completeness Management for RDF Data Sources,” ACM Transactions on the Web, vol. 12, no. 3, 2018.

BibTeX

@article{Darari2018,
TITLE = {Completeness Management for {RDF} Data Sources},
AUTHOR = {Darari, Fariz and Nutt, Werner and Pirr{\`o}, Giuseppe and Razniewski, Simon},
LANGUAGE = {eng},
DOI = {10.1145/3196248},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2018},
DATE = {2018},
JOURNAL = {ACM Transactions on the Web},
VOLUME = {12},
NUMBER = {3},
EID = {18},
}

Endnote

%0 Journal Article
%A Darari, Fariz
%A Nutt, Werner
%A Pirrò, Giuseppe
%A Razniewski, Simon
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Completeness Management for RDF Data Sources : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-E17F-3
%R 10.1145/3196248
%7 2018
%D 2018
%J ACM Transactions on the Web
%V 12
%N 3
%Z sequence number: 18
%I ACM
%C New York, NY

Conference paper

F. Darari, W. Nutt, and S. Razniewski

“Comparing Index Structures for Completeness Reasoning,” in IWBIS 2018, International Workshop on Big Data and Information Security, Jakarta, Indonesia, 2018.

BibTeX

@inproceedings{DarariIWBIS2018,
TITLE = {Comparing Index Structures for Completeness Reasoning},
AUTHOR = {Darari, Fariz and Nutt, Werner and Razniewski, Simon},
LANGUAGE = {eng},
ISBN = {978-1-5386-5525-2},
DOI = {10.1109/IWBIS.2018.8471712},
PUBLISHER = {IEEE},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {IWBIS 2018, International Workshop on Big Data and Information Security},
PAGES = {49--56},
ADDRESS = {Jakarta, Indonesia},
}

Endnote

%0 Conference Proceedings
%A Darari, Fariz
%A Nutt, Werner
%A Razniewski, Simon
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Comparing Index Structures for Completeness Reasoning : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-E193-A
%R 10.1109/IWBIS.2018.8471712
%D 2018
%B International Workshop on Big Data and Information Security
%Z date of event: 2018-05-12 - 2018-05-13
%C Jakarta, Indonesia
%B IWBIS 2018
%P 49 - 56
%I IEEE
%@ 978-1-5386-5525-2

Conference paper

S. Degaetano-Ortlieb and J. Strötgen

“Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy,” in Language Technologies for the Challenges of the Digital Age (GSCL 2017), Berlin, Germany, 2018.

BibTeX

@inproceedings{DegaetanoortliebStroetgen2017,
TITLE = {Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy},
AUTHOR = {Degaetano-Ortlieb, Stefania and Str{\"o}tgen, Jannik},
LANGUAGE = {eng},
ISBN = {978-3-319-73705-8},
DOI = {10.1007/978-3-319-73706-5_22},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2018},
BOOKTITLE = {Language Technologies for the Challenges of the Digital Age (GSCL 2017)},
EDITOR = {Rehm, Georg and Declerck, Thierry},
PAGES = {259--275},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {10713},
ADDRESS = {Berlin, Germany},
}

Endnote

%0 Conference Proceedings
%A Degaetano-Ortlieb, Stefania
%A Strötgen, Jannik
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-A8E8-5
%R 10.1007/978-3-319-73706-5_22
%D 2018
%B Conference of the German Society for Computational Linguistics and Language Technology
%Z date of event: 2017-09-13 - 2017-09-14
%C Berlin, Germany
%B Language Technologies for the Challenges of the Digital Age
%E Rehm, Georg; Declerck, Thierry
%P 259 - 275
%I Springer
%@ 978-3-319-73705-8
%B Lecture Notes in Artificial Intelligence
%N 10713

Thesis

D5IMPR-CS

P. Ernst

“Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems,” Universität des Saarlandes, Saarbrücken, 2018.

Abstract

While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. - To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.

BibTeX

@phdthesis{Ernstphd2017,
TITLE = {Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems},
AUTHOR = {Ernst, Patrick},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-271051},
DOI = {10.22028/D291-27105},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
ABSTRACT = {While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: -- To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. -- To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. -- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.},
}

Endnote

%0 Thesis
%A Ernst, Patrick
%Y Weikum, Gerhard
%A referee: Verspoor, Karin
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems	 : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-1864-4
%U urn:nbn:de:bsz:291-scidok-ds-271051
%R 10.22028/D291-27105 
%F OTHER: hdl:20.500.11880/26987
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%8 20.02.2018
%P 147 p.
%V phd
%9 phd
%X While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. - To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26987

Conference paper

P. Ernst, A. Siu, and G. Weikum

“HighLife: Higher-arity Fact Harvesting,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{ErnstlWWW_2018,
TITLE = {{HighLife}: Higher-arity Fact Harvesting},
AUTHOR = {Ernst, Patrick and Siu, Amy and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5639-8},
DOI = {10.1145/3178876.3186000},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.},
PAGES = {1013--1022},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Ernst, Patrick
%A Siu, Amy
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T HighLife: Higher-arity Fact Harvesting : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3C96-3
%R 10.1145/3178876.3186000
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Proceedings of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G.
%P 1013 - 1022
%I ACM
%@ 978-1-4503-5639-8

Article

A. K. Fischer, J. Vreeken, and D. Klakov

“Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL,” Computación y Sistemas, vol. 21, no. 4, 2018.

BibTeX

@article{Fischer2018,
TITLE = {Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by {MDL}},
AUTHOR = {Fischer, Andrea K. and Vreeken, Jilles and Klakov, Dietrich},
LANGUAGE = {eng},
DOI = {10.13053/CyS-21-4-2865},
PUBLISHER = {Instituto Polit{\'e}cnico Nacional},
ADDRESS = {M{\'e}xico},
YEAR = {2018},
JOURNAL = {Computaci{\'o}n y Sistemas},
VOLUME = {21},
NUMBER = {4},
PAGES = {829--839},
}

Endnote

%0 Journal Article
%A Fischer, Andrea K.
%A Vreeken, Jilles
%A Klakov, Dietrich
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-4156-5
%R 10.13053/CyS-21-4-2865
%7 2018
%D 2018
%J Computación y Sistemas
%V 21
%N 4
%& 829
%P 829 - 839
%I Instituto Politécnico Nacional
%C México
%U http://www.redalyc.org/articulo.oa?id=61553900023

Article

E. Galbrun and P. Miettinen

“Mining Redescriptions with Siren,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 1, 2018.

BibTeX

@article{galbrun17mining,
TITLE = {Mining Redescriptions with {Siren}},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
DOI = {10.1145/3007212},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2018},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {12},
NUMBER = {1},
EID = {6},
}

Endnote

%0 Journal Article
%A Galbrun, Esther
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining Redescriptions with Siren : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-227B-F
%R 10.1145/3007212
%7 2018
%D 2018
%J ACM Transactions on Knowledge Discovery from Data
%V 12
%N 1
%Z sequence number: 6
%I ACM
%C New York, NY

Conference paper

E. Gius, N. Reiter, J. Strötgen, and M. Willand

“SANTA: Systematische Analyse Narrativer Texte durch Annotation,” in DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Köln, Germany, 2018.

BibTeX

@inproceedings{GiusDHd2018,
TITLE = {{{SANTA}: {Systematische Analyse Narrativer Texte durch Annotation}}},
AUTHOR = {Gius, Evelyn and Reiter, Nils and Str{\"o}tgen, Jannik and Willand, Marcus},
LANGUAGE = {deu},
ISBN = {978-3-946275-02-2},
URL = {http://dhd2018.uni-koeln.de/},
YEAR = {2018},
BOOKTITLE = {DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.},
PAGES = {302--305},
ADDRESS = {K{\"o}ln, Germany},
}

Endnote

%0 Conference Proceedings
%A Gius, Evelyn
%A Reiter, Nils
%A Strötgen, Jannik
%A Willand, Marcus
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T SANTA: Systematische Analyse Narrativer Texte durch Annotation : 
%G deu
%U http://hdl.handle.net/11858/00-001M-0000-002E-73EC-4
%D 2018
%B 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.
%Z date of event: 2018-02-26 - 2018-03-02
%C Köln, Germany
%B DHd 2018
%P 302 - 305
%@ 978-3-946275-02-2

Conference paper

D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti

“Generating Semantic Aspects for Queries,” in JCDL’18, Joint Conference on Digital Libraries, Fort Worth, TX, USA, 2018.

BibTeX

@inproceedings{GuptaJCDL2018,
TITLE = {Generating Semantic Aspects for Queries},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-1-4503-5178-2},
DOI = {10.1145/3197026.3203900},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {JCDL'18, Joint Conference on Digital Libraries},
PAGES = {335--336},
ADDRESS = {Fort Worth, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%A Strötgen, Jannik
%A Zeinalipour-Yazti, Demetrios
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Generating Semantic Aspects for Queries : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-904D-6
%R 10.1145/3197026.3203900
%D 2018
%B Joint Conference on Digital Libraries
%Z date of event: 2018-06-03 - 2018-06-07
%C Fort Worth, TX, USA
%B JCDL'18
%P 335 - 336
%I ACM
%@ 978-1-4503-5178-2

Conference paper

D. Gupta and K. Berberich

“Identifying Time Intervals for Knowledge Graph Facts,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{GuptaWWW2017,
TITLE = {Identifying Time Intervals for Knowledge Graph Facts},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3186917},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {37--38},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Identifying Time Intervals for Knowledge Graph Facts : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-411F-4
%R 10.1145/3184558.3186917
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 37 - 38
%I ACM
%@ 978-1-4503-5640-4

Conference paper

D. Gupta and K. Berberich

“GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.

BibTeX

@inproceedings{Gupta_CIKM2018,
TITLE = {{GYANI}: {A}n Indexing Infrastructure for Knowledge-Centric Tasks},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-6014-2},
DOI = {10.1145/3269206.3271745},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management},
EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun},
PAGES = {487--496},
ADDRESS = {Torino, Italy},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A8B7-2
%R 10.1145/3269206.3271745
%D 2018
%B 27th ACM International Conference on Information and Knowledge Management
%Z date of event: 2018-10-22 - 2018-10-26
%C Torino, Italy
%B CIKM'18
%E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun
%P 487 - 496
%I ACM
%@ 978-1-4503-6014-2

Article

D2BIOD5

A. Horňáková, M. List, J. Vreeken, and M. H. Schulz

“JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis,” Bioinformatics, vol. 34, no. 17, 2018.

BibTeX

@article{Hornakova_Bioinformatics2018,
TITLE = {{JAMI}: {F}ast Computation of Conditional Mutual Information for {ceRNA} Network Analysis},
AUTHOR = {Hor{\v n}{\'a}kov{\'a}, Andrea and List, Markus and Vreeken, Jilles and Schulz, Marcel H.},
LANGUAGE = {eng},
ISSN = {1367-4803},
DOI = {10.1093/bioinformatics/bty221},
PUBLISHER = {Oxford University Press},
ADDRESS = {Oxford},
YEAR = {2018},
DATE = {2018},
JOURNAL = {Bioinformatics},
VOLUME = {34},
NUMBER = {17},
PAGES = {3050--3051},
}

Endnote

%0 Journal Article
%A Horňáková, Andrea
%A List, Markus
%A Vreeken, Jilles
%A Schulz, Marcel H.
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
%T JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-573A-C
%R 10.1093/bioinformatics/bty221
%7 2018
%D 2018
%J Bioinformatics
%V 34
%N 17
%& 3050
%P 3050 - 3051
%I Oxford University Press
%C Oxford
%@ 1367-4803

Thesis

D5IMPR-CS

V. T. Ho

“An Embedding-based Approach to Rule Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2018.

Abstract

Knowledge Graphs (KGs) play an important role in various information systems and have application in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts.

Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows:

• We introduce a framework for rule learning guided by external sources.

• We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model.

• We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.

BibTeX

@mastersthesis{HoMaster2018,
TITLE = {An Embedding-based Approach to Rule Learning from Knowledge Graphs},
AUTHOR = {Ho, Vinh Thinh},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
DATE = {2018},
ABSTRACT = {Knowledge Graphs (KGs) play an important role in various information systems and have application in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts.

Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows:

\mbox{$\bullet$} We introduce a framework for rule learning guided by external sources.

\mbox{$\bullet$} We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model.

\mbox{$\bullet$} We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.},
}

Endnote

%0 Thesis
%A Ho, Vinh Thinh
%A referee: Weikum, Gerhard
%Y Stepanova, Daria
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T An Embedding-based Approach to Rule Learning from Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-DE06-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%P 60
%V master
%9 master
%X Knowledge Graphs (KGs) play an important role in various information systems and have application in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts.
Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows:
• We introduce a framework for rule learning guided by external sources.
• We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model.
• We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.

Conference paper

V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum

“Rule Learning from Knowledge Graphs Guided by Embedding Models,” in The Semantic Web – ISWC 2018, Monterey, CA, USA, 2018.

BibTeX

@inproceedings{StepanovaISWC2018,
TITLE = {Rule Learning from Knowledge Graphs Guided by Embedding Models},
AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-00670-9},
DOI = {10.1007/978-3-030-00671-6_5},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Semantic Web -- ISWC 2018},
EDITOR = {Vrande{\v c}i{\'c}, Denny and Bontcheva, Kalina and Su{\'a}rez-Figueroa, Mari Carmen and Presutti, Valentina and Celino, Irene and Sabou, Marta and Kaffee, Lucie-Aim{\'e}e and Simperl, Elena},
PAGES = {72--90},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11136},
ADDRESS = {Monterey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Stepanova, Daria
%A Gad-Elrab, Mohamed Hassan
%A Kharlamov, Evgeny
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Rule Learning from Knowledge Graphs Guided by Embedding Models : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-9058-9
%R 10.1007/978-3-030-00671-6_5
%D 2018
%B The 17th International Semantic Web Conference 
%Z date of event: 2018-10-08 - 2018-10-12
%C Monterey, CA, USA
%B The Semantic Web -- ISWC 2018
%E Vrandečić, Denny; Bontcheva, Kalina; Suárez-Figueroa, Mari Carmen; Presutti, Valentina; Celino, Irene; Sabou, Marta; Kaffee, Lucie-Aimée; Simperl, Elena
%P 72 - 90
%I Springer
%@ 978-3-030-00670-9
%B Lecture Notes in Computer Science
%N 11136

Conference paper

V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum

“Learning Rules from Incomplete KGs using Embeddings,” in ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P&D-Industry-BlueSky 2018), Monterey, CA, USA, 2018.

BibTeX

@inproceedings{StepanovaISWC2018b,
TITLE = {Learning Rules from Incomplete {KGs} using Embeddings},
AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://ceur-ws.org/Vol-2180/paper-25.pdf; urn:nbn:de:0074-2180-3},
PUBLISHER = {ceur-ws.org},
YEAR = {2018},
BOOKTITLE = {ISWC 2018 Posters \& Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P\&D-Industry-BlueSky 2018)},
EDITOR = {van Erp, Marieke and Atre, Medha and Lopez, Vanessa and Srinivas, Kavitha and Fortuna, Carolina},
EID = {25},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2180},
ADDRESS = {Monterey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Ho, Vinh Thinh
%A Stepanova, Daria
%A Gad-Elrab, Mohamed Hassan
%A Kharlamov, Evgeny
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Learning Rules from Incomplete KGs using Embeddings : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-905B-6
%U http://ceur-ws.org/Vol-2180/paper-25.pdf
%D 2018
%B The 17th International Semantic Web Conference 
%Z date of event: 2018-10-08 - 2018-10-12
%C Monterey, CA, USA
%B ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks
%E van Erp, Marieke; Atre, Medha; Lopez, Vanessa; Srinivas, Kavitha; Fortuna, Carolina
%Z sequence number: 25
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2180

Conference paper

K. Hui, A. Yates, K. Berberich, and G. de Melo

“Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.

BibTeX

@inproceedings{Hui_WSDM2018,
TITLE = {Co-{PACRR}: {A} Context-Aware Neural {IR} Model for Ad-hoc Retrieval},
AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5581-0},
DOI = {10.1145/3159652.3159689},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining},
PAGES = {279--287},
ADDRESS = {Marina Del Rey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Yates, Andrew
%A Berberich, Klaus
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-6367-D
%R 10.1145/3159652.3159689
%D 2018
%B 11th ACM International Conference on Web Search and Data Mining
%Z date of event: 2018-02-05 - 2018-02-09
%C Marina Del Rey, CA, USA
%B WSDM'18
%P 279 - 287
%I ACM
%@ 978-1-4503-5581-0

Thesis

M. Humble

“Redescription Mining on Financial Time Series Data,” Universität des Saarlandes, Saarbrücken, 2018.

BibTeX

@mastersthesis{Humble_BSc2017,
TITLE = {Redescription Mining on Financial Time Series Data},
AUTHOR = {Humble, Megan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
DATE = {2018},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Humble, Megan
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Redescription Mining on Financial Time Series Data : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-F042-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%P XV, 100 p.
%V bachelor
%9 bachelor

Conference paper

H. Jhavar and P. Mirza

“EMOFIEL: Mapping Emotions of Relationships in a Story,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{JhavarWWW2018,
TITLE = {{EMOFIEL}: {M}apping Emotions of Relationships in a Story},
AUTHOR = {Jhavar, Harshita and Mirza, Paramita},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3186989},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {243--246},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Jhavar, Harshita
%A Mirza, Paramita
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EMOFIEL: Mapping Emotions of Relationships in a Story : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-4B96-2
%R 10.1145/3184558.3186989
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 243 - 246
%I ACM
%@ 978-1-4503-5640-4

Conference paper

Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum

“TempQuestions: A Benchmark for Temporal Question Answering,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{JiaWWW2017,
TITLE = {{TempQuestions}: {A} Benchmark for Temporal Question Answering},
AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3191536},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {1057--1062},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TempQuestions: A Benchmark for Temporal Question Answering : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3C80-B
%R 10.1145/3184558.3191536
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 1057 - 1062
%I ACM
%@ 978-1-4503-5640-4

Conference paper

Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum

“TEQUILA: Temporal Question Answering over Knowledge Bases,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.

BibTeX

@inproceedings{Jia_CIKM2018,
TITLE = {{TEQUILA}: {T}emporal Question Answering over Knowledge Bases},
AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-6014-2},
DOI = {10.1145/3269206.3269247},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management},
EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun},
PAGES = {1807--1810},
ADDRESS = {Torino, Italy},
}

Endnote

%0 Conference Proceedings
%A Jia, Zhen
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TEQUILA: Temporal Question Answering over Knowledge Bases : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A106-1
%R 10.1145/3269206.3269247
%D 2018
%B 27th ACM International Conference on Information and Knowledge Management
%Z date of event: 2018-10-22 - 2018-10-26
%C Torino, Italy
%B CIKM'18
%E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun
%P 1807 - 1810
%I ACM
%@ 978-1-4503-6014-2

Article

J. Kalofolias, E. Galbrun, and P. Miettinen

“From Sets of Good Redescriptions to Good Sets of Redescriptions,” Knowledge and Information Systems, vol. 57, no. 1, 2018.

BibTeX

@article{kalofolias18from,
TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions},
AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-017-1149-7},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2018},
DATE = {2018},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {57},
NUMBER = {1},
PAGES = {21--54},
}

Endnote

%0 Journal Article
%A Kalofolias, Janis
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T From Sets of Good Redescriptions to Good Sets of Redescriptions : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-90D1-5
%R 10.1007/s10115-017-1149-7
%7 2018-01-19
%D 2018
%J Knowledge and Information Systems
%V 57
%N 1
%& 21
%P 21 - 54
%I Springer
%C New York, NY
%@ 0219-1377

Conference paper

S. Karaev, J. Hook, and P. Miettinen

“Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.

BibTeX

@inproceedings{Karaev_SDM2018,
TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-61197-532-1},
DOI = {10.1137/1.9781611975321.41},
PUBLISHER = {SIAM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)},
EDITOR = {Ester, Martin and Pedreschi, Dino},
PAGES = {360--368},
ADDRESS = {San Diego, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%A Hook, James
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E2D-4
%R 10.1137/1.9781611975321.41
%D 2018
%B SIAM International Conference on Data Mining
%Z date of event: 2018-05-03 - 2018-05-05
%C San Diego, CA, USA
%B Proceedings of the 2018 SIAM International Conference on Data Mining
%E Ester, Martin; Pedreschi, Dino
%P 360 - 368
%I SIAM
%@ 978-1-61197-532-1

Conference paper

S. Karaev, S. Metzler, and P. Miettinen

“Logistic-Tropical Decompositions and Nested Subgraphs,” in Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018), London, UK, 2018.

BibTeX

@inproceedings{Karaev_MLG2018,
TITLE = {Logistic-Tropical Decompositions and Nested Subgraphs},
AUTHOR = {Karaev, Sanjar and Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
PUBLISHER = {MLG Workshop},
YEAR = {2018},
BOOKTITLE = {Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018)},
EID = {35},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Logistic-Tropical Decompositions and Nested Subgraphs : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A91F-E
%D 2018
%B 14th International Workshop on Mining and Learning with Graphs
%Z date of event: 2018-08-20 - 2018-08-20
%C London, UK
%B Proceedings of the 14th International Workshop on Mining and Learning with Graphs
%Z sequence number: 35
%I MLG Workshop
%U http://www.mlgworkshop.org/2018/papers/MLG2018_paper_35.pdf

Paper

S. Karaev, J. Hook, and P. Miettinen

“Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” 2018. [Online]. Available: http://arxiv.org/abs/1801.06136.

Abstract

Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.

BibTeX

@online{Karaev2018,
TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli},
URL = {http://arxiv.org/abs/1801.06136},
EPRINT = {1801.06136},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.},
}

Endnote

%0 Report
%A Karaev, Sanjar
%A Hook, James
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : 
%U http://hdl.handle.net/21.11116/0000-0000-636B-9
%U http://arxiv.org/abs/1801.06136
%D 2018
%X   Nonnegative matrix factorization (NMF) is one of the most frequently-used
matrix factorization models in data analysis. A significant reason to the
popularity of NMF is its interpretability and the `parts of whole'
interpretation of its components. Recently, max-times, or subtropical, matrix
factorization (SMF) has been introduced as an alternative model with equally
interpretable `winner takes it all' interpretation. In this paper we propose a
new mixed linear--tropical model, and a new algorithm, called Latitude, that
combines NMF and SMF, being able to smoothly alternate between the two. In our
model, the data is modeled using the latent factors and latent parameters that
control whether the factors are interpreted as NMF or SMF features, or their
mixtures. We present an algorithm for our novel matrix factorization. Our
experiments show that our algorithm improves over both baselines, and can yield
interpretable results that reveal more of the latent structure than either NMF
or SMF alone.

%K Computer Science, Learning, cs.LG

Paper

P. Lahoti, G. Weikum, and K. P. Gummadi

“iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” 2018. [Online]. Available: http://arxiv.org/abs/1806.01059.

Abstract

People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.

BibTeX

@online{Lahoti_arXiv1806.01059,
TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making},
AUTHOR = {Lahoti, Preethi and Weikum, Gerhard and Gummadi, Krishna P.},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1806.01059},
EPRINT = {1806.01059},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.},
}

Endnote

%0 Report
%A Lahoti, Preethi
%A Weikum, Gerhard
%A Gummadi, Krishna P.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-1545-9
%U http://arxiv.org/abs/1806.01059
%D 2018
%X   People are rated and ranked, towards algorithmic decision making in an
increasing number of applications, typically based on machine learning.
Research on how to incorporate fairness into such tasks has prevalently pursued
the paradigm of group fairness: ensuring that each ethnic or social group
receives its fair share in the outcome of classifiers and rankings. In
contrast, the alternative paradigm of individual fairness has received
relatively little attention. This paper introduces a method for
probabilistically clustering user records into a low-rank representation that
captures individual fairness yet also achieves high accuracy in classification
and regression models. Our notion of individual fairness requires that users
who are similar in all task-relevant attributes such as job qualification, and
disregarding all potentially discriminating attributes such as gender, should
have similar outcomes. Since the case for fairness is ubiquitous across many
tasks, we aim to learn general representations that can be applied to arbitrary
downstream use-cases. We demonstrate the versatility of our method by applying
it to classification and learning-to-rank tasks on two real-world datasets. Our
experiments show substantial improvements over the best prior work for this
setting.

%K Computer Science, Learning, cs.LG,Computer Science, Information Retrieval, cs.IR,Statistics, Machine Learning, stat.ML

Conference paper

P. Lahoti, K. Garimella, and A. Gionis

“Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.

BibTeX

@inproceedings{Lahoti_WSDM2018,
TITLE = {Joint Non-negative Matrix Factorization for Learning Ideological Leaning on {T}witter},
AUTHOR = {Lahoti, Preethi and Garimella, Kiran and Gionis, Aristides},
LANGUAGE = {eng},
ISBN = {978-1-4503-5581-0},
DOI = {10.1145/3159652.3159669},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining},
PAGES = {351--359},
ADDRESS = {Marina Del Rey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Lahoti, Preethi
%A Garimella, Kiran
%A Gionis, Aristides
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9C4F-7
%R 10.1145/3159652.3159669
%D 2018
%B 11th ACM International Conference on Web Search and Data Mining
%Z date of event: 2018-02-05 - 2018-02-09
%C Marina Del Rey, CA, USA
%B WSDM'18
%P 351 - 359
%I ACM
%@ 978-1-4503-5581-0

Conference paper

C. Li, Y. Sun, B. He, L. Wang, K. Hui, A. Yates, L. Sun, and J. Xu

“NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.

BibTeX

@inproceedings{DBLP:conf/emnlp/LiSHWHYSX18,
TITLE = {{NPRF}: {A} Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval},
AUTHOR = {Li, Canjia and Sun, Yingfei and He, Ben and Wang, Le and Hui, Kai and Yates, Andrew and Sun, Le and Xu, Jungang},
LANGUAGE = {eng},
ISBN = {978-1-948087-84-1},
URL = {https://aclanthology.info/papers/D18-1478/d18-1478},
PUBLISHER = {ACL},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi},
PAGES = {4482--4491},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Li, Canjia
%A Sun, Yingfei
%A He, Ben
%A Wang, Le
%A Hui, Kai
%A Yates, Andrew
%A Sun, Le
%A Xu, Jungang
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-11BB-7
%U https://aclanthology.info/papers/D18-1478/d18-1478
%D 2018
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2018-10-31 - 2018-11-04
%C Brussels, Belgium
%B The Conference on Empirical Methods in Natural Language Processing
%E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi
%P 4482 - 4491
%I ACL
%@ 978-1-948087-84-1

Conference paper

S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder

“Characterizing Question Facets for Complex Answer Retrieval,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.

BibTeX

@inproceedings{MacAvaney_SIGIR2018,
TITLE = {Characterizing Question Facets for Complex Answer Retrieval},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir},
LANGUAGE = {eng},
ISBN = {978-1-4503-5657-2},
DOI = {10.1145/3209978.3210135},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {1205--1208},
ADDRESS = {Ann Arbor, MI, USA},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Soldaini, Luca
%A Hui, Kai
%A Goharian, Nazli
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Characterizing Question Facets for Complex Answer Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5ECA-2
%R 10.1145/3209978.3210135
%D 2018
%B 41st International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2018-07-08 - 2018-07-12
%C Ann Arbor, MI, USA
%B SIGIR'18
%P 1205 - 1208
%I ACM
%@ 978-1-4503-5657-2

Paper

S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian

“RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” 2018. [Online]. Available: http://arxiv.org/abs/1806.07916.

Abstract

Self-reported diagnosis statements have been widely employed in studying
language related to mental health in social media. However, existing research
has largely ignored the temporality of mental health diagnoses. In this work,
we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported
depression diagnosis posts from Reddit that include temporal information about
the diagnosis. Annotations include whether a mental health condition is present
and how recently the diagnosis happened. Furthermore, we include exact temporal
spans that relate to the date of diagnosis. This information is valuable for
various computational methods to examine mental health through social media
because one's mental health state is not static. We also test several baseline
classification and extraction approaches, which suggest that extracting
temporal information from self-reported diagnosis statements is challenging.

BibTeX

@online{MacAveray_arXiv1806.07916,
TITLE = {{RSDD}-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses},
AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1806.07916},
EPRINT = {1806.07916},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Desmet, Bart
%A Cohan, Arman
%A Soldaini, Luca
%A Yates, Andrew
%A Zirikly, Ayah
%A Goharian, Nazli
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5ED9-1
%U http://arxiv.org/abs/1806.07916
%D 2018
%X   Self-reported diagnosis statements have been widely employed in studying
language related to mental health in social media. However, existing research
has largely ignored the temporality of mental health diagnoses. In this work,
we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported
depression diagnosis posts from Reddit that include temporal information about
the diagnosis. Annotations include whether a mental health condition is present
and how recently the diagnosis happened. Furthermore, we include exact temporal
spans that relate to the date of diagnosis. This information is valuable for
various computational methods to examine mental health through social media
because one's mental health state is not static. We also test several baseline
classification and extraction approaches, which suggest that extracting
temporal information from self-reported diagnosis statements is challenging.
%K Computer Science, Computation and Language, cs.CL

Conference paper

S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder

“Overcoming Low-Utility Facets for Complex Answer Retrieval,” in SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018), Ann Arbor, MI, USA, 2018.

BibTeX

@inproceedings{MacAvaney_KG4IR2018,
TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir},
LANGUAGE = {eng},
URL = {http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf; urn:nbn:de:0074-2127-8},
PUBLISHER = {ceur-ws.org},
YEAR = {2018},
BOOKTITLE = {SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018)},
EDITOR = {Dietz, Laura and Koesten, Laura and Verberne, Suzan},
PAGES = {46--47},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2127},
ADDRESS = {Ann Arbor, MI, USA},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Soldaini, Luca
%A Hui, Kai
%A Goharian, Nazli
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Overcoming Low-Utility Facets for Complex Answer Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E9C-6
%U http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf
%D 2018
%B Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding
%Z date of event: 2018-07-12 - 2018-07-12
%C Ann Arbor, MI, USA
%B SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH
%E Dietz, Laura; Koesten, Laura; Verberne, Suzan
%P 46 - 47
%I ceur-ws.org
%B CEUR Workshop Proceedings
%N 2127

Conference paper

S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian

“RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” in Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018), New Orleans, LA, USA, 2018.

BibTeX

@inproceedings{MacAvaney_NAACL_HLT2018,
TITLE = {{RSDD}-Time: {T}emporal Annotation of Self-Reported Mental Health Diagnoses},
AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {978-1-948087-12-4},
URL = {http://aclweb.org/anthology/W18-0618},
DOI = {10.18653/v1/W18-0618},
PUBLISHER = {ACL},
YEAR = {2018},
BOOKTITLE = {Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018)},
EDITOR = {Loveys, Kate and Niederhoffer, Kate and Prud'hommeaux, Emily and Resnik, Rebecca and Resnik, Philip},
PAGES = {168--173},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A MacAvaney, Sean
%A Desmet, Bart
%A Cohan, Arman
%A Soldaini, Luca
%A Yates, Andrew
%A Zirikly, Ayah
%A Goharian, Nazli
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E8C-8
%U http://aclweb.org/anthology/W18-0618
%R 10.18653/v1/W18-0618 
%D 2018
%B Fifth Workshop on Computational Linguistics and Clinical Psychology
%Z date of event: 2018-06-05 - 2018-06-05
%C New Orleans, LA, USA
%B Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology
%E Loveys, Kate; Niederhoffer, Kate; Prud'hommeaux, Emily; Resnik, Rebecca; Resnik, Philip
%P 168 - 173
%I ACL
%@ 978-1-948087-12-4
%U https://aclanthology.info/papers/W18-0618/w18-0618

Paper

S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder

“Characterizing Question Facets for Complex Answer Retrieval,” 2018. [Online]. Available: http://arxiv.org/abs/1805.00791.

Abstract

Complex answer retrieval (CAR) is the process of retrieving answers to
questions that have multifaceted or nuanced answers. In this work, we present
two novel approaches for CAR based on the observation that question facets can
vary in utility: from structural (facets that can apply to many similar topics,
such as 'History') to topical (facets that are specific to the question's
topic, such as the 'Westward expansion' of the United States). We first explore
a way to incorporate facet utility into ranking models during query term score
combination. We then explore a general approach to reform the structure of
ranking models to aid in learning of facet utility in the query-document term
matching phase. When we use our techniques with a leading neural ranker on the
TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and
yield up to 26% higher performance than the next best method.

BibTeX

@online{MacAvernay_arXIv1805.00791,
TITLE = {Characterizing Question Facets for Complex Answer Retrieval},
AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1805.00791},
EPRINT = {1805.00791},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Yates, Andrew
%A Cohan, Arman
%A Soldaini, Luca
%A Hui, Kai
%A Goharian, Nazli
%A Frieder, Ophir
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Characterizing Question Facets for Complex Answer Retrieval
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5ECE-E
%U http://arxiv.org/abs/1805.00791
%D 2018
%X   Complex answer retrieval (CAR) is the process of retrieving answers to
questions that have multifaceted or nuanced answers. In this work, we present
two novel approaches for CAR based on the observation that question facets can
vary in utility: from structural (facets that can apply to many similar topics,
such as 'History') to topical (facets that are specific to the question's
topic, such as the 'Westward expansion' of the United States). We first explore
a way to incorporate facet utility into ranking models during query term score
combination. We then explore a general approach to reform the structure of
ranking models to aid in learning of facet utility in the query-document term
matching phase. When we use our techniques with a leading neural ranker on the
TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and
yield up to 26% higher performance than the next best method.
%K Computer Science, Information Retrieval, cs.IR

Paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” 2018. [Online]. Available: http://arxiv.org/abs/1809.05467.

Abstract

The reliable fraction of information is an attractive score for quantifying
(functional) dependencies in high-dimensional data. In this paper, we
systematically explore the algorithmic implications of using this measure for
optimization. We show that the problem is NP-hard, which justifies the usage of
worst-case exponential-time as well as heuristic search methods. We then
substantially improve the practical performance for both optimization styles by
deriving a novel admissible bounding function that has an unbounded potential
for additional pruning over the previously proposed one. Finally, we
empirically investigate the approximation ratio of the greedy algorithm and
show that it produces highly competitive results in a fraction of time needed
for complete branch-and-bound style search.

BibTeX

@online{Mandros_arXiv1809.05467,
TITLE = {Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1809.05467},
EPRINT = {1809.05467},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.},
}

Endnote

%0 Report
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9EC9-A
%U http://arxiv.org/abs/1809.05467
%D 2018
%X   The reliable fraction of information is an attractive score for quantifying
(functional) dependencies in high-dimensional data. In this paper, we
systematically explore the algorithmic implications of using this measure for
optimization. We show that the problem is NP-hard, which justifies the usage of
worst-case exponential-time as well as heuristic search methods. We then
substantially improve the practical performance for both optimization styles by
deriving a novel admissible bounding function that has an unbounded potential
for additional pruning over the previously proposed one. Finally, we
empirically investigate the approximation ratio of the greedy algorithm and
show that it produces highly competitive results in a fraction of time needed
for complete branch-and-bound style search.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT

Conference paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.

BibTeX

@inproceedings{mandros:18:fedora,
TITLE = {Discovering Reliable Dependencies from Data: {H}ardness and Improved Algorithms},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5386-9159-5},
DOI = {10.1109/ICDM.2018.00047},
PUBLISHER = {IEEE},
YEAR = {2018},
BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)},
PAGES = {317--326},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9EA2-5
%R 10.1109/ICDM.2018.00047
%D 2018
%B IEEE International Conference on Data Mining
%Z date of event: 2018-11-17 - 2018-11-20
%C Singapore, Singapore
%B IEEE International Conference on Data Mining 
%P 317 - 326
%I IEEE
%@ 978-1-5386-9159-5

Conference paper

A. Marx and J. Vreeken

“Stochastic Complexity for Testing Conditional Independence on Discrete Data,” in Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018), Montréal, Canada, 2018.

BibTeX

@inproceedings{marx:18:dice,
TITLE = {Stochastic Complexity for Testing Conditional Independence on Discrete Data},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {https://drive.google.com/file/d/1mMkO5YZ5gkBRRFbfYb4DDRCsCN243eb2/view},
YEAR = {2018},
BOOKTITLE = {Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018)},
EID = {10},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Stochastic Complexity for Testing Conditional Independence on Discrete Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-9EC2-1
%U https://drive.google.com/file/d/1mMkO5YZ5gkBRRFbfYb4DDRCsCN243eb2/view
%D 2018
%B NeurIPS 2018 Workshop on Causal Learning
%Z date of event: 2018-12-07 - 2018-12-07
%C Montréal, Canada
%B Proceedings of the NeurIPS 2018 workshop on Causal Learning 
%Z sequence number: 10

Paper

A. Marx and J. Vreeken

“Causal Discovery by Telling Apart Parents and Children,” 2018. [Online]. Available: http://arxiv.org/abs/1808.06356.

Abstract

We consider the problem of inferring the directed, causal graph from
observational data, assuming no hidden confounders. We take an information
theoretic approach, and make three main contributions.
First, we show how through algorithmic information theory we can obtain SCI,
a highly robust, effective and computationally efficient test for conditional
independence---and show it outperforms the state of the art when applied in
constraint-based inference methods such as stable PC.
Second, building upon SCI, we show how to tell apart the parents and
children of a given node based on the algorithmic Markov condition. We give the
Climb algorithm to efficiently discover the directed, causal Markov
blanket---and show it is at least as accurate as inferring the global network,
while being much more efficient.
Last, but not least, we detail how we can use the Climb score to direct those
edges that state of the art causal discovery algorithms based on PC or GES
leave undirected---and show this improves their precision, recall and F1 scores
by up to 20%.

BibTeX

@online{Marx_arXiv1808.06356,
TITLE = {Causal Discovery by Telling Apart Parents and Children},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1808.06356},
EPRINT = {1808.06356},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence---and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket---and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected---and show this improves their precision, recall and F1 scores by up to 20%.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Discovery by Telling Apart Parents and Children
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5F36-8
%U http://arxiv.org/abs/1808.06356
%D 2018
%X   We consider the problem of inferring the directed, causal graph from
observational data, assuming no hidden confounders. We take an information
theoretic approach, and make three main contributions.
First, we show how through algorithmic information theory we can obtain SCI,
a highly robust, effective and computationally efficient test for conditional
independence---and show it outperforms the state of the art when applied in
constraint-based inference methods such as stable PC.
Second, building upon SCI, we show how to tell apart the parents and
children of a given node based on the algorithmic Markov condition. We give the
Climb algorithm to efficiently discover the directed, causal Markov
blanket---and show it is at least as accurate as inferring the global network,
while being much more efficient.
Last, but not least, we detail how we can use the Climb score to direct those
edges that state of the art causal discovery algorithms based on PC or GES
leave undirected---and show this improves their precision, recall and F1 scores
by up to 20%.
%K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG

Conference paper

S. Metzler and P. Miettinen

“Random Graph Generators for Hyperbolic Community Structures,” in Complex Networks and Their Applications VII, Cambridge, UK, 2018.

BibTeX

@inproceedings{Metzler_COMPLEXNETWORKS2018,
TITLE = {Random Graph Generators for Hyperbolic Community Structures},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-3-030-05410-6; 978-3-030-05411-3},
DOI = {10.1007/978-3-030-05411-3_54},
PUBLISHER = {Springer},
YEAR = {2018},
BOOKTITLE = {Complex Networks and Their Applications VII},
EDITOR = {Aiello, Luca Maria and Cherifi, Chantal and Cherifi, Hocine and Lambiotte, Renaud and Li{\'o}, Pietro and Rocha, Luis M.},
PAGES = {680--693},
SERIES = {Studies in Computational Intelligence},
VOLUME = {812},
ADDRESS = {Cambridge, UK},
}

Endnote

%0 Conference Proceedings
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Random Graph Generators for Hyperbolic Community Structures
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A929-2
%R 10.1007/978-3-030-05411-3_54
%D 2018
%B 7th International Conference on Complex Networks and Their Applications
%Z date of event: 2018-12-11 - 2018-12-13
%C Cambridge, UK
%B Complex Networks and Their Applications VII
%E Aiello, Luca Maria; Cherifi, Chantal; Cherifi, Hocine; Lambiotte, Renaud; Lió, Pietro; Rocha, Luis M.
%P 680 - 693
%I Springer
%@ 978-3-030-05410-6 978-3-030-05411-3
%B Studies in Computational Intelligence
%N 812

Paper

P. Mirza, S. Razniewski, F. Darari, and G. Weikum

“Enriching Knowledge Bases with Counting Quantifiers,” 2018. [Online]. Available: http://arxiv.org/abs/1807.03656.

Abstract

Information extraction traditionally focuses on extracting relations between
identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts
often also contain Counting information, stating that a subject is in a
specific relation with a number of objects, without mentioning the objects
themselves, for example, "California is divided into 58 counties". Such
counting quantifiers can help in a variety of tasks such as query answering or
knowledge base curation, but are neglected by prior work. This paper develops
the first full-fledged system for extracting counting information from text,
called CINEX. We employ distant supervision using fact counts from a knowledge
base as training seeds, and develop novel techniques for dealing with several
challenges: (i) non-maximal training seeds due to the incompleteness of
knowledge bases, (ii) sparse and skewed observations in text sources, and (iii)
high diversity of linguistic patterns. Experiments with five human-evaluated
relations show that CINEX can achieve 60% average precision for extracting
counting information. In a large-scale experiment, we demonstrate the potential
for knowledge base enrichment by applying CINEX to 2,474 frequent relations in
Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct
relations, which is 28% more than the existing Wikidata facts for these
relations.

BibTeX

@online{Mirza_arXiv:1807.03656,
TITLE = {Enriching Knowledge Bases with Counting Quantifiers},
AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1807.03656},
EPRINT = {1807.03656},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.},
}

Endnote

%0 Report
%A Mirza, Paramita
%A Razniewski, Simon
%A Darari, Fariz
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enriching Knowledge Bases with Counting Quantifiers
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-E16D-7
%U http://arxiv.org/abs/1807.03656
%D 2018
%X   Information extraction traditionally focuses on extracting relations between
identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts
often also contain Counting information, stating that a subject is in a
specific relation with a number of objects, without mentioning the objects
themselves, for example, "California is divided into 58 counties". Such
counting quantifiers can help in a variety of tasks such as query answering or
knowledge base curation, but are neglected by prior work. This paper develops
the first full-fledged system for extracting counting information from text,
called CINEX. We employ distant supervision using fact counts from a knowledge
base as training seeds, and develop novel techniques for dealing with several
challenges: (i) non-maximal training seeds due to the incompleteness of
knowledge bases, (ii) sparse and skewed observations in text sources, and (iii)
high diversity of linguistic patterns. Experiments with five human-evaluated
relations show that CINEX can achieve 60% average precision for extracting
counting information. In a large-scale experiment, we demonstrate the potential
for knowledge base enrichment by applying CINEX to 2,474 frequent relations in
Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct
relations, which is 28% more than the existing Wikidata facts for these
relations.

%K Computer Science, Computation and Language, cs.CL

Conference paper

P. Mirza, F. Darari, and R. Mahendra

“KOI at SemEval-2018 Task 5: Building Knowledge Graph of Incidents,” in Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, 2018.

BibTeX

@inproceedings{S18-1010,
TITLE = {{KOI} at {SemEval}-2018 Task 5: {B}uilding Knowledge Graph of Incidents},
AUTHOR = {Mirza, Paramita and Darari, Fariz and Mahendra, Rahmad},
LANGUAGE = {eng},
ISBN = {978-1-948087-20-9},
DOI = {10.18653/v1/S18-1010},
PUBLISHER = {ACL},
YEAR = {2018},
BOOKTITLE = {Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018)},
EDITOR = {Apidianaki, Marianna and Mohammad, Saif M. and May, Jonathan and Shutova, Ekaterina and Bethard, Steven and Carpuat, Marine},
PAGES = {81--87},
ADDRESS = {New Orleans, LA},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Darari, Fariz
%A Mahendra, Rahmad
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T KOI at SemEval-2018 Task 5: Building Knowledge Graph of Incidents
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A818-6
%R 10.18653/v1/S18-1010
%D 2018
%B Twelfth International Workshop on Semantic Evaluation
%Z date of event: 2018-06-05 - 2018-06-06
%C New Orleans, LA
%B Proceedings of the 12th International Workshop on Semantic Evaluation 
%E Apidianaki, Marianna; Mohammad, Saif M.; May, Jonathan; Shutova, Ekaterina; Bethard, Steven; Carpuat, Marine
%P 81 - 87
%I ACL
%@ 978-1-948087-20-9
%U http://aclweb.org/anthology/S18-1010

Conference paper

P. Mirza, S. Razniewski, F. Darari, and G. Weikum

“Enriching Knowledge Bases with Counting Quantifiers,” in The Semantic Web -- ISWC 2018, Monterey, CA, USA, 2018.

BibTeX

@inproceedings{MirzaISWC2018,
TITLE = {Enriching Knowledge Bases with Counting Quantifiers},
AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-030-00670-9},
DOI = {10.1007/978-3-030-00671-6_11},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Semantic Web -- ISWC 2018},
EDITOR = {Vrande{\v c}i{\'c}, Denny and Bontcheva, Kalina and Su{\'a}rez-Figueroa, Mari Carmen and Presutti, Valentina and Celino, Irene and Sabou, Marta and Kaffee, Luci-Aim{\'e}e and Simperl, Elena},
PAGES = {179--197},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11136},
ADDRESS = {Monterey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Razniewski, Simon
%A Darari, Fariz
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enriching Knowledge Bases with Counting Quantifiers
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-E170-2
%R 10.1007/978-3-030-00671-6_11
%D 2018
%B The 17th International Semantic Web Conference 
%Z date of event: 2018-10-08 - 2018-10-12
%C Monterey, CA, USA
%B The Semantic Web -- ISWC 2018
%E Vrandečić, Denny; Bontcheva, Kalina; Suárez-Figueroa, Mari Carmen; Presutti, Valentina; Celino, Irene; Sabou, Marta; Kaffee, Luci-Aimée; Simperl, Elena
%P 179 - 197
%I Springer
%@ 978-3-030-00670-9
%B Lecture Notes in Computer Science
%N 11136

Thesis

D5IMPR-CS

A. Mishra

“Leveraging Semantic Annotations for Event-focused Search & Summarization,” Universität des Saarlandes, Saarbrücken, 2018.

Abstract

Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. • To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.

BibTeX

@phdthesis{Mishraphd2018,
TITLE = {Leveraging Semantic Annotations for Event-focused Search \& Summarization},
AUTHOR = {Mishra, Arunav},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-271081},
DOI = {10.22028/D291-27108},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
ABSTRACT = {Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: \mbox{$\bullet$} We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. \mbox{$\bullet$} We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. \mbox{$\bullet$} To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.},
}

Endnote

%0 Thesis
%A Mishra, Arunav
%Y Berberich, Klaus
%A referee: Weikum, Gerhard
%A referee: Hauff, Claudia
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Leveraging Semantic Annotations for Event-focused Search & Summarization
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-1844-8
%U urn:nbn:de:bsz:291-scidok-ds-271081
%R 10.22028/D291-27108 
%F OTHER: hdl:20.500.11880/26995
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%8 08.02.2018
%P 252 p.
%V phd
%9 phd
%X Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. • To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26995

Conference paper

S. Nag Chowdhury, N. Tandon, H. Ferhatosmanoglu, and G. Weikum

“VISIR: Visual and Semantic Image Label Refinement,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.

BibTeX

@inproceedings{NagChowdhury_WSDM2018,
TITLE = {{VISIR}: {V}isual and Semantic Image Label Refinement},
AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Ferhatosmanoglu, Hakan and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5581-0},
DOI = {10.1145/3159652.3159693},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining},
PAGES = {117--125},
ADDRESS = {Marina Del Rey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Tandon, Niket
%A Ferhatosmanoglu, Hakan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T VISIR: Visual and Semantic Image Label Refinement
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3CA2-5
%R 10.1145/3159652.3159693
%D 2018
%B 11th ACM International Conference on Web Search and Data Mining
%Z date of event: 2018-02-05 - 2018-02-09
%C Marina Del Rey, CA, USA
%B WSDM'18
%P 117 - 125
%I ACM
%@ 978-1-4503-5581-0

Paper

S. Paramonov, D. Stepanova, and P. Miettinen

“Hybrid ASP-based Approach to Pattern Mining,” 2018. [Online]. Available: http://arxiv.org/abs/1808.07302.

Abstract

Detecting small sets of relevant patterns from a given dataset is a central
challenge in data mining. The relevance of a pattern is based on user-provided
criteria; typically, all patterns that satisfy certain criteria are considered
relevant. Rule-based languages like Answer Set Programming (ASP) seem
well-suited for specifying such criteria in a form of constraints. Although
progress has been made, on the one hand, on solving individual mining problems
and, on the other hand, developing generic mining systems, the existing methods
either focus on scalability or on generality. In this paper we make steps
towards combining local (frequency, size, cost) and global (various condensed
representations like maximal, closed, skyline) constraints in a generic and
efficient way. We present a hybrid approach for itemset, sequence and graph
mining which exploits dedicated highly optimized mining systems to detect
frequent patterns and then filters the results using declarative ASP. To
further demonstrate the generic nature of our hybrid framework we apply it to a
problem of approximately tiling a database. Experiments on real-world datasets
show the effectiveness of the proposed method and computational gains for
itemset, sequence and graph mining, as well as approximate tiling.
Under consideration in Theory and Practice of Logic Programming (TPLP).

BibTeX

@online{Paramonov_arXiv1808.07302,
TITLE = {Hybrid {ASP}-based Approach to Pattern Mining},
AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1808.07302},
EPRINT = {1808.07302},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods either focus on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP).},
}

Endnote

%0 Report
%A Paramonov, Sergey
%A Stepanova, Daria
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Hybrid ASP-based Approach to Pattern Mining
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E60-9
%U http://arxiv.org/abs/1808.07302
%D 2018
%X Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods either focus on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP).
%K Computer Science, Artificial Intelligence, cs.AI

Conference paper

T. Pellissier Tanon, D. Stepanova, S. Razniewski, P. Mirza, and G. Weikum

“Completeness-aware Rule Learning from Knowledge Graphs,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, 2018.

BibTeX

@inproceedings{PellissierIJCAI2018,
TITLE = {Completeness-aware Rule Learning from Knowledge Graphs},
AUTHOR = {Pellissier Tanon, Thomas and Stepanova, Daria and Razniewski, Simon and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-0-9992411-2-7},
DOI = {10.24963/ijcai.2018/749},
PUBLISHER = {IJCAI},
YEAR = {2018},
BOOKTITLE = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018)},
EDITOR = {Lang, J{\'e}r{\^o}me},
PAGES = {5339--5343},
ADDRESS = {Stockholm, Sweden},
}

Endnote

%0 Conference Proceedings
%A Pellissier Tanon, Thomas
%A Stepanova, Daria
%A Razniewski, Simon
%A Mirza, Paramita
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Completeness-aware Rule Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-9070-D
%R 10.24963/ijcai.2018/749
%D 2018
%B 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence
%Z date of event: 2018-07-13 - 2018-07-19
%C Stockholm, Sweden
%B Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
%E Lang, Jérôme
%P 5339 - 5343
%I IJCAI
%@ 978-0-9992411-2-7 
%U https://doi.org/10.24963/ijcai.2018/749

Conference paper

M. Ponza, L. Del Corro, and G. Weikum

“Facts That Matter,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.

BibTeX

@inproceedings{D18-1129,
TITLE = {Facts That Matter},
AUTHOR = {Ponza, Marco and Del Corro, Luciano and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-948087-84-1},
URL = {https://aclanthology.coli.uni-saarland.de/papers/D18-1129/d18-1129},
PUBLISHER = {ACL},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi},
PAGES = {1043--1048},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Ponza, Marco
%A Del Corro, Luciano
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Facts That Matter
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A2C1-C
%U https://aclanthology.coli.uni-saarland.de/papers/D18-1129/d18-1129
%D 2018
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2018-10-31 - 2018-11-04
%C Brussels, Belgium
%B The Conference on Empirical Methods in Natural Language Processing
%E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi
%P 1043 - 1048
%I ACL
%@ 978-1-948087-84-1

Conference paper

K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum

“CredEye: A Credibility Lens for Analyzing and Explaining Misinformation,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{PopatWWW2017,
TITLE = {{CredEye}: {A} Credibility Lens for Analyzing and Explaining Misinformation},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3186967},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {155--158},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T CredEye: A Credibility Lens for Analyzing and Explaining Misinformation
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-B546-5
%R 10.1145/3184558.3186967
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 155 - 158
%I ACM
%@ 978-1-4503-5640-4

Conference paper

K. Popat, S. Mukherjee, A. Yates, and G. Weikum

“DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.

BibTeX

@inproceedings{D18-1003,
TITLE = {{DeClarE}: {D}ebunking Fake News and False Claims using Evidence-Aware Deep Learning},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-948087-84-1},
URL = {https://aclanthology.coli.uni-saarland.de/papers/D18-1003/d18-1003},
PUBLISHER = {ACL},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi},
PAGES = {22--32},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-B348-3
%U https://aclanthology.coli.uni-saarland.de/papers/D18-1003/d18-1003
%D 2018
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2018-10-31 - 2018-11-04
%C Brussels, Belgium
%B The Conference on Empirical Methods in Natural Language Processing
%E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi
%P 22 - 32
%I ACL
%@ 978-1-948087-84-1

Paper

K. Popat, S. Mukherjee, A. Yates, and G. Weikum

“DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning,” 2018. [Online]. Available: http://arxiv.org/abs/1809.06416.

Abstract

Misinformation such as fake news is one of the big challenges of our society.
Research on automated fact-checking has proposed methods based on supervised
learning, but these approaches do not consider external evidence apart from
labeled training instances. Recent approaches counter this deficit by
considering external sources related to a claim. However, these methods require
substantial feature modeling and rich lexicons. This paper overcomes these
limitations of prior work with an end-to-end model for evidence-aware
credibility assessment of arbitrary textual claims, without any human
intervention. It presents a neural network model that judiciously aggregates
signals from external evidence articles, the language of these articles and the
trustworthiness of their sources. It also derives informative features for
generating user-comprehensible explanations that makes the neural network
predictions transparent to the end-user. Experiments with four datasets and
ablation studies show the strength of our method.

BibTeX

@online{Popat_arXiv1809.06416,
TITLE = {{DeClarE}: {D}ebunking Fake News and False Claims using Evidence-Aware Deep Learning},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1809.06416},
EPRINT = {1809.06416},
EPRINTTYPE = {arXiv},
YEAR = {2018},
ABSTRACT = {Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility assessment of arbitrary textual claims, without any human intervention. It presents a neural network model that judiciously aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. It also derives informative features for generating user-comprehensible explanations that makes the neural network predictions transparent to the end-user. Experiments with four datasets and ablation studies show the strength of our method.},
}

Endnote

%0 Report
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Yates, Andrew
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5EE1-7
%U http://arxiv.org/abs/1809.06416
%D 2018
%X Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility assessment of arbitrary textual claims, without any human intervention. It presents a neural network model that judiciously aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. It also derives informative features for generating user-comprehensible explanations that makes the neural network predictions transparent to the end-user. Experiments with four datasets and ablation studies show the strength of our method.
%K Computer Science, Computation and Language, cs.CL,Computer Science, Learning, cs.LG

Article

Y. Ran, B. He, K. Hui, J. Xu, and L. Sun

“Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support,” International Journal of Data Mining and Bioinformatics, vol. 20, no. 2, 2018.

BibTeX

@article{Ran_2018,
TITLE = {Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support},
AUTHOR = {Ran, Yanhua and He, Ben and Hui, Kai and Xu, Jungang and Sun, Le},
LANGUAGE = {eng},
ISSN = {1748-5673},
DOI = {10.1504/IJDMB.2018.10015098},
PUBLISHER = {Inderscience Publ.},
ADDRESS = {Gen{\`e}ve},
YEAR = {2018},
DATE = {2018},
JOURNAL = {International Journal of Data Mining and Bioinformatics},
VOLUME = {20},
NUMBER = {2},
PAGES = {91--108},
}

Endnote

%0 Journal Article
%A Ran, Yanhua
%A He, Ben
%A Hui, Kai
%A Xu, Jungang
%A Sun, Le
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5743-1
%R 10.1504/IJDMB.2018.10015098
%7 2018
%D 2018
%J International Journal of Data Mining and Bioinformatics
%V 20
%N 2
%& 91
%P 91 - 108
%I Inderscience Publ.
%C Genève
%@ 1748-5673

Article

S. Razniewski and G. Weikum

“Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns,” ACM SIGWEB Newsletter, no. Spring, 2018.

BibTeX

@article{Razniewski2018,
TITLE = {Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns},
AUTHOR = {Razniewski, Simon and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1145/3210578.3210581},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2018},
JOURNAL = {ACM SIGWEB Newsletter},
NUMBER = {Spring},
EID = {3},
}

Endnote

%0 Journal Article
%A Razniewski, Simon
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-E175-D
%R 10.1145/3210578.3210581
%7 2018
%D 2018
%J ACM SIGWEB Newsletter
%N Spring
%Z sequence number: 3
%I ACM
%C New York, NY

Conference paper

M. Ringsquandl, E. Kharlamov, D. Stepanova, M. Hildebrandt, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kroeger

“Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding,” in ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018) (ISWC-P&D-Industry-BlueSky 2018), Monterey, CA, USA, 2018.

BibTeX

@inproceedings{Ringsquandl_ISWC2018_Poster,
TITLE = {Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding},
AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Hildebrandt, Marcel and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kroeger, Peer},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-2180/paper-52.pdf; urn:nbn:de:0074-2180-3},
PUBLISHER = {CEUR-WS.org},
YEAR = {2018},
BOOKTITLE = {ISWC 2018 Posters \& Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018) (ISWC-P\&D-Industry-BlueSky 2018)},
EDITOR = {van Erp, Marieke and Atre, Medha and Lopez, Vanessa and Srinivas, Kavitha and Fortuna, Carolina},
EID = {52},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {2180},
ADDRESS = {Monterey, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Ringsquandl, Martin
%A Kharlamov, Evgeny
%A Stepanova, Daria
%A Hildebrandt, Marcel
%A Lamparter, Steffen
%A Lepratti, Raffaello
%A Horrocks, Ian
%A Kroeger, Peer
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E67-2
%U http://ceur-ws.org/Vol-2180/paper-52.pdf
%D 2018
%B 17th International Semantic Web Conference
%Z date of event: 2018-10-08 - 2018-10-12
%C Monterey, CA, USA
%B ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018)
%E van Erp, Marieke; Atre, Medha; Lopez, Vanessa; Srinivas, Kavitha; Fortuna, Carolina
%Z sequence number: 52
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 2180
%@ 1613-0073

Conference paper

M. Ringsquandl, E. Kharlamov, D. Stepanova, M. Hildebrandt, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kröger

“Event-Enhanced Learning for KG Completion,” in The Semantic Web (ESWC 2018), Heraklion, Crete, Greece, 2018.

BibTeX

@inproceedings{Ringsquandl_ESWC2018,
TITLE = {Event-Enhanced Learning for {KG} Completion},
AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Hildebrandt, Marcel and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kr{\"o}ger, Peer},
LANGUAGE = {eng},
ISBN = {978-3-319-93416-7},
DOI = {10.1007/978-3-319-93417-4_35},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Semantic Web (ESWC 2018)},
EDITOR = {Gangemi, Aldo and Navigli, Roberto and Vidal, Maria-Esther and Hitzler, Pascal and Troncy, Rapha{\"e}l and Hollink, Laura and Tordai, Anna and Alam, Mehwish},
PAGES = {541--559},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10843},
ADDRESS = {Heraklion, Crete, Greece},
}

Endnote

%0 Conference Proceedings
%A Ringsquandl, Martin
%A Kharlamov, Evgeny
%A Stepanova, Daria
%A Hildebrandt, Marcel
%A Lamparter, Steffen
%A Lepratti, Raffaello
%A Horrocks, Ian
%A Kröger, Peer
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Event-Enhanced Learning for KG Completion
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E82-2
%R 10.1007/978-3-319-93417-4_35
%D 2018
%B 15th Extended Semantic Web Conference
%Z date of event: 2018-06-03 - 2018-06-07
%C Heraklion, Crete, Greece
%B The Semantic Web
%E Gangemi, Aldo; Navigli, Roberto; Vidal, Maria-Esther; Hitzler, Pascal; Troncy, Raphaël; Hollink, Laura; Tordai, Anna; Alam, Mehwish
%P 541 - 559
%I Springer
%@ 978-3-319-93416-7
%B Lecture Notes in Computer Science
%N 10843

Conference paper

D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum

“A Study of the Importance of External Knowledge in the Named Entity Recognition Task,” in The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.

BibTeX

@inproceedings{AgrawalACL2018b,
TITLE = {A Study of the Importance of External Knowledge in the Named Entity Recognition Task},
AUTHOR = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-948087-34-6},
PUBLISHER = {ACL},
YEAR = {2018},
BOOKTITLE = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)},
PAGES = {241--246},
EID = {602},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Seyler, Dominic
%A Dembelova, Tatiana
%A Del Corro, Luciano
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Study of the Importance of External Knowledge in the Named Entity Recognition Task
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-0C65-0
%D 2018
%B The 56th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2018-07-15 - 2018-07-20
%C Melbourne, Australia
%B The 56th Annual Meeting of the Association for Computational Linguistics
%P 241 - 246
%Z sequence number: 602
%I ACL
%@ 978-1-948087-34-6
%U http://aclweb.org/anthology/P18-2039

Conference paper

X. Shen, H. Su, S. Niu, and V. Demberg

“Improving Variational Encoder-Decoders in Dialogue Generation,” in Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2018.

BibTeX

@inproceedings{shen2018improving,
TITLE = {Improving Variational Encoder-Decoders in Dialogue Generation},
AUTHOR = {Shen, Xiaoyu and Su, Hui and Niu, Shuzi and Demberg, Vera},
LANGUAGE = {eng},
ISBN = {978-1-57735-800-8},
URL = {https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16402/16100},
PUBLISHER = {AAAI},
YEAR = {2018},
BOOKTITLE = {Thirty-Second AAAI Conference on Artificial Intelligence},
PAGES = {5456--5463},
EID = {16402},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Shen, Xiaoyu
%A Su, Hui
%A Niu, Shuzi
%A Demberg, Vera
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Improving Variational Encoder-Decoders in Dialogue Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0DAB-F
%U https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16402/16100
%D 2018
%B Thirty-Second AAAI Conference on Artificial Intelligence
%Z date of event: 2018-02-02 - 2018-02-07
%C New Orleans, LA, USA
%B Thirty-Second AAAI Conference on Artificial Intelligence
%P 5456 - 5463
%Z sequence number: 16402
%I AAAI
%@ 978-1-57735-800-8

Conference paper

X. Shen, H. Su, W. Li, and D. Klakow

“NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.

BibTeX

@inproceedings{shen2018nexus,
TITLE = {{NEXUS} Network: {C}onnecting the Preceding and the Following in Dialogue Generation},
AUTHOR = {Shen, Xiaoyu and Su, Hui and Li, Wenjie and Klakow, Dietrich},
LANGUAGE = {eng},
ISBN = {978-1-948087-84-1},
URL = {http://aclweb.org/anthology/D18-1463},
PUBLISHER = {ACL},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)},
EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi},
PAGES = {4316--4327},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Shen, Xiaoyu
%A Su, Hui
%A Li, Wenjie
%A Klakow, Dietrich
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-0DBD-B
%U http://aclweb.org/anthology/D18-1463
%D 2018
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2018-10-31 - 2018-11-04
%C Brussels, Belgium
%B The Conference on Empirical Methods in Natural Language Processing
%E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi
%P 4316 - 4327
%I ACL
%@ 978-1-948087-84-1

Conference paper

M. Singh, A. Mishra, Y. Oualil, K. Berberich, and D. Klakow

“Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization,” in Advances in Information Retrieval (ECIR 2018), Grenoble, France, 2018.

BibTeX

@inproceedings{SinghECIR2ss18,
TITLE = {Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization},
AUTHOR = {Singh, Mittul and Mishra, Arunav and Oualil, Youssef and Berberich, Klaus and Klakow, Dietrich},
LANGUAGE = {eng},
ISBN = {978-3-319-76940-0},
DOI = {10.1007/978-3-319-76941-7_59},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2018)},
EDITOR = {Pasi, Gabriella and Piwowarski, Benjamin and Azzopardi, Leif and Hanbury, Allan},
PAGES = {657--664},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10772},
ADDRESS = {Grenoble, France},
}

Endnote

%0 Conference Proceedings
%A Singh, Mittul
%A Mishra, Arunav
%A Oualil, Youssef
%A Berberich, Klaus
%A Klakow, Dietrich
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-413D-2
%R 10.1007/978-3-319-76941-7_59
%D 2018
%B 40th European Conference on IR Research
%Z date of event: 2018-03-26 - 2018-03-29
%C Grenoble, France
%B Advances in Information Retrieval
%E Pasi, Gabriella; Piwowarski, Benjamin; Azzopardi, Leif; Hanbury, Allan
%P 657 - 664
%I Springer
%@ 978-3-319-76940-0
%B Lecture Notes in Computer Science
%N 10772

Conference paper

A. Spitz, J. Strötgen, and M. Gertz

“Predicting Document Creation Times in News Citation Networks,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.

BibTeX

@inproceedings{SpitzWWW2017,
TITLE = {Predicting Document Creation Times in News Citation Networks},
AUTHOR = {Spitz, Andreas and Str{\"o}tgen, Jannik and Gertz, Michael},
LANGUAGE = {eng},
ISBN = {978-1-4503-5640-4},
DOI = {10.1145/3184558.3191633},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)},
EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel},
PAGES = {1731--1736},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Spitz, Andreas
%A Strötgen, Jannik
%A Gertz, Michael
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Predicting Document Creation Times in News Citation Networks
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-B544-7
%R 10.1145/3184558.3191633
%D 2018
%B The Web Conference
%Z date of event: 2018-04-23 - 2018-04-27
%C Lyon, France
%B Companion of the World Wide Web Conference
%E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel
%P 1731 - 1736
%I ACM
%@ 978-1-4503-5640-4

Conference paper

D. Stepanova, V. T. Ho, and M. H. Gad-Elrab

“Rule Induction and Reasoning over Knowledge Graphs,” in Reasoning Web, Esch-sur-Alzette, Luxembourg, 2018.

BibTeX

@inproceedings{StepanovaRW2018,
TITLE = {Rule Induction and Reasoning over Knowledge Graphs},
AUTHOR = {Stepanova, Daria and Ho, Vinh Thinh and Gad-Elrab, Mohamed Hassan},
LANGUAGE = {eng},
ISBN = {978-3-030-00337-1},
DOI = {10.1007/978-3-030-00338-8_6},
PUBLISHER = {Springer},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Reasoning Web},
EDITOR = {D'Amato, Claudia and Theobald, Martin},
PAGES = {142--172},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {11078},
ADDRESS = {Esch-sur-Alzette, Luxembourg},
}

Endnote

%0 Conference Proceedings
%A Stepanova, Daria
%A Ho, Vinh Thinh
%A Gad-Elrab, Mohamed Hassan
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Rule Induction and Reasoning over Knowledge Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-9066-9
%R 10.1007/978-3-030-00338-8_6
%D 2018
%B 14th Reasoning Web Summer School
%Z date of event: 2018-09-22 - 2018-09-26
%C Esch-sur-Alzette, Luxembourg
%B Reasoning Web
%E D'Amato, Claudia; Theobald, Martin
%P 142 - 172
%I Springer
%@ 978-3-030-00337-1
%B Lecture Notes in Computer Science
%N 11078

Conference paper

J. Strötgen, A.-L. Minard, L. Lange, M. Speranza, and B. Magnini

“KRAUTS: A German Temporally Annotated News Corpus,” in Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.

BibTeX

@inproceedings{StroetgenELREC2018,
TITLE = {{KRAUTS}: {A German} Temporally Annotated News Corpus},
AUTHOR = {Str{\"o}tgen, Jannik and Minard, Anne-Lyse and Lange, Lukas and Speranza, Manuela and Magnini, Bernardo},
LANGUAGE = {eng},
ISBN = {979-10-95546-00-9},
URL = {http://lrec2018.lrec-conf.org/en/},
PUBLISHER = {ELRA},
YEAR = {2018},
BOOKTITLE = {Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
EDITOR = {Calzolari, Nicoletta and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Hasida, Koiti},
PAGES = {536--540},
ADDRESS = {Miyazaki, Japan},
}

Endnote

%0 Conference Proceedings
%A Strötgen, Jannik
%A Minard, Anne-Lyse
%A Lange, Lukas
%A Speranza, Manuela
%A Magnini, Bernardo
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T KRAUTS: A German Temporally Annotated News Corpus
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-8B8C-E
%U http://lrec2018.lrec-conf.org/en/
%D 2018
%B 11th Language Resources and Evaluation Conference
%Z date of event: 2018-05-07 - 2018-05-12
%C Miyazaki, Japan
%B Eleventh International Conference on Language Resources and Evaluation
%E Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Hasida, Koiti
%P 536 - 540
%I ELRA
%@ 979-10-95546-00-9

Conference paper

J. Strötgen, R. Andrade, and D. Gupta

“Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations,” in JCDL’18, Joint Conference on Digital Libraries, Fort Worth, TX, USA, 2018.

BibTeX

@inproceedings{StroetgenJCDL2018,
TITLE = {Putting Dates on the Map: {H}arvesting and Analyzing Street Names with Date Mentions and their Explanations},
AUTHOR = {Str{\"o}tgen, Jannik and Andrade, Rosita and Gupta, Dhruv},
LANGUAGE = {eng},
ISBN = {978-1-4503-5178-2},
DOI = {10.1145/3197026.3197035},
PUBLISHER = {ACM},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {JCDL'18, Joint Conference on Digital Libraries},
PAGES = {79--88},
ADDRESS = {Fort Worth, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Strötgen, Jannik
%A Andrade, Rosita
%A Gupta, Dhruv
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-B548-3
%R 10.1145/3197026.3197035
%D 2018
%B Joint Conference on Digital Libraries
%Z date of event: 2018-06-03 - 2018-06-07
%C Fort Worth, TX, USA
%B JCDL'18
%P 79 - 88
%I ACM
%@ 978-1-4503-5178-2

Conference paper

H. Su, X. Shen, P. Hu, W. Li, and Y. Chen

“Dialogue Generation with GAN,” in Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2018.

BibTeX

@inproceedings{Su_AAAI2018,
TITLE = {Dialogue Generation with {GAN}},
AUTHOR = {Su, Hui and Shen, Xiaoyu and Hu, Pengwei and Li, Wenjie and Chen, Yun},
LANGUAGE = {eng},
ISBN = {978-1-57735-800-8},
URL = {https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16508/16519},
PUBLISHER = {AAAI},
YEAR = {2018},
BOOKTITLE = {Thirty-Second AAAI Conference on Artificial Intelligence},
PAGES = {8163--8164},
EID = {16508},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Su, Hui
%A Shen, Xiaoyu
%A Hu, Pengwei
%A Li, Wenjie
%A Chen, Yun
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Dialogue Generation with GAN
%G eng
%U http://hdl.handle.net/21.11116/0000-0004-E562-B
%U https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16508/16519
%D 2018
%B Thirty-Second AAAI Conference on Artificial Intelligence
%Z date of event: 2018-02-02 - 2018-02-07
%C New Orleans, LA, USA
%B Thirty-Second AAAI Conference on Artificial Intelligence
%P 8163 - 8164
%Z sequence number: 16508
%I AAAI
%@ 978-1-57735-800-8

Thesis

G. H. Torbati

“Joint Disambiguation of Named Entities and Concepts,” Universität des Saarlandes, Saarbrücken, 2018.

BibTeX

@mastersthesis{torbati2018concept,
TITLE = {Joint Disambiguation of Named Entities and Concepts},
AUTHOR = {Torbati, Ghazaleh Haratinezhad},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2018},
DATE = {2018},
}

Endnote

%0 Thesis
%A Torbati, Ghazaleh Haratinezhad
%Y Del Corro, Luciano
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Joint Disambiguation of Named Entities and Concepts
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-38D0-3
%I Universität des Saarlandes
%C Saarbrücken
%D 2018
%P XIII, 70 p.
%V master
%9 master

Conference paper

L. Wang, Y. Wang, G. de Melo, and G. Weikum

“Five Shades of Untruth: Finer-Grained Classification of Fake News,” in Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018), Barcelona, Spain, 2018.

BibTeX

@inproceedings{DBLP:conf/asunam/WangWMW18,
TITLE = {Five Shades of Untruth: {F}iner-Grained Classification of Fake News},
AUTHOR = {Wang, Liqiang and Wang, Yafang and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-5386-6051-5},
DOI = {10.1109/ASONAM.2018.8508256},
PUBLISHER = {IEEE},
YEAR = {2018},
DATE = {2018},
BOOKTITLE = {Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018)},
EDITOR = {Brandes, Ulrik and Reddy, Chandan and Tagarelli, Andrea},
PAGES = {553--594},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Wang, Liqiang
%A Wang, Yafang
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Five Shades of Untruth: Finer-Grained Classification of Fake News
%G eng
%U http://hdl.handle.net/21.11116/0000-0003-3633-7
%R 10.1109/ASONAM.2018.8508256
%D 2018
%B IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
%Z date of event: 2018-08-28 - 2018-08-31
%C Barcelona, Spain
%B Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
%E Brandes, Ulrik; Reddy, Chandan; Tagarelli, Andrea
%P 553 - 594
%I IEEE
%@ 978-1-5386-6051-5

Article

H. Wu, Y. Ning, P. Chakraborty, J. Vreeken, N. Tatti, and N. Ramakrishnan

“Generating Realistic Synthetic Population Datasets,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 4, 2018.

BibTeX

@article{Wu_2018,
TITLE = {Generating Realistic Synthetic Population Datasets},
AUTHOR = {Wu, Hao and Ning, Yue and Chakraborty, Prithwish and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren},
LANGUAGE = {eng},
DOI = {10.1145/3182383},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2018},
DATE = {2018},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {12},
NUMBER = {4},
PAGES = {1--22},
EID = {45},
}

Endnote

%0 Journal Article
%A Wu, Hao
%A Ning, Yue
%A Chakraborty, Prithwish
%A Vreeken, Jilles
%A Tatti, Nikolaj
%A Ramakrishnan, Naren
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Generating Realistic Synthetic Population Datasets
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-16ED-B
%R 10.1145/3182383
%7 2018
%D 2018
%J ACM Transactions on Knowledge Discovery from Data
%O TKDD
%V 12
%N 4
%& 1
%P 1 - 22
%Z sequence number: 45
%I ACM
%C New York, NY

Article

Y. Zhao, X. Shen, H. Senuma, and A. Aizawa

“A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network,” Data & Knowledge Engineering, vol. 117, 2018.

BibTeX

@article{Zhao_2018,
TITLE = {A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network},
AUTHOR = {Zhao, Yang and Shen, Xiaoyu and Senuma, Hajime and Aizawa, Akiko},
LANGUAGE = {eng},
ISSN = {0169-023X},
DOI = {10.1016/j.datak.2018.05.007},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2018},
DATE = {2018},
JOURNAL = {Data \& Knowledge Engineering},
VOLUME = {117},
PAGES = {307--318},
}

Endnote

%0 Journal Article
%A Zhao, Yang
%A Shen, Xiaoyu
%A Senuma, Hajime
%A Aizawa, Akiko
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-72D7-B
%R 10.1016/j.datak.2018.05.007
%7 2018
%D 2018
%J Data & Knowledge Engineering
%V 117
%& 307
%P 307 - 318
%I Elsevier
%C Amsterdam
%@ 0169-023X

2017

Conference paper

A. Abujabal, M. Yahya, M. Riedewald, and G. Weikum

“Automated Template Generation for Question Answering over Knowledge Graphs,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.

BibTeX

@inproceedings{AbujabalWWW2017,
TITLE = {Automated Template Generation for Question Answering over Knowledge Graphs},
AUTHOR = {Abujabal, Abdalghani and Yahya, Mohamed and Riedewald, Mirek and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4913-0},
DOI = {10.1145/3038912.3052583},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web},
PAGES = {1191--1200},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Abujabal, Abdalghani
%A Yahya, Mohamed
%A Riedewald, Mirek
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automated Template Generation for Question Answering over Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4F9C-E
%R 10.1145/3038912.3052583
%D 2017
%B 26th International Conference on World Wide Web 
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17
%P 1191 - 1200
%I ACM
%@ 978-1-4503-4913-0

Conference paper

A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum

“QUINT: Interpretable Question Answering over Knowledge Bases,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.

BibTeX

@inproceedings{AbujabalENMLP2017,
TITLE = {{QUINT}: {I}nterpretable Question Answering over Knowledge Bases},
AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-97-5},
URL = {http://aclweb.org/anthology/D17-2011},
PUBLISHER = {ACL},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
PAGES = {61--66},
ADDRESS = {Copenhagen, Denmark},
}

Endnote

%0 Conference Proceedings
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Yahya, Mohamed
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T QUINT: Interpretable Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-F97C-E
%U http://aclweb.org/anthology/D17-2011
%D 2017
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2017-09-09 - 2017-09-11
%C Copenhagen, Denmark
%B The Conference on Empirical Methods in Natural Language Processing
%P 61 - 66
%I ACL
%@ 978-1-945626-97-5

Conference paper

IMPR-CS, D5

P. Agarwal and J. Strötgen

“Tiwiki: Searching Wikipedia with Temporal Constraints,” in WWW ’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{AgarwalStroetgen2017_TempWeb,
TITLE = {Tiwiki: Searching {W}ikipedia with Temporal Constraints},
AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3051112},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW '17 Companion},
PAGES = {1595--1600},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Agarwal, Prabal
%A Strötgen, Jannik
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Tiwiki: Searching Wikipedia with Temporal Constraints
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-53AE-9
%R 10.1145/3041021.3051112
%D 2017
%B 26th International Conference on World Wide Web Companion
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW '17 Companion
%P 1595 - 1600
%I ACM
%@ 978-1-4503-4914-7

Conference paper

R. Andrade and J. Strötgen

“All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names,” in WWW’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{AndradeWWW2017,
TITLE = {All Dates Lead to {R}ome: {E}xtracting and Explaining Temporal References in Street Names},
AUTHOR = {Andrade, Rosita and Str{\"o}tgen, Jannik},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3054249},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {757--758},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Andrade, Rosita
%A Strötgen, Jannik
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-62AE-1
%R 10.1145/3041021.3054249
%D 2017
%B 26th International Conference on World Wide Web
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 757 - 758
%I ACM
%@ 978-1-4503-4914-7

Conference paper

D5, D2

A. Bhattacharyya and J. Vreeken

“Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.

BibTeX

@inproceedings{bhattacharyya:17:squish,
TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns},
AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-497-3},
DOI = {10.1137/1.9781611974973.89},
PUBLISHER = {SIAM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)},
PAGES = {795--803},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Bhattacharyya, Apratim
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficiently Summarising Event Sequences with Rich Interleaving Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4BDC-D
%R 10.1137/1.9781611974973.89
%D 2017
%B 17th SIAM International Conference on Data Mining
%Z date of event: 2017-04-27 - 2017-04-29
%C Houston, TX, USA
%B Proceedings of the Seventeenth SIAM International Conference on Data Mining
%P 795 - 803
%I SIAM
%@ 978-1-61197-497-3

Paper

A. Bhattacharyya and J. Vreeken

“Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” 2017. [Online]. Available: http://arxiv.org/abs/1701.08096.

Abstract

Discovering the key structure of a database is one of the main goals of data
mining. In pattern set mining we do so by discovering a small set of patterns
that together describe the data well. The richer the class of patterns we
consider, and the more powerful our description language, the better we will be
able to summarise the data. In this paper we propose Squish, a novel greedy
MDL-based method for summarising sequential data using rich patterns that are
allowed to interleave. Experiments show Squish is orders of magnitude
faster than the state of the art, results in better models, as well as
discovers meaningful semantics in the form patterns that identify multiple
choices of values.

BibTeX

@online{DBLP:journals/corr/BhattacharyyaV17,
TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns},
AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1701.08096},
EPRINT = {1701.08096},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, as well as discovers meaningful semantics in the form patterns that identify multiple choices of values.},
}

Endnote

%0 Report
%A Bhattacharyya, Apratim
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficiently Summarising Event Sequences with Rich Interleaving Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90E4-A
%U http://arxiv.org/abs/1701.08096
%D 2017
%X   Discovering the key structure of a database is one of the main goals of data
mining. In pattern set mining we do so by discovering a small set of patterns
that together describe the data well. The richer the class of patterns we
consider, and the more powerful our description language, the better we will be
able to summarise the data. In this paper we propose \ourmethod, a novel greedy
MDL-based method for summarising sequential data using rich patterns that are
allowed to interleave. Experiments show \ourmethod is orders of magnitude
faster than the state of the art, results in better models, as well as
discovers meaningful semantics in the form of patterns that identify multiple
choices of values.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Conference paper

A. J. Biega, R. Saha Roy, and G. Weikum

“Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling,” in SIGIR’17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 2017.

BibTeX

@inproceedings{BiegaSIGIR2017,
TITLE = {Privacy through Solidarity: {A} User-Utility-Preserving Framework to Counter Profiling},
AUTHOR = {Biega, Asia J. and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5022-8},
DOI = {10.1145/3077136.3080830},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {SIGIR'17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {675--684},
ADDRESS = {Shinjuku, Tokyo, Japan},
}

Endnote

%0 Conference Proceedings
%A Biega, Asia J.
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-F901-2
%R 10.1145/3077136.3080830
%D 2017
%B 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2017-08-07 - 2017-08-11
%C Shinjuku, Tokyo, Japan
%B SIGIR'17
%P 675 - 684
%I ACM
%@ 978-1-4503-5022-8

Conference paper

A. J. Biega, A. Ghazimatin, H. Ferhatosmanoglu, K. P. Gummadi, and G. Weikum

“Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.

BibTeX

@inproceedings{Biega_CIKM2017,
TITLE = {Learning to Un-Rank: {Q}uantifying Search Exposure for Users in Online Communities},
AUTHOR = {Biega, Asia J. and Ghazimatin, Azin and Ferhatosmanoglu, Hakan and Gummadi, Krishna P. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4918-5},
DOI = {10.1145/3132847.3133040},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management},
PAGES = {267--276},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Biega, Asia J.
%A Ghazimatin, Azin
%A Ferhatosmanoglu, Hakan
%A Gummadi, Krishna P.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3BA4-5
%R 10.1145/3132847.3133040
%D 2017
%B 26th ACM International Conference on Information and Knowledge Management 
%Z date of event: 2017-11-06 - 2017-11-10
%C Singapore, Singapore
%B CIKM'17
%P 267 - 276
%I ACM
%@ 978-1-4503-4918-5

Conference paper

N. Boldyrev, M. Spaniol, J. Strötgen, and G. Weikum

“SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia,” in WWW’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{BoldyrevWWW2017,
TITLE = {{SESAME}: {E}uropean Statistics Explored via Semantic Alignment onto {Wikipedia}},
AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3054732},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {177--181},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Boldyrev, Natalia
%A Spaniol, Marc
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80B0-0
%R 10.1145/3041021.3054732
%D 2017
%B 26th International Conference on World Wide Web 
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 177 - 181
%I ACM
%@ 978-1-4503-4914-7

Thesis

D5IMPR-CS

N. Boldyrev

“Alignment of Multi-Cultural Knowledge Repositories,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.

BibTeX

@phdthesis{BOLDYREVPHD2017,
TITLE = {Alignment of Multi-Cultural Knowledge Repositories},
AUTHOR = {Boldyrev, Natalia},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-269407},
DOI = {10.22028/D291-26940},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
ABSTRACT = {The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.},
}

Endnote

%0 Thesis
%A Boldyrev, Natalia
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%A referee: Spaniol, Marc
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Alignment of Multi-Cultural Knowledge Repositories
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-87D8-2
%R 10.22028/D291-26940
%U urn:nbn:de:bsz:291-scidok-ds-269407
%F OTHER: hdl:20.500.11880/26891
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%8 06.12.2017
%P X, 124 p.
%V phd
%9 phd
%X The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26891

Article

M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken

“Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” Data Mining and Knowledge Discovery, vol. 31, no. 5, 2017.

BibTeX

@article{Boley2017,
TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery},
AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles},
LANGUAGE = {eng},
DOI = {10.1007/s10618-017-0520-3},
PUBLISHER = {Springer},
ADDRESS = {London},
YEAR = {2017},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {31},
NUMBER = {5},
PAGES = {1391--1418},
}

Endnote

%0 Journal Article
%A Boley, Mario
%A Goldsmith, Bryan R.
%A Ghiringhelli, Luca M.
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90E1-0
%R 10.1007/s10618-017-0520-3
%7 2017-06-28
%D 2017
%8 28.06.2017
%J Data Mining and Knowledge Discovery
%V 31
%N 5
%& 1391
%P 1391 - 1418
%I Springer
%C London

Paper

M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken

“Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” 2017. [Online]. Available: http://arxiv.org/abs/1701.07696.

Abstract

Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.

BibTeX

@online{DBLP:journals/corr/BoleyGGV17,
TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery},
AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1701.07696},
EPRINT = {1701.07696},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.},
}

Endnote

%0 Report
%A Boley, Mario
%A Goldsmith, Bryan R.
%A Ghiringhelli, Luca M.
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90DB-F
%U http://arxiv.org/abs/1701.07696
%D 2017
%X   Existing algorithms for subgroup discovery with numerical targets do not
optimize the error or target variable dispersion of the groups they find. This
often leads to unreliable or inconsistent statements about the data, rendering
practical applications, especially in scientific domains, futile. Therefore, we
here extend the optimistic estimator framework for optimal subgroup discovery
to a new class of objective functions: we show how tight estimators can be
computed efficiently for all functions that are determined by subgroup size
(non-decreasing dependence), the subgroup median value, and a dispersion
measure around the median (non-increasing dependence). In the important special
case when dispersion is measured using the average absolute deviation from the
median, this novel approach yields a linear time algorithm. Empirical
evaluation on a wide range of datasets shows that, when used within
branch-and-bound search, this approach is highly efficient and indeed discovers
subgroups with much smaller errors.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB

Conference paper

K. Budhathoki and J. Vreeken

“MDL for Causal Inference on Discrete Data,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.

BibTeX

@inproceedings{BudhathokiICDM2017,
TITLE = {{MDL} for Causal Inference on Discrete Data},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5386-3835-4},
DOI = {10.1109/ICDM.2017.87},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)},
PAGES = {751--756},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T MDL for Causal Inference on Discrete Data
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-6458-D
%R 10.1109/ICDM.2017.87
%D 2017
%B 17th IEEE International Conference on Data Mining
%Z date of event: 2017-11-18 - 2017-11-21
%C New Orleans, LA, USA
%B 17th IEEE International Conference on Data Mining 
%P 751 - 756
%I IEEE
%@ 978-1-5386-3835-4

Conference paper

K. Budhathoki and J. Vreeken

“Causal Inference by Compression,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.

BibTeX

@inproceedings{budhathoki:16:origo,
TITLE = {Causal Inference by Compression},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5090-5473-2},
DOI = {10.1109/ICDM.2016.0015},
PUBLISHER = {IEEE},
YEAR = {2016},
BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)},
EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong},
PAGES = {41--50},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference by Compression
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CC0-6
%R 10.1109/ICDM.2016.0015
%D 2017
%8 02.02.2017
%B 16th International Conference on Data Mining
%Z date of event: 2016-12-12 - 2016-12-15
%C Barcelona, Spain
%B 16th IEEE International Conference on Data Mining 
%E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong
%P 41 - 50
%I IEEE
%@ 978-1-5090-5473-2

Conference paper

K. Budhathoki and J. Vreeken

“Correlation by Compression,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.

BibTeX

@inproceedings{budhathoki:17:cbc,
TITLE = {Correlation by Compression},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-611974-87-4},
DOI = {10.1137/1.9781611974973.59},
PUBLISHER = {SIAM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)},
EDITOR = {Chawla, Nitesh and Wang, Wei},
PAGES = {525--533},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Correlation by Compression
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4BD8-6
%R 10.1137/1.9781611974973.59
%D 2017
%B 17th SIAM International Conference on Data Mining
%Z date of event: 2017-04-27 - 2017-04-29
%C Houston, TX, USA
%B Proceedings of the Seventeenth SIAM International Conference on Data Mining
%E Chawla, Nitesh; Wang, Wei
%P 525 - 533
%I SIAM
%@ 978-1-611974-87-4

Paper

K. Budhathoki and J. Vreeken

“Causal Inference by Stochastic Complexity,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06776.

Abstract

The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable.

We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class.

We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.

BibTeX

@online{DBLP:journals/corr/BudhathokiV17,
TITLE = {Causal Inference by Stochastic Complexity},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1702.06776},
EPRINT = {1702.06776},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.},
}

Endnote

%0 Report
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference by Stochastic Complexity
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90F2-A
%U http://arxiv.org/abs/1702.06776
%D 2017
%X   The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
  We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
  We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.

%K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI

Conference paper

A. Chakraborty, A. Hannak, A. J. Biega, and K. Gummadi

“Fair Sharing for Sharing Economy Platforms,” in FATREC-Workshop on Responsible Recommendation, Como, Italy, 2017.

BibTeX

@inproceedings{Chakraborty_FATREC2017,
TITLE = {Fair Sharing for Sharing Economy Platforms},
AUTHOR = {Chakraborty, Abhijnan and Hannak, Aniko and Biega, Asia J. and Gummadi, Krishna},
LANGUAGE = {eng},
DOI = {10.18122/B2BX2S},
YEAR = {2017},
BOOKTITLE = {FATREC-Workshop on Responsible Recommendation},
ADDRESS = {Como, Italy},
}

Endnote

%0 Conference Proceedings
%A Chakraborty, Abhijnan
%A Hannak, Aniko
%A Biega, Asia J.
%A Gummadi, Krishna
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Fair Sharing for Sharing Economy Platforms
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-57E1-E
%R 10.18122/B2BX2S
%D 2017
%B Fairness, Accountability and Transparency in Recommender Systems - Workshop on Responsible Recommendation
%Z date of event: 2017-08-31 - 2017-08-31
%C Como, Italy
%B FATREC-Workshop on Responsible Recommendation

Conference paper

C. X. Chu, N. Tandon, and G. Weikum

“Distilling Task Knowledge from How-To Communities,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.

BibTeX

@inproceedings{Cuong:WWW2017,
TITLE = {Distilling Task Knowledge from How-To Communities},
AUTHOR = {Chu, Cuong Xuan and Tandon, Niket and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4913-0},
DOI = {10.1145/3038912.3052715},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web},
PAGES = {805--814},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Chu, Cuong Xuan
%A Tandon, Niket
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Distilling Task Knowledge from How-To Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-54BE-E
%R 10.1145/3038912.3052715
%D 2017
%B 26th International Conference on World Wide Web 
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17
%P 805 - 814
%I ACM
%@ 978-1-4503-4913-0

Article

A. Cohan, S. Young, A. Yates, and N. Goharian

“Triaging Content Severity in Online Mental Health Forums,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.

BibTeX

@article{Cohan2017,
TITLE = {Triaging Content Severity in Online Mental Health Forums},
AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli},
LANGUAGE = {eng},
ISSN = {2330-1635},
DOI = {10.1002/asi.23865},
PUBLISHER = {Wiley},
ADDRESS = {Chichester, UK},
YEAR = {2017},
JOURNAL = {Journal of the Association for Information Science and Technology},
VOLUME = {68},
NUMBER = {11},
PAGES = {2675--2689},
}

Endnote

%0 Journal Article
%A Cohan, Arman
%A Young, Sydney
%A Yates, Andrew
%A Goharian, Nazli
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Triaging Content Severity in Online Mental Health Forums
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06B9-8
%R 10.1002/asi.23865
%7 2017-09-25
%D 2017
%8 25.09.2017
%J Journal of the Association for Information Science and Technology
%O asis&t
%V 68
%N 11
%& 2675
%P 2675 - 2689
%I Wiley
%C Chichester, UK
%@ 2330-1635

Paper

A. Cohan, S. Young, A. Yates, and N. Goharian

“Triaging Content Severity in Online Mental Health Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06875.

Abstract

Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.

BibTeX

@online{Cohan_arXiv2017,
TITLE = {Triaging Content Severity in Online Mental Health Forums},
AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1702.06875},
EPRINT = {1702.06875},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17\% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.},
}

Endnote

%0 Report
%A Cohan, Arman
%A Young, Sydney
%A Yates, Andrew
%A Goharian, Nazli
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Triaging Content Severity in Online Mental Health Forums
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06AF-F
%U http://arxiv.org/abs/1702.06875
%D 2017
%X   Mental health forums are online communities where people express their issues
and seek help from moderators and other users. In such forums, there are often
posts with severe content indicating that the user is in acute distress and
there is a risk of attempted self-harm. Moderators need to respond to these
severe posts in a timely manner to prevent potential self-harm. However, the
large volume of daily posted content makes it difficult for the moderators to
locate and respond to these critical posts. We present a framework for triaging
user content into four severity categories which are defined based on
indications of self-harm ideation. Our models are based on a feature-rich
classification framework which includes lexical, psycholinguistic, contextual
and topic modeling features. Our approaches improve the state of the art in
triaging the content severity in mental health forums by large margins (up to
17% improvement over the F-1 scores). Using the proposed model, we analyze the
mental state of users and we show that overall, long-term users of the forum
demonstrate a decreased severity of risk over time. Our analysis on the
interaction of the moderators with the users further indicates that without an
automatic way to identify critical content, it is indeed challenging for the
moderators to provide timely response to the users in need.

%K Computer Science, Computation and Language, cs.CL, Computer Science, Information Retrieval, cs.IR, cs.SI

Conference paper

C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel

“SPATE: Compacting and Exploring Telco Big Data,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.

BibTeX

@inproceedings{icde17-spate-demo,
TITLE = {{SPATE}: Compacting and Exploring Telco Big Data},
AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.},
LANGUAGE = {eng},
ISBN = {978-1-5090-6544-8},
DOI = {10.1109/ICDE.2017.203},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering},
PAGES = {1419--1420},
ADDRESS = {San Diego, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Costa, Constantinos
%A Chatzimilioudis, Georgios
%A Zeinalipour-Yazti, Demetrios
%A Mokbel, Mohamed F.
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T SPATE: Compacting and Exploring Telco Big Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-62BA-5
%R 10.1109/ICDE.2017.203
%D 2017
%B 33rd IEEE International Conference on Data Engineering
%Z date of event: 2017-04-19 - 2017-04-22
%C San Diego, CA, USA
%B ICDE 2017
%P 1419 - 1420
%I IEEE
%@ 978-1-5090-6544-8

Conference paper

C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel

“Towards Real-Time Road Traffic Analytics using Telco Big Data,” in BIRTE ’17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics, Munich, Germany, 2017.

BibTeX

@inproceedings{birte17traffictbd,
TITLE = {Towards Real-Time Road Traffic Analytics using {Telco Big Data}},
AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.},
LANGUAGE = {eng},
ISBN = {978-1-4503-5425-7},
DOI = {10.1145/3129292.3129296},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {BIRTE '17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics},
EDITOR = {Chatziantoniou, Damianos and Castellanos, Malu and Chrysanthis, Panos K.},
EID = {5},
ADDRESS = {Munich, Germany},
}

Endnote

%0 Conference Proceedings
%A Costa, Constantinos
%A Chatzimilioudis, Georgios
%A Zeinalipour-Yazti, Demetrios
%A Mokbel, Mohamed F.
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Towards Real-Time Road Traffic Analytics using Telco Big Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-DDB7-A
%R 10.1145/3129292.3129296
%D 2017
%B Eleventh International Workshop on Real-Time Business Intelligence and Analytics 
%Z date of event: 2017-08-28 - 2017-08-28
%C Munich, Germany
%B BIRTE '17
%E Chatziantoniou, Damianos; Castellanos, Malu; Chrysanthis, Panos K.
%Z sequence number: 5
%I ACM
%@ 978-1-4503-5425-7

Conference paper

C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel

“Efficient Exploration of Telco Big Data with Compression and Decaying,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.

BibTeX

@inproceedings{icde17-spate,
TITLE = {Efficient Exploration of Telco Big Data with Compression and Decaying},
AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.},
LANGUAGE = {eng},
ISBN = {978-1-5090-6544-8},
DOI = {10.1109/ICDE.2017.175},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering},
PAGES = {1332--1343},
ADDRESS = {San Diego, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Costa, Constantinos
%A Chatzimilioudis, Georgios
%A Zeinalipour-Yazti, Demetrios
%A Mokbel, Mohamed F.
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Efficient Exploration of Telco Big Data with Compression and Decaying
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-62B3-4
%R 10.1109/ICDE.2017.175
%D 2017
%B 33rd IEEE International Conference on Data Engineering
%Z date of event: 2017-04-19 - 2017-04-22
%C San Diego, CA, USA
%B ICDE 2017
%P 1332 - 1343
%I IEEE
%@ 978-1-5090-6544-8

Thesis

S. Das, K. Berberich, D. Klakow, A. Mishra, and V. Setty

“Estimating Event Focus Time with Distributed Representation of Words,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

Time is an important dimension as it aids in disambiguating and understanding
newsworthy events that happened in the past. It helps in chronological ordering
of events to understand its causality, evolution, and ramifications. In
Information Retrieval, time alongside text is known to improve the quality of
search results. So, making use of the temporal dimensionality in the text-based
analysis is an interesting idea to explore. Considering the importance of time,
methods to automatically resolve temporal foci’s of events are essential. In
this thesis, we try to solve this research question by training our models on
two different kinds of corpora and then evaluate on a set of historical
event-queries.

BibTeX

@mastersthesis{dasmaster17,
TITLE = {Estimating Event Focus Time with Distributed Representation of Words},
AUTHOR = {Das, Supratim and Berberich, Klaus and Klakow, Dietrich and Mishra, Arunav and Setty, Vinay},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in chronological ordering of events to understand its causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in the text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve temporal foci{\textquoteright}s of events are essential. In this thesis, we try to solve this research question by training our models on two different kinds of corpora and then evaluate on a set of historical event-queries.},
}

Endnote

%0 Thesis
%A Das, Supratim
%A Berberich, Klaus
%A Klakow, Dietrich
%A Mishra, Arunav
%A Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Estimating Event Focus Time with Distributed Representation of Words
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-DFF1-7
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P 83 p.
%V master
%9 master
%X Time is an important dimension as it aids in disambiguating and understanding
newsworthy events that happened in the past. It helps in chronological ordering of events to
understand its causality, evolution, and ramifications. In Information Retrieval, time
alongside text is known to improve the quality of search results. So, making use of
the temporal dimensionality in the text-based analysis is an interesting idea to explore.
Considering the importance of time, methods to automatically resolve temporal foci’s
of events are essential. In this thesis, we try to solve this research question by training
our models on two different kinds of corpora and then evaluate on a set of historical
event-queries.

Conference paper

S. Das, A. Mishra, K. Berberich, and V. Setty

“Estimating Event Focus Time Using Neural Word Embeddings,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.

BibTeX

@inproceedings{Das_CIKM2017,
TITLE = {Estimating Event Focus Time Using Neural Word Embeddings},
AUTHOR = {Das, Supratim and Mishra, Arunav and Berberich, Klaus and Setty, Vinay},
LANGUAGE = {eng},
ISBN = {978-1-4503-4918-5},
DOI = {10.1145/3132847.3133131},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management},
PAGES = {2039--2042},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Das, Supratim
%A Mishra, Arunav
%A Berberich, Klaus
%A Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Estimating Event Focus Time Using Neural Word Embeddings
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-635B-B
%R 10.1145/3132847.3133131
%D 2017
%B 26th ACM International Conference on Information and Knowledge Management 
%Z date of event: 2017-11-06 - 2017-11-10
%C Singapore, Singapore
%B CIKM'17
%P 2039 - 2042
%I ACM
%@ 978-1-4503-4918-5

Thesis

D5IMPR-CS

S. Dutta

“Efficient Knowledge Management for Named Entities from Text,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations.

A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories.

This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.

BibTeX

@phdthesis{duttaphd17,
TITLE = {Efficient Knowledge Management for Named Entities from Text},
AUTHOR = {Dutta, Sourav},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-67924},
DOI = {10.22028/D291-26701},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.},
}

Endnote

%0 Thesis
%A Dutta, Sourav
%Y Weikum, Gerhard
%A referee: Nejdl, Wolfgang
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Knowledge Management for Named Entities from Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-A793-E
%U urn:nbn:de:bsz:291-scidok-67924
%R 10.22028/D291-26701
%F OTHER: hdl:20.500.11880/26757
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P xv, 134 p.
%V phd
%9 phd
%X The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6792/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

P. Ernst, A. Mishra, A. Anand, and V. Setty

“BioNex: A System For Biomedical News Event Exploration,” in SIGIR’17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 2017.

BibTeX

@inproceedings{Ernst_SIGIR2017,
TITLE = {{BioNex}: {A} System For Biomedical News Event Exploration},
AUTHOR = {Ernst, Patrick and Mishra, Arunav and Anand, Avishek and Setty, Vinay},
LANGUAGE = {eng},
ISBN = {978-1-4503-5022-8},
DOI = {10.1145/3077136.3084150},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {SIGIR'17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {1277--1280},
ADDRESS = {Shinjuku, Tokyo, Japan},
}

Endnote

%0 Conference Proceedings
%A Ernst, Patrick
%A Mishra, Arunav
%A Anand, Avishek
%A Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T BioNex: A System For Biomedical News Event Exploration
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A2D1-A
%R 10.1145/3077136.3084150
%D 2017
%B 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2017-08-07 - 2017-08-11
%C Shinjuku, Tokyo, Japan
%B SIGIR'17
%P 1277 - 1280
%I ACM
%@ 978-1-4503-5022-8

Thesis

S. Eslami

“Utility-preserving Profile Removal in Online Forums,” Universität des Saarlandes, Saarbrücken, 2017.

BibTeX

@mastersthesis{EslamiMSc2017,
TITLE = {Utility-preserving Profile Removal in Online Forums},
AUTHOR = {Eslami, Sedigheh},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
}

Endnote

%0 Thesis
%A Eslami, Sedigheh
%Y Weikum, Gerhard
%A referee: Saha Roy, Rishiraj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Utility-preserving Profile Removal in Online Forums
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-9236-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P XII, 66 p.
%V master
%9 master

Conference paper

S. Eslami, A. J. Biega, R. Saha Roy, and G. Weikum

“Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.

BibTeX

@inproceedings{Eslami_CIKM2017,
TITLE = {Privacy of Hidden Profiles: {U}tility-Preserving Profile Removal in Online Forums},
AUTHOR = {Eslami, Sedigheh and Biega, Asia J. and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4918-5},
DOI = {10.1145/3132847.3133140},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management},
PAGES = {2063--2066},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Eslami, Sedigheh
%A Biega, Asia J.
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3BA2-7
%R 10.1145/3132847.3133140
%D 2017
%B 26th ACM International Conference on Information and Knowledge Management 
%Z date of event: 2017-11-06 - 2017-11-10
%C Singapore, Singapore
%B CIKM'17
%P 2063 - 2066
%I ACM
%@ 978-1-4503-4918-5

Book

E. Galbrun and P. Miettinen

Redescription Mining. Cham: Springer International, 2017.

BibTeX

@book{galbrun18redescription,
TITLE = {Redescription Mining},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-3-319-72889-6},
DOI = {10.1007/978-3-319-72889-6},
PUBLISHER = {Springer International},
ADDRESS = {Cham},
YEAR = {2017},
DATE = {2017},
PAGES = {XI, 80 p.},
}

Endnote

%0 Book
%A Galbrun, Esther
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Redescription Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-90D3-1
%R 10.1007/978-3-319-72889-6
%@ 978-3-319-72889-6
%I Springer International
%C Cham
%D 2017
%P XI, 80 p.

Article

E. Galbrun and P. Miettinen

“Redescription Mining: An Overview,” IEEE Intelligent Informatics Bulletin, vol. 18, no. 2, 2017.

BibTeX

@article{Galbrun_2017c,
TITLE = {Redescription Mining: An Overview},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {1727-5997},
PUBLISHER = {IEEE Computer Society},
YEAR = {2017},
DATE = {2017},
JOURNAL = {IEEE Intelligent Informatics Bulletin},
VOLUME = {18},
NUMBER = {2},
PAGES = {7--12},
EID = {2},
}

Endnote

%0 Journal Article
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Redescription Mining: An Overview
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-5E2B-6
%7 2017
%D 2017
%J IEEE Intelligent Informatics Bulletin
%V 18
%N 2
%& 7
%P 7 - 12
%Z sequence number: 2
%I IEEE Computer Society
%@ 1727-5997
%U http://www.comp.hkbu.edu.hk/~iib/2017/Dec/article2/iib_vol18no2_article2.pdf

Conference paper

E. Galbrun and P. Miettinen

“Analysing Political Opinions Using Redescription Mining,” in 16th IEEE International Conference on Data Mining Workshops (ICDMW 2016), Barcelona, Spain, 2017.

BibTeX

@inproceedings{galbrun16analysing,
TITLE = {Analysing Political Opinions Using Redescription Mining},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-5090-5910-2},
DOI = {10.1109/ICDMW.2016.121},
PUBLISHER = {IEEE},
YEAR = {2017},
BOOKTITLE = {16th IEEE International Conference on Data Mining Workshops (ICDMW 2016)},
EDITOR = {Domeniconi, Carlotta and Gullo, Francesco and Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong},
PAGES = {422--427},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Galbrun, Esther
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Analysing Political Opinions Using Redescription Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2247-5
%R 10.1109/ICDMW.2016.121
%D 2017
%8 02.02.2017
%B 16th International Conference on Data Mining
%Z date of event: 2016-12-12 - 2016-12-15
%C Barcelona, Spain
%B 16th IEEE International Conference on Data Mining Workshops 
%E Domeniconi, Carlotta; Gullo, Francesco; Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong
%P 422 - 427
%I IEEE
%@ 978-1-5090-5910-2

Conference paper

K. Gashteovski, R. Gemulla, and L. Del Corro

“MinIE: Minimizing Facts in Open Information Extraction,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.

BibTeX

@inproceedings{DBLP:conf/emnlp/GashteovskiGC17,
TITLE = {{MinIE}: {M}inimizing Facts in Open Information Extraction},
AUTHOR = {Gashteovski, Kiril and Gemulla, Rainer and Del Corro, Luciano},
LANGUAGE = {eng},
ISBN = {978-1-945626-83-8},
URL = {http://aclanthology.info/papers/D17-1277/d17-1277},
PUBLISHER = {ACL},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
PAGES = {2620--2630},
ADDRESS = {Copenhagen, Denmark},
}

Endnote

%0 Conference Proceedings
%A Gashteovski, Kiril
%A Gemulla, Rainer
%A Del Corro, Luciano
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T MinIE: Minimizing Facts in Open Information Extraction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-30F4-2
%U http://aclanthology.info/papers/D17-1277/d17-1277
%D 2017
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2017-09-09 - 2017-09-11
%C Copenhagen, Denmark
%B The Conference on Empirical Methods in Natural Language Processing
%P 2620 - 2630
%I ACL
%@ 978-1-945626-83-8
%U http://www.aclweb.org/anthology/D17-1277

Conference paper

X. Ge, A. Daphalapurkar, M. Shmipi, K. Darpun, K. Pelechrinis, P. K. Chrysanthis, and D. Zeinalipour-Yazti

“Data-driven Serendipity Navigation in Urban Places,” in IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA, 2017.

BibTeX

@inproceedings{icdcs17-serendipity-demo,
TITLE = {Data-driven Serendipity Navigation in Urban Places},
AUTHOR = {Ge, Xiaoyi and Daphalapurkar, Ameya and Shmipi, Manali and Darpun, Kohli and Pelechrinis, Konstantinos and Chrysanthis, Panos K. and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-1-5386-1792-2},
DOI = {10.1109/ICDCS.2017.286},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017)},
EDITOR = {Lee, Kisung and Liu, Ling},
PAGES = {2501--2504},
ADDRESS = {Atlanta, GA, USA},
}

Endnote

%0 Conference Proceedings
%A Ge, Xiaoyi
%A Daphalapurkar, Ameya
%A Shmipi, Manali
%A Darpun, Kohli
%A Pelechrinis, Konstantinos
%A Chrysanthis, Panos K.
%A Zeinalipour-Yazti, Demetrios
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Data-driven Serendipity Navigation in Urban Places
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-082B-7
%R 10.1109/ICDCS.2017.286
%D 2017
%B 37th IEEE International Conference on Distributed Computing Systems
%Z date of event: 2017-06-05 - 2017-06-08
%C Atlanta, GA, USA
%B IEEE 37th International Conference on Distributed Computing Systems
%E Lee, Kisung; Liu, Ling
%P 2501 - 2504
%I IEEE
%@ 978-1-5386-1792-2

Article

B. Goldsmith, M. Boley, J. Vreeken, M. Scheffler, and L. Ghiringhelli

“Uncovering Structure-property Relationships of Materials by Subgroup Discovery,” New Journal of Physics, vol. 19, no. 1, 2017.

BibTeX

@article{goldsmith:17:gold,
TITLE = {Uncovering Structure-property Relationships of Materials by Subgroup Discovery},
AUTHOR = {Goldsmith, Brian and Boley, Mario and Vreeken, Jilles and Scheffler, Matthias and Ghiringhelli, Luca},
LANGUAGE = {eng},
ISSN = {1367-2630},
DOI = {10.1088/1367-2630/aa57c2},
PUBLISHER = {IOP Publishing},
ADDRESS = {Bristol},
YEAR = {2017},
JOURNAL = {New Journal of Physics},
VOLUME = {19},
NUMBER = {1},
EID = {013031},
}

Endnote

%0 Journal Article
%A Goldsmith, Brian
%A Boley, Mario
%A Vreeken, Jilles
%A Scheffler, Matthias
%A Ghiringhelli, Luca
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Uncovering Structure-property Relationships of Materials by Subgroup Discovery
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4BF5-4
%R 10.1088/1367-2630/aa57c2
%7 2017
%D 2017
%J New Journal of Physics
%O New J. Phys.
%V 19
%N 1
%Z sequence number: 013031
%I IOP Publishing
%C Bristol
%@ 1367-2630
%U http://iopscience.iop.org/article/10.1088/1367-2630/aa57c2

Thesis

D5IMPR-CS

A. Grycner

“Constructing Lexicons of Relational Phrases,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.

BibTeX

@phdthesis{Grynerphd17,
TITLE = {Constructing Lexicons of Relational Phrases},
AUTHOR = {Grycner, Adam},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-69101},
DOI = {10.22028/D291-26776},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.},
}

Endnote

%0 Thesis
%A Grycner, Adam
%Y Weikum, Gerhard
%A referee: Klakow, Dietrich
%A referee: Ponzetto, Simone Paolo
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Constructing Lexicons of Relational Phrases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-933B-1
%U urn:nbn:de:bsz:291-scidok-69101
%R 10.22028/D291-26776
%F OTHER: hdl:20.500.11880/26789
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P 125 p.
%V phd
%9 phd
%X Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6910/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

A. Guimarães, L. Wang, and G. Weikum

“Us and Them: Adversarial Politics on Twitter,” in 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017), New Orleans, LA, USA, 2017.

BibTeX

@inproceedings{Guimaraes_ICDMW2017,
TITLE = {Us and Them: {A}dversarial Politics on {Twitter}},
AUTHOR = {Guimar{\~a}es, Anna and Wang, Liqiang and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-5386-1480-8},
DOI = {10.1109/ICDMW.2017.119},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {17th IEEE International Conference on Data Mining Workshops (ICDMW 2017)},
EDITOR = {Gottumukkala, Raju and Ning, Xia and Dong, Guozhu and Raghavan, Vijay and Aluru, Srinivas and Karypis, George and Miele, Lucio and Wu, Xindong},
PAGES = {872--877},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Guimar&#227;es, Anna
%A Wang, Liqiang
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Us and Them: Adversarial Politics on Twitter : 
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3B89-4
%R 10.1109/ICDMW.2017.119
%D 2017
%B 17th International Conference on Data Mining
%Z date of event: 2017-11-18 - 2017-11-21
%C New Orleans, LA, USA
%B 17th IEEE International Conference on Data Mining Workshops 
%E Gottumukkala, Raju; Ning, Xia; Dong, Guozhu; Raghavan, Vijay; Aluru, Srinivas; Karypis, George; Miele, Lucio; Wu, Xindong
%P 872 - 877
%I IEEE
%@ 978-1-5386-1480-8

Report

D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti

“Generating Semantic Aspects for Queries,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2017-5-001, 2017.

Abstract

Ambiguous information needs expressed in a limited number of keywords
often result in long-winded query sessions and many query reformulations.
In this work, we tackle ambiguous queries by providing automatically generated
semantic aspects that can guide users to satisfying results regarding
their information needs. To generate semantic aspects, we use semantic annotations
available in the documents and leverage models representing the
semantic relationships between annotations of the same type. The aspects in
turn provide us a foundation for representing text in a completely structured
manner, thereby allowing for a semantically-motivated organization of search
results. We evaluate our approach on a testbed of over 5,000 aspects on Web
scale document collections amounting to more than 450 million documents,
with temporal, geographic, and named entity annotations as example dimensions.
Our experimental results show that our general approach is Web-scale
ready and finds relevant aspects for highly ambiguous queries.

BibTeX

@techreport{Guptareport2007,
TITLE = {Generating Semantic Aspects for Queries},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2017-5-001},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
ABSTRACT = {Ambiguous information needs expressed in a limited number of keywords<br>often result in long-winded query sessions and many query reformulations.<br>In this work, we tackle ambiguous queries by providing automatically generated<br>semantic aspects that can guide users to satisfying results regarding<br>their information needs. To generate semantic aspects, we use semantic annotations<br>available in the documents and leverage models representing the<br>semantic relationships between annotations of the same type. The aspects in<br>turn provide us a foundation for representing text in a completely structured<br>manner, thereby allowing for a semantically-motivated organization of search<br>results. We evaluate our approach on a testbed of over 5,000 aspects on Web<br>scale document collections amounting to more than 450 million documents,<br>with temporal, geographic, and named entity annotations as example dimensions.<br>Our experimental results show that our general approach is Web-scale<br>ready and finds relevant aspects for highly ambiguous queries.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Gupta, Dhruv
%A Berberich, Klaus
%A Str&#246;tgen, Jannik
%A Zeinalipour-Yazti, Demetrios
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Generating Semantic Aspects for Queries : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-07DD-0
%Y Max-Planck-Institut f&#252;r Informatik
%C Saarbr&#252;cken
%D 2017
%P 39 p.
%X Ambiguous information needs expressed in a limited number of keywords<br>often result in long-winded query sessions and many query reformulations.<br>In this work, we tackle ambiguous queries by providing automatically generated<br>semantic aspects that can guide users to satisfying results regarding<br>their information needs. To generate semantic aspects, we use semantic annotations<br>available in the documents and leverage models representing the<br>semantic relationships between annotations of the same type. The aspects in<br>turn provide us a foundation for representing text in a completely structured<br>manner, thereby allowing for a semantically-motivated organization of search<br>results. We evaluate our approach on a testbed of over 5,000 aspects on Web<br>scale document collections amounting to more than 450 million documents,<br>with temporal, geographic, and named entity annotations as example dimensions.<br>Our experimental results show that our general approach is Web-scale<br>ready and finds relevant aspects for highly ambiguous queries.
%B Research Report
%@ 0946-011X

Thesis

D5IMPR-CS

S. Gurajada

“Distributed Querying of Large Labeled Graphs,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

The graph is a vital abstract data type that has profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the “Labeled Graph”, where vertices and edges are labeled. An enormous research effort has been invested into the task of managing and querying graphs, yet a lot of challenges are left unsolved. In this thesis, we advance the state of the art for the following query models and propose a distributed solution to process them in an efficient and scalable manner.
• Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices.
• Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner.
• Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and a processor to process GGP queries in an efficient and scalable manner.
To this end, we propose a prototype distributed engine, coined “TriAD” (Triple Asynchronous and Distributed), that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.

BibTeX

@phdthesis{guraphd2017,
TITLE = {Distributed Querying of Large Labeled Graphs},
AUTHOR = {Gurajada, Sairam},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-67738},
DOI = {10.22028/D291-26695},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {The graph is a vital abstract data type that has profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the {\textquotedblleft}Labeled Graph{\textquotedblright}, where vertices and edges are labeled. An enormous research effort has been invested into the task of managing and querying graphs, yet a lot of challenges are left unsolved. In this thesis, we advance the state of the art for the following query models and propose a distributed solution to process them in an efficient and scalable manner.<br>\mbox{$\bullet$} Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices.<br>\mbox{$\bullet$} Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner.<br>\mbox{$\bullet$} Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and a processor to process GGP queries in an efficient and scalable manner.<br>To this end, we propose a prototype distributed engine, coined {\textquotedblleft}TriAD{\textquotedblright} (Triple Asynchronous and Distributed), that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.},
}

Endnote

%0 Thesis
%A Gurajada, Sairam
%Y Theobald, Martin
%A referee: Weikum, Gerhard
%A referee: &#214;zsu, M. Tamer
%A referee: Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Distributed Querying of Large Labeled Graphs : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-8202-E
%U urn:nbn:de:bsz:291-scidok-67738
%R 10.22028/D291-26695
%F OTHER: hdl:20.500.11880/26751
%I Universit&#228;t des Saarlandes
%C Saarbr&#252;cken
%D 2017
%P x, 167 p.
%V phd
%9 phd
%X The graph is a vital abstract data type that has profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the &#8220;Labeled Graph&#8221;, where vertices and edges are labeled. An enormous research effort has been invested into the task of managing and querying graphs, yet a lot of challenges are left unsolved. In this thesis, we advance the state of the art for the following query models and propose a distributed solution to process them in an efficient and scalable manner.<br>&#8226; Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices.<br>&#8226; Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner.<br>&#8226; Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and a processor to process GGP queries in an efficient and scalable manner.<br>To this end, we propose a prototype distributed engine, coined &#8220;TriAD&#8221; (Triple Asynchronous and Distributed), that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6773/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

K. Hui, A. Yates, K. Berberich, and G. de Melo

“Position-Aware Representations for Relevance Matching in Neural Information Retrieval,” in WWW’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{HuiWWW2017,
TITLE = {Position-Aware Representations for Relevance Matching in Neural Information Retrieval},
AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3054258},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {799--800},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Yates, Andrew
%A Berberich, Klaus
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Position-Aware Representations for Relevance Matching in Neural Information Retrieval : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90A4-B
%R 10.1145/3041021.3054258
%D 2017
%B 26th International Conference on World Wide Web
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 799 - 800
%I ACM
%@ 978-1-4503-4914-7

Conference paper

K. Hui and K. Berberich

“Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.

BibTeX

@inproceedings{hui2017full,
TITLE = {Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-56607-8},
DOI = {10.1007/978-3-319-56608-5_19},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)},
EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John},
PAGES = {239--251},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10193},
ADDRESS = {Aberdeen, UK},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1F75-5
%R 10.1007/978-3-319-56608-5_19
%D 2017
%B 39th European Conference on Information Retrieval
%Z date of event: 2017-04-09 - 2017-04-13
%C Aberdeen, UK
%B Advances in Information Retrieval
%E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John
%P 239 - 251
%I Springer
%@ 978-3-319-56607-8
%B Lecture Notes in Computer Science
%N 10193

Conference paper

K. Hui, A. Yates, K. Berberich, and G. de Melo

“PACRR: A Position-Aware Neural IR Model for Relevance Matching,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.

BibTeX

@inproceedings{HuiENMLP2017,
TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching},
AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-945626-83-8},
URL = {https://aclanthology.info/pdf/D/D17/D17-1111.pdf},
PUBLISHER = {ACL},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
PAGES = {1060--1069},
ADDRESS = {Copenhagen, Denmark},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Yates, Andrew
%A Berberich, Klaus
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T PACRR: A Position-Aware Neural IR Model for Relevance Matching : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-063F-D
%U https://aclanthology.info/pdf/D/D17/D17-1111.pdf
%D 2017
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2017-09-09 - 2017-09-11
%C Copenhagen, Denmark
%B The Conference on Empirical Methods in Natural Language Processing
%P 1060 - 1069
%I ACL
%@ 978-1-945626-83-8

Paper

K. Hui, A. Yates, K. Berberich, and G. de Melo

“PACRR: A Position-Aware Neural IR Model for Relevance Matching,” 2017. [Online]. Available: http://arxiv.org/abs/1704.03940.

Abstract

In order to adopt deep learning for information retrieval, models are needed
that can capture all relevant information required to assess the relevance of a
document to a given user query. While previous works have successfully captured
unigram term matches, how to fully employ position-dependent information such
as proximity and term dependencies has been insufficiently explored. In this
work, we propose a novel neural IR model named PACRR (Position-Aware
Convolutional-Recurrent Relevance), aiming at better modeling
position-dependent interactions between a query and a document via
convolutional layers as well as recurrent layers. Extensive experiments on six
years' TREC Web Track data confirm that the proposed model yields better
results under different benchmarks.

BibTeX

@online{DBLP:journals/corr/HuiYBM17,
TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching},
AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1704.03940},
EPRINT = {1704.03940},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks.},
}

Endnote

%0 Report
%A Hui, Kai
%A Yates, Andrew
%A Berberich, Klaus
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T PACRR: A Position-Aware Neural IR Model for Relevance Matching : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90A8-3
%U http://arxiv.org/abs/1704.03940
%D 2017
%X In order to adopt deep learning for information retrieval, models are needed
that can capture all relevant information required to assess the relevance of a
document to a given user query. While previous works have successfully captured
unigram term matches, how to fully employ position-dependent information such
as proximity and term dependencies has been insufficiently explored. In this
work, we propose a novel neural IR model named PACRR (Position-Aware
Convolutional-Recurrent Relevance), aiming at better modeling
position-dependent interactions between a query and a document via
convolutional layers as well as recurrent layers. Extensive experiments on six
years' TREC Web Track data confirm that the proposed model yields better
results under different benchmarks.
%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Conference paper

K. Hui and K. Berberich

“Merge-Tie-Judge: Low-Cost Preference Judgments with Ties,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.

BibTeX

@inproceedings{HuiICTIR2017b,
TITLE = {{Merge-Tie-Judge}: Low-Cost Preference Judgments with Ties},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4490-6},
DOI = {10.1145/3121050.3121095},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval},
PAGES = {277--280},
ADDRESS = {Amsterdam, The Netherlands},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Merge-Tie-Judge: Low-Cost Preference Judgments with Ties : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-064B-2
%R 10.1145/3121050.3121095
%D 2017
%B 7th International Conference on the Theory of Information Retrieval
%Z date of event: 2017-10-01 - 2017-10-04
%C Amsterdam, The Netherlands
%B ICTIR'17
%P 277 - 280
%I ACM
%@ 978-1-4503-4490-6

Thesis

D5IMPR-CS

K. Hui

“Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

An information retrieval (IR) system assists people in consuming huge amount of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. 
We demonstrate their superior performances through evaluations on different datasets.

BibTeX

@phdthesis{HUiphd2017,
TITLE = {Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval},
AUTHOR = {Hui, Kai},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-269423},
DOI = {10.22028/D291-26942},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
ABSTRACT = {An information retrieval (IR) system assists people in consuming huge amount of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. 
Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performances through evaluations on different datasets.},
}

Endnote

%0 Thesis
%A Hui, Kai
%Y Berberich, Klaus
%A referee: Weikum, Gerhard
%A referee: Dietz, Laura
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-8921-E
%U urn:nbn:de:bsz:291-scidok-ds-269423
%R 10.22028/D291-26942
%F OTHER: hdl:20.500.11880/26894
%I Universit&#228;t des Saarlandes
%C Saarbr&#252;cken
%D 2017
%P xiv, 130 p.
%V phd
%9 phd
%X An information retrieval (IR) system assists people in consuming huge amount of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. 
We demonstrate their superior performances through evaluations on different datasets.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26894

Conference paper

K. Hui and K. Berberich

“Low-Cost Preference Judgment via Ties,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.

BibTeX

@inproceedings{hui2017short,
TITLE = {Low-Cost Preference Judgment via Ties},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-56607-8},
DOI = {10.1007/978-3-319-56608-5_58},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)},
EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John},
PAGES = {626--632},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10193},
ADDRESS = {Aberdeen, UK},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Low-Cost Preference Judgment via Ties
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1F7B-A
%R 10.1007/978-3-319-56608-5_58
%D 2017
%B 39th European Conference on Information Retrieval
%Z date of event: 2017-04-09 - 2017-04-13
%C Aberdeen, UK
%B Advances in Information Retrieval
%E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John
%P 626 - 632
%I Springer
%@ 978-3-319-56607-8
%B Lecture Notes in Computer Science
%N 10193

Conference paper

K. Hui, K. Berberich, and I. Mele

“Dealing with Incomplete Judgments in Cascade Measures,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.

BibTeX

@inproceedings{HuiICTIR2017,
TITLE = {Dealing with Incomplete Judgments in Cascade Measures},
AUTHOR = {Hui, Kai and Berberich, Klaus and Mele, Ida},
LANGUAGE = {eng},
ISBN = {978-1-4503-4490-6},
DOI = {10.1145/3121050.3121064},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval},
PAGES = {83--90},
ADDRESS = {Amsterdam, The Netherlands},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%A Mele, Ida
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Dealing with Incomplete Judgments in Cascade Measures
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-0649-6
%R 10.1145/3121050.3121064
%D 2017
%B 7th International Conference on the Theory of Information Retrieval
%Z date of event: 2017-10-01 - 2017-10-04
%C Amsterdam, The Netherlands
%B ICTIR'17
%P 83 - 90
%I ACM
%@ 978-1-4503-4490-6

Paper

K. Hui, A. Yates, K. Berberich, and G. de Melo

“RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.10192.

Abstract

Ad-hoc retrieval models can benefit from considering different patterns in
the interactions between a query and a document, effectively assessing the
relevance of a document for a given user query. Factors to be considered in
this interaction include (i) the matching of unigrams and ngrams, (ii) the
proximity of the matched query terms, (iii) their position in the document, and
(iv) how the different relevance signals are combined over different query
terms. While previous work has successfully modeled some of these factors, not
all aspects have been fully explored. In this work, we close this gap by
proposing different neural components and incorporating them into a single
architecture, leading to a novel neural IR model called RE-PACRR. Extensive
comparisons with established models on TREC Web Track data confirm that the
proposed model yields promising search results.

BibTeX

@online{HuiarXiv2017b,
TITLE = {{RE-PACRR}: {A} Context and Density-Aware Neural Information Retrieval Model},
AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1706.10192},
EPRINT = {1706.10192},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.},
}

Endnote

%0 Report
%A Hui, Kai
%A Yates, Andrew
%A Berberich, Klaus
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-064D-D
%U http://arxiv.org/abs/1706.10192
%D 2017
%X   Ad-hoc retrieval models can benefit from considering different patterns in
the interactions between a query and a document, effectively assessing the
relevance of a document for a given user query. Factors to be considered in
this interaction include (i) the matching of unigrams and ngrams, (ii) the
proximity of the matched query terms, (iii) their position in the document, and
(iv) how the different relevance signals are combined over different query
terms. While previous work has successfully modeled some of these factors, not
all aspects have been fully explored. In this work, we close this gap by
proposing different neural components and incorporating them into a single
architecture, leading to a novel neural IR model called RE-PACRR. Extensive
comparisons with established models on TREC Web Track data confirm that the
proposed model yields promising search results.

%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Conference paper

R. Jäschke, J. Strötgen, E. Krotova, and F. Fischer

“„Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora,” in DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Bern, Switzerland, 2017.

BibTeX

@inproceedings{JaeschkeEtAl2017_DHD,
TITLE = {{{``Der Helmut Kohl unter den Brotaufstrichen'' -- Zur Extraktion vossianischer Antonomasien aus gro{\ss}en Zeitungskorpora}}},
AUTHOR = {J{\"a}schke, Robert and Str{\"o}tgen, Jannik and Krotova, Elena and Fischer, Frank},
LANGUAGE = {deu},
YEAR = {2017},
BOOKTITLE = {DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.},
PAGES = {120--124},
ADDRESS = {Bern, Switzerland},
}

Endnote

%0 Conference Proceedings
%A Jäschke, Robert
%A Strötgen, Jannik
%A Krotova, Elena
%A Fischer, Frank
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T „Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora
%G deu
%U http://hdl.handle.net/11858/00-001M-0000-002C-4E05-A
%D 2017
%B 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.
%Z date of event: 2017-02-13 - 2017-02-18
%C Bern, Switzerland
%B DHd 2017
%P 120 - 124

Conference paper

H. Jhamtani, R. Saha Roy, N. Chhaya, and E. Nyberg

“Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.

BibTeX

@inproceedings{JhamtaniECIR2017,
TITLE = {Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages},
AUTHOR = {Jhamtani, Harsh and Saha Roy, Rishiraj and Chhaya, Niyati and Nyberg, Eric},
LANGUAGE = {eng},
ISBN = {978-3-319-56607-8},
DOI = {10.1007/978-3-319-56608-5_41},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)},
EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John},
PAGES = {506--512},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10193},
ADDRESS = {Aberdeen, UK},
}

Endnote

%0 Conference Proceedings
%A Jhamtani, Harsh
%A Saha Roy, Rishiraj
%A Chhaya, Niyati
%A Nyberg, Eric
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-DB33-0
%R 10.1007/978-3-319-56608-5_41
%D 2017
%B 39th European Conference on Information Retrieval
%Z date of event: 2017-04-09 - 2017-04-13
%C Aberdeen, UK
%B Advances in Information Retrieval
%E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John
%P 506 - 512
%I Springer
%@ 978-3-319-56607-8
%B Lecture Notes in Computer Science
%N 10193

Paper

J. Kalofolias, M. Boley, and J. Vreeken

“Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups,” 2017. [Online]. Available: http://arxiv.org/abs/1709.07941.

Abstract

Subgroup discovery is a local pattern mining technique to find interpretable
descriptions of sub-populations that stand out on a given target variable. That
is, these sub-populations are exceptional with regard to the global
distribution. In this paper we argue that in many applications, such as
scientific discovery, subgroups are only useful if they are additionally
representative of the global distribution with regard to a control variable.
That is, when the distribution of this control variable is the same, or almost
the same, as over the whole data.

We formalise this objective function and give an efficient algorithm to
compute its tight optimistic estimator for the case of a numeric target and a
binary control variable. This enables us to use the branch-and-bound framework
to efficiently discover the top-$k$ subgroups that are both exceptional as well
as representative. Experimental evaluation on a wide range of datasets shows
that with this algorithm we discover meaningful representative patterns and are
up to orders of magnitude faster in terms of node evaluations as well as time.

BibTeX

@online{Kalofolias_arXiv2017,
TITLE = {Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups},
AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1709.07941},
EPRINT = {1709.07941},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.},
}

Endnote

%0 Report
%A Kalofolias, Janis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-0685-D
%U http://arxiv.org/abs/1709.07941
%D 2017
%X   Subgroup discovery is a local pattern mining technique to find interpretable
descriptions of sub-populations that stand out on a given target variable. That
is, these sub-populations are exceptional with regard to the global
distribution. In this paper we argue that in many applications, such as
scientific discovery, subgroups are only useful if they are additionally
representative of the global distribution with regard to a control variable.
That is, when the distribution of this control variable is the same, or almost
the same, as over the whole data.
  We formalise this objective function and give an efficient algorithm to
compute its tight optimistic estimator for the case of a numeric target and a
binary control variable. This enables us to use the branch-and-bound framework
to efficiently discover the top-$k$ subgroups that are both exceptional as well
as representative. Experimental evaluation on a wide range of datasets shows
that with this algorithm we discover meaningful representative patterns and are
up to orders of magnitude faster in terms of node evaluations as well as time.

%K Computer Science, Databases, cs.DB, Computer Science, Artificial Intelligence, cs.AI

Conference paper

J. Kalofolias, M. Boley, and J. Vreeken

“Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.

BibTeX

@inproceedings{KalofoliasICDM2017,
TITLE = {Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups},
AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5386-3835-4},
DOI = {10.1109/ICDM.2017.29},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)},
PAGES = {197--206},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Kalofolias, Janis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-63C2-5
%R 10.1109/ICDM.2017.29
%D 2017
%B 17th IEEE International Conference on Data Mining
%Z date of event: 2017-11-18 - 2017-11-21
%C New Orleans, LA, USA
%B 17th IEEE International Conference on Data Mining 
%P 197 - 206
%I IEEE
%@ 978-1-5386-3835-4

Conference paper

J. Kalofolias, E. Galbrun, and P. Miettinen

“From Sets of Good Redescriptions to Good Sets of Redescriptions,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.

BibTeX

@inproceedings{kalofolias16from,
TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions},
AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-5090-5473-2},
DOI = {10.1109/ICDM.2016.0032},
PUBLISHER = {IEEE},
YEAR = {2016},
BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)},
PAGES = {211--220},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Kalofolias, Janis
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T From Sets of Good Redescriptions to Good Sets of Redescriptions
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-224D-A
%R 10.1109/ICDM.2016.0032
%D 2017
%8 02.02.2017
%B 16th International Conference on Data Mining
%Z date of event: 2016-12-12 - 2016-12-15
%C Barcelona, Spain
%B 16th IEEE International Conference on Data Mining 
%P 211 - 220
%I IEEE
%@ 978-1-5090-5473-2

Conference paper

M. Kamp, M. Boley, O. Missura, and T. Gärtner

“Effective Parallelisation for Machine Learning,” in Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 2017.

BibTeX

@inproceedings{NIPS2017_7226,
TITLE = {Effective Parallelisation for Machine Learning},
AUTHOR = {Kamp, Michael and Boley, Mario and Missura, Olana and G{\"a}rtner, Thomas},
LANGUAGE = {eng},
PUBLISHER = {Curran Associates},
YEAR = {2017},
BOOKTITLE = {Advances in Neural Information Processing Systems 30},
EDITOR = {Guyon, I. and Luxburg, U. V. and Bengio, S. and Wallach, H. and Fergus, R. and Vishwanathan, S. and Garnett, R.},
PAGES = {6477--6488},
EID = {7226},
ADDRESS = {Long Beach, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Kamp, Michael
%A Boley, Mario
%A Missura, Olana
%A Gärtner, Thomas
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Effective Parallelisation for Machine Learning
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-BA32-4
%D 2017
%B Thirty-first Conference on Neural Information Processing Systems
%Z date of event: 2017-12-04 - 2017-12-09
%C Long Beach, CA, USA
%B Advances in Neural Information Processing Systems 30
%E Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R.
%P 6477 - 6488
%Z sequence number: 7226
%I Curran Associates
%U http://papers.nips.cc/paper/7226-effective-parallelisation-for-machine-learning.pdf

Paper

S. Karaev and P. Miettinen

“Algorithms for Approximate Subtropical Matrix Factorization,” 2017. [Online]. Available: http://arxiv.org/abs/1707.08872.

Abstract

Matrix factorization methods are important tools in data mining and analysis.
They can be used for many tasks, ranging from dimensionality reduction to
visualization. In this paper we concentrate on the use of matrix factorizations
for finding patterns from the data. Rather than using the standard algebra --
and the summation of the rank-1 components to build the approximation of the
original matrix -- we use the subtropical algebra, which is an algebra over the
nonnegative real values with the summation replaced by the maximum operator.
Subtropical matrix factorizations allow "winner-takes-it-all" interpretations
of the rank-1 components, revealing different structure than the normal
(nonnegative) factorizations. We study the complexity and sparsity of the
factorizations, and present a framework for finding low-rank subtropical
factorizations. We present two specific algorithms, called Capricorn and
Cancer, that are part of our framework. They can be used with data that has
been corrupted with different types of noise, and with different error metrics,
including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon
divergence. Our experiments show that the algorithms perform well on data that
has subtropical structure, and that they can find factorizations that are both
sparse and easy to interpret.

BibTeX

@online{Karaev_arXiv2017,
TITLE = {Algorithms for Approximate Subtropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1707.08872},
EPRINT = {1707.08872},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.},
}

Endnote

%0 Report
%A Karaev, Sanjar
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Algorithms for Approximate Subtropical Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-065A-F
%U http://arxiv.org/abs/1707.08872
%D 2017
%X   Matrix factorization methods are important tools in data mining and analysis.
They can be used for many tasks, ranging from dimensionality reduction to
visualization. In this paper we concentrate on the use of matrix factorizations
for finding patterns from the data. Rather than using the standard algebra --
and the summation of the rank-1 components to build the approximation of the
original matrix -- we use the subtropical algebra, which is an algebra over the
nonnegative real values with the summation replaced by the maximum operator.
Subtropical matrix factorizations allow "winner-takes-it-all" interpretations
of the rank-1 components, revealing different structure than the normal
(nonnegative) factorizations. We study the complexity and sparsity of the
factorizations, and present a framework for finding low-rank subtropical
factorizations. We present two specific algorithms, called Capricorn and
Cancer, that are part of our framework. They can be used with data that has
been corrupted with different types of noise, and with different error metrics,
including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon
divergence. Our experiments show that the algorithms perform well on data that
has subtropical structure, and that they can find factorizations that are both
sparse and easy to interpret.

%K Computer Science, Learning, cs.LG
%U http://people.mpi-inf.mpg.de/~pmiettin/tropical/

Thesis

D5IMPR-CS

E. Kuzey

“Populating Knowledge bases with Temporal Information,” Universität des Saarlandes, Saarbrücken, 2017.

BibTeX

@phdthesis{KuzeyPhd2017,
TITLE = {Populating Knowledge bases with Temporal Information},
AUTHOR = {Kuzey, Erdal},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-68119},
DOI = {10.22028/D291-26705},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
}

Endnote

%0 Thesis
%A Kuzey, Erdal
%Y Weikum, Gerhard
%A referee: de Rijke, Maarten
%A referee: Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Populating Knowledge bases with Temporal Information
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-EAE5-7
%R 10.22028/D291-26705
%U urn:nbn:de:bsz:291-scidok-68119
%F OTHER: hdl:20.500.11880/26761
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P XIV, 143 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6811/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

L. Lange

“Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora,” Universität des Saarlandes, Saarbrücken, 2017.

BibTeX

@mastersthesis{LangeBcS2017,
TITLE = {Time in Newspaper: {A} Large-Scale Analysis of Temporal Expressions in News Corpora},
AUTHOR = {Lange, Lukas},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Lange, Lukas
%Y Strötgen, Jannik
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5D08-B
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P 77 p.
%V bachelor
%9 bachelor

Conference paper

F. A. Lisi and D. Stepanova

“Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs,” in Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR, London, UK, 2017.

BibTeX

@inproceedings{LisiRuleML2017,
TITLE = {Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs},
AUTHOR = {Lisi, Francesca Alessandra and Stepanova, Daria},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-1875-8},
PUBLISHER = {CEUR-WS.org},
YEAR = {2017},
BOOKTITLE = {Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR},
EDITOR = {Bassiliades, Nick and Bikakis, Antonis and Costantini, Stefania and Franconi, Enrico and Giurca, Adrian and Kontchakov, Roman and Patkos, Theodore and Sadri, Fariba and Van Woensel, William},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1875},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Lisi, Francesca Alessandra
%A Stepanova, Daria
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-55FC-8
%D 2017
%B International Joint Conference on Rules and Reasoning
%Z date of event: 2017-07-12 - 2017-07-15
%C London, UK
%B Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR
%E Bassiliades, Nick; Bikakis, Antonis; Costantini, Stefania; Franconi, Enrico; Giurca, Adrian; Kontchakov, Roman; Patkos, Theodore; Sadri, Fariba; Van Woensel, William
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1875
%@ 1613-0073
%U http://ceur-ws.org/Vol-1875/paper20.pdf

Paper

S. MacAvaney, K. Hui, and A. Yates

“An Approach for Weakly-Supervised Deep Information Retrieval,” 2017. [Online]. Available: http://arxiv.org/abs/1707.00189.

Abstract

Recent developments in neural information retrieval models have been
promising, but a problem remains: human relevance judgments are expensive to
produce, while neural models require a considerable amount of training data. In
an attempt to fill this gap, we present an approach that---given a weak
training set of pseudo-queries, documents, relevance information---filters the
data to produce effective positive and negative query-document pairs. This
allows large corpora to be used as neural IR model training data, while
eliminating training examples that do not transfer well to relevance scoring.
The filters include unsupervised ranking heuristics and a novel measure of
interaction similarity. We evaluate our approach using a news corpus with
article headlines acting as pseudo-queries and article content as documents,
with implicit relevance between an article's headline and its content. By using
our approach to train state-of-the-art neural IR models and comparing to
established baselines, we find that training data generated by our approach can
lead to good results on a benchmark test collection.

BibTeX

@online{MacAvaney_arXiv2017,
TITLE = {An Approach for Weakly-Supervised Deep Information Retrieval},
AUTHOR = {MacAvaney, Sean and Hui, Kai and Yates, Andrew},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1707.00189},
EPRINT = {1707.00189},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.},
}

Endnote

%0 Report
%A MacAvaney, Sean
%A Hui, Kai
%A Yates, Andrew
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T An Approach for Weakly-Supervised Deep Information Retrieval
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06C5-C
%U http://arxiv.org/abs/1707.00189
%D 2017
%X   Recent developments in neural information retrieval models have been
promising, but a problem remains: human relevance judgments are expensive to
produce, while neural models require a considerable amount of training data. In
an attempt to fill this gap, we present an approach that---given a weak
training set of pseudo-queries, documents, relevance information---filters the
data to produce effective positive and negative query-document pairs. This
allows large corpora to be used as neural IR model training data, while
eliminating training examples that do not transfer well to relevance scoring.
The filters include unsupervised ranking heuristics and a novel measure of
interaction similarity. We evaluate our approach using a news corpus with
article headlines acting as pseudo-queries and article content as documents,
with implicit relevance between an article's headline and its content. By using
our approach to train state-of-the-art neural IR models and comparing to
established baselines, we find that training data generated by our approach can
lead to good results on a benchmark test collection.

%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Approximate Functional Dependencies,” 2017. [Online]. Available: http://arxiv.org/abs/1705.09391.

Abstract

Given a database and a target attribute of interest, how can we tell whether
there exists a functional, or approximately functional dependence of the target
on any set of other attributes in the data? How can we reliably, without bias
to sample size or dimensionality, measure the strength of such a dependence?
And, how can we efficiently discover the optimal or $\alpha$-approximate
top-$k$ dependencies? These are exactly the questions we answer in this paper.
As we want to be agnostic on the form of the dependence, we adopt an
information-theoretic approach, and construct a reliable, bias correcting score
that can be efficiently computed. Moreover, we give an effective optimistic
estimator of this score, by which for the first time we can mine the
approximate functional dependencies from data with guarantees of optimality.
Empirical evaluation shows that the derived score achieves a good bias for
variance trade-off, can be used within an efficient discovery algorithm, and
indeed discovers meaningful dependencies. Most important, it remains reliable
in the face of data sparsity.

BibTeX

@online{DBLP:journals/corr/MandrosBV17,
TITLE = {Discovering Reliable Approximate Functional Dependencies},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.09391},
EPRINT = {1705.09391},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.},
}

Endnote

%0 Report
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Reliable Approximate Functional Dependencies
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90F8-D
%U http://arxiv.org/abs/1705.09391
%D 2017
%X   Given a database and a target attribute of interest, how can we tell whether
there exists a functional, or approximately functional dependence of the target
on any set of other attributes in the data? How can we reliably, without bias
to sample size or dimensionality, measure the strength of such a dependence?
And, how can we efficiently discover the optimal or $\alpha$-approximate
top-$k$ dependencies? These are exactly the questions we answer in this paper.
  As we want to be agnostic on the form of the dependence, we adopt an
information-theoretic approach, and construct a reliable, bias correcting score
that can be efficiently computed. Moreover, we give an effective optimistic
estimator of this score, by which for the first time we can mine the
approximate functional dependencies from data with guarantees of optimality.
Empirical evaluation shows that the derived score achieves a good bias for
variance trade-off, can be used within an efficient discovery algorithm, and
indeed discovers meaningful dependencies. Most important, it remains reliable
in the face of data sparsity.

%K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT

Conference paper

P. Mandros, M. Boley, and J. Vreeken

“Discovering Reliable Approximate Functional Dependencies,” in KDD’17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 2017.

BibTeX

@inproceedings{MandrosKDD2017,
TITLE = {Discovering Reliable Approximate Functional Dependencies},
AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-4887-4},
DOI = {10.1145/3097983.3098062},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {KDD'17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
PAGES = {355--363},
ADDRESS = {Halifax, NS, Canada},
}

Endnote

%0 Conference Proceedings
%A Mandros, Panagiotis
%A Boley, Mario
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Reliable Approximate Functional Dependencies
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-065F-5
%R 10.1145/3097983.3098062
%D 2017
%B 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
%Z date of event: 2017-08-13 - 2017-08-17
%C Halifax, NS, Canada
%B KDD'17
%P 355 - 363
%I ACM
%@ 978-1-4503-4887-4

Paper

A. Marx and J. Vreeken

“Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06385.

Abstract

Given data over the joint distribution of two univariate or multivariate
random variables $X$ and $Y$ of mixed or single type data, we consider the
problem of inferring the most likely causal direction between $X$ and $Y$. We
take an information theoretic approach, from which it follows that first
describing the data over cause and then that of effect given cause is shorter
than the reverse direction.
For practical inference, we propose a score for causal models for mixed type
data based on the Minimum Description Length (MDL) principle. In particular, we
model dependencies between $X$ and $Y$ using classification and regression
trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a
fast greedy algorithm to infer the most likely causal direction directly from
the data.
Empirical evaluation on synthetic, benchmark, and real world data shows that
Crack reliably and with high accuracy infers the correct causal direction on
both univariate and multivariate cause--effect pairs over both single and mixed
type data.

BibTeX

@online{DBLP:journals/corr/MarxV17,
TITLE = {Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1702.06385},
EPRINT = {1702.06385},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90EF-3
%U http://arxiv.org/abs/1702.06385
%D 2017
%X   Given data over the joint distribution of two univariate or multivariate
random variables $X$ and $Y$ of mixed or single type data, we consider the
problem of inferring the most likely causal direction between $X$ and $Y$. We
take an information theoretic approach, from which it follows that first
describing the data over cause and then that of effect given cause is shorter
than the reverse direction.
  For practical inference, we propose a score for causal models for mixed type
data based on the Minimum Description Length (MDL) principle. In particular, we
model dependencies between $X$ and $Y$ using classification and regression
trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a
fast greedy algorithm to infer the most likely causal direction directly from
the data.
  Empirical evaluation on synthetic, benchmark, and real world data shows that
Crack reliably and with high accuracy infers the correct causal direction on
both univariate and multivariate cause--effect pairs over both single and mixed
type data.

%K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG

Paper

A. Marx and J. Vreeken

“Telling Cause from Effect using MDL-based Local and Global Regression,” 2017. [Online]. Available: http://arxiv.org/abs/1709.08915.

Abstract

We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables $X$ and $Y$ from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm that through thorough empirical evaluation on both
synthetic and real world data we show outperforms the state of the art by a
wide margin.

BibTeX

@online{Marx_arXiv1709.08915,
TITLE = {Telling Cause from Effect using {MDL}-based Local and Global Regression},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
URL = {http://arxiv.org/abs/1709.08915},
DOI = {10.1109/ICDM.2017.40},
EPRINT = {1709.08915},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables $X$ and $Y$ from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$ than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that through thorough empirical evaluation on both synthetic and real world data we show outperforms the state of the art by a wide margin.},
}

Endnote

%0 Report
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Telling Cause from Effect using MDL-based Local and Global Regression
%U http://hdl.handle.net/21.11116/0000-0002-9F18-1
%R 10.1109/ICDM.2017.40
%U http://arxiv.org/abs/1709.08915
%D 2017
%X   We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables $X$ and $Y$ from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
  To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm that through thorough empirical evaluation on both
synthetic and real world data we show outperforms the state of the art by a
wide margin.
%K Statistics, Machine Learning, stat.ML

Conference paper

A. Marx and J. Vreeken

“Telling Cause from Effect Using MDL-Based Local and Global Regression,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.

BibTeX

@inproceedings{MarxICDM2017,
TITLE = {Telling Cause from Effect Using {MDL}-Based Local and Global Regression},
AUTHOR = {Marx, Alexander and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-5386-3835-4},
DOI = {10.1109/ICDM.2017.40},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)},
PAGES = {307--316},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Marx, Alexander
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Telling Cause from Effect Using MDL-Based Local and Global Regression
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-63C4-3
%R 10.1109/ICDM.2017.40
%D 2017
%B 17th IEEE International Conference on Data Mining
%Z date of event: 2017-11-18 - 2017-11-21
%C New Orleans, LA, USA
%B 17th IEEE International Conference on Data Mining 
%P 307 - 316
%I IEEE
%@ 978-1-5386-3835-4

Conference paper

F. Meawad, M. H. Gad-Elrab, and E. Hemayed

“Designing Mobile Augmented Reality Experiences Using Friendly Markers,” in 4th International Conference on User Science and Engineering (i-USEr 2016), Melaka, Malaysia, 2017.

BibTeX

@inproceedings{Meawad2017,
TITLE = {Designing Mobile Augmented Reality Experiences Using Friendly Markers},
AUTHOR = {Meawad, Fatma and Gad-Elrab, Mohamed H. and Hemayed, Elsayed},
LANGUAGE = {eng},
ISBN = {978-1-5090-263-9},
DOI = {10.1109/IUSER.2016.7857937},
PUBLISHER = {IEEE},
YEAR = {2016},
DATE = {2017},
BOOKTITLE = {4th International Conference on User Science and Engineering (i-USEr 2016)},
PAGES = {75--80},
ADDRESS = {Melaka, Malaysia},
}

Endnote

%0 Conference Proceedings
%A Meawad, Fatma
%A Gad-Elrab, Mohamed H.
%A Hemayed, Elsayed
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Designing Mobile Augmented Reality Experiences Using Friendly Markers
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-CF28-A
%R 10.1109/IUSER.2016.7857937
%D 2017
%B 4th International Conference on User Science and Engineering
%Z date of event: 2016-08-23 - 2016-08-25
%C Melaka, Malaysia
%B 4th International Conference on User Science and Engineering
%P 75 - 80
%I IEEE
%@ 978-1-5090-263-9

Article

S. Metzger, R. Schenkel, and M. Sydow

“QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation,” Journal of Intelligent Information Systems, vol. 49, no. 3, 2017.

BibTeX

@article{Metzger2017,
TITLE = {{QBEES}: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation},
AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin},
LANGUAGE = {eng},
DOI = {10.1007/s10844-017-0443-x},
YEAR = {2017},
DATE = {2017},
JOURNAL = {Journal of Intelligent Information Systems},
VOLUME = {49},
NUMBER = {3},
PAGES = {333--366},
}

Endnote

%0 Journal Article
%A Metzger, Steffen
%A Schenkel, Ralf
%A Sydow, Marcin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-557B-8
%R 10.1007/s10844-017-0443-x
%7 2017
%D 2017
%J Journal of Intelligent Information Systems
%V 49
%N 3
%& 333
%P 333 - 366

Conference paper

S. Metzler, S. Günnemann, and P. Miettinen

“Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.

BibTeX

@inproceedings{metzler16hyperbolae,
TITLE = {Hyperbolae Are No Hyperbole: {Modelling} Communities That Are Not Cliques},
AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-5090-5473-2},
DOI = {10.1109/ICDM.2016.0044},
PUBLISHER = {IEEE},
YEAR = {2016},
BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)},
PAGES = {330--339},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Metzler, Saskia
%A Günnemann, Stephan
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-225F-F
%R 10.1109/ICDM.2016.0044
%D 2017
%8 02.02.2017
%B 16th International Conference on Data Mining
%Z date of event: 2016-12-12 - 2016-12-15
%C Barcelona, Spain
%B 16th IEEE International Conference on Data Mining 
%P 330 - 339
%I IEEE
%@ 978-1-5090-5473-2

Paper

P. Mirza, S. Razniewski, F. Darari, and G. Weikum

“Cardinal Virtues: Extracting Relation Cardinalities from Text,” 2017. [Online]. Available: http://arxiv.org/abs/1704.04455.

Abstract

Information extraction (IE) from text has largely focused on relations
between individual entities, such as who has won which award. However, some
facts are never fully mentioned, and no IE method has perfect recall. Thus, it
is beneficial to also tap contents about the cardinalities of these relations,
for example, how many awards someone has won. We introduce this novel problem
of extracting cardinalities and discuss the specific challenges that set it
apart from standard IE. We present a distant supervision method using
conditional random fields. A preliminary evaluation results in precision
between 3% and 55%, depending on the difficulty of relations.

BibTeX

@online{Mirza2017,
TITLE = {Cardinal Virtues: Extracting Relation Cardinalities from Text},
AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1704.04455},
EPRINT = {1704.04455},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations.},
}

Endnote

%0 Report
%A Mirza, Paramita
%A Razniewski, Simon
%A Darari, Fariz
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cardinal Virtues: Extracting Relation Cardinalities from Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8128-9
%U http://arxiv.org/abs/1704.04455
%D 2017
%X   Information extraction (IE) from text has largely focused on relations
between individual entities, such as who has won which award. However, some
facts are never fully mentioned, and no IE method has perfect recall. Thus, it
is beneficial to also tap contents about the cardinalities of these relations,
for example, how many awards someone has won. We introduce this novel problem
of extracting cardinalities and discuss the specific challenges that set it
apart from standard IE. We present a distant supervision method using
conditional random fields. A preliminary evaluation results in precision
between 3% and 55%, depending on the difficulty of relations.

%K Computer Science, Computation and Language, cs.CL

Conference paper

P. Mirza, S. Razniewski, F. Darari, and G. Weikum

“Cardinal Virtues: Extracting Relation Cardinalities from Text,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.

BibTeX

@inproceedings{MirzaACL2017,
TITLE = {Cardinal Virtues: {E}xtracting Relation Cardinalities from Text},
AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-76-0},
DOI = {10.18653/v1/P17-2055},
PUBLISHER = {ACL},
YEAR = {2017},
BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)},
PAGES = {347--351},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Razniewski, Simon
%A Darari, Fariz
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cardinal Virtues: Extracting Relation Cardinalities from Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-F9F8-7
%R 10.18653/v1/P17-2055
%D 2017
%B The 55th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2017-07-30 - 2017-08-04
%C Vancouver, Canada
%B The 55th Annual Meeting of the Association for Computational Linguistics
%P 347 - 351
%I ACL
%@ 978-1-945626-76-0
%U http://aclweb.org/anthology/P17-2055

Conference paper

A. Mishra and K. Berberich

“How do Order and Proximity Impact the Readability of Event Summaries?,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.

BibTeX

@inproceedings{DBLP:conf/ecir/MishraB17,
TITLE = {How do Order and Proximity Impact the Readability of Event Summaries?},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-56607-8},
DOI = {10.1007/978-3-319-56608-5_17},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)},
EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John},
PAGES = {212--225},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10193},
ADDRESS = {Aberdeen, UK},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T How do Order and Proximity Impact the Readability of Event Summaries?
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-20D9-B
%R 10.1007/978-3-319-56608-5_17
%D 2017
%B 39th European Conference on Information Retrieval
%Z date of event: 2017-04-09 - 2017-04-13
%C Aberdeen, UK
%B Advances in Information Retrieval
%E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John
%P 212 - 225
%I Springer
%@ 978-3-319-56607-8
%B Lecture Notes in Computer Science
%N 10193

Conference paper

P. Mrazovic, B. Eravci, J. L. Larriba-Pey, H. Ferhatosmanoglu, and M. Matskin

“Understanding and Predicting Trends in Urban Freight Transport,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.

BibTeX

@inproceedings{MrazovicMDM2017,
TITLE = {Understanding and Predicting Trends in Urban Freight Transport},
AUTHOR = {Mrazovic, Petar and Eravci, Bahaeddin and Larriba-Pey, Josep L. and Ferhatosmanoglu, Hakan and Matskin, Mihhail},
LANGUAGE = {eng},
ISBN = {978-1-5386-3932-0},
DOI = {10.1109/MDM.2017.26},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)},
PAGES = {124--133},
ADDRESS = {Daejeon, South Korea},
}

Endnote

%0 Conference Proceedings
%A Mrazovic, Petar
%A Eravci, Bahaeddin
%A Larriba-Pey, Josep L.
%A Ferhatosmanoglu, Hakan
%A Matskin, Mihhail
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Understanding and Predicting Trends in Urban Freight Transport
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-DB41-0
%R 10.1109/MDM.2017.26
%D 2017
%B 18th IEEE International Conference on Mobile Data Management
%Z date of event: 2017-05-29 - 2017-06-01
%C Daejeon, South Korea
%B 18th IEEE International Conference on Mobile Data Management
%P 124 - 133
%I IEEE
%@ 978-1-5386-3932-0

Paper

S. Mukherjee, G. Weikum, and C. Danescu-Niculescu-Mizil

“People on Drugs: Credibility of User Statements in Health Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02522.

Abstract

Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs --- this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.

BibTeX

@online{Mukherjee_arXiv2017,
TITLE = {People on Drugs: Credibility of User Statements in Health Communities},
AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard and Danescu-Niculescu-Mizil, Cristian},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02522},
EPRINT = {1705.02522},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Weikum, Gerhard
%A Danescu-Niculescu-Mizil, Cristian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T People on Drugs: Credibility of User Statements in Health Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80FE-2
%U http://arxiv.org/abs/1705.02522
%D 2017
%X   Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs --- this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Thesis

D5IMPR-CS

S. Mukherjee

“Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.

BibTeX

@phdthesis{Mukherjeephd17,
TITLE = {Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities},
AUTHOR = {Mukherjee, Subhabrata},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-69269},
DOI = {10.22028/D291-26780},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.},
}

Endnote

%0 Thesis
%A Mukherjee, Subhabrata
%Y Weikum, Gerhard
%A referee: Han, Jiawei
%A referee: Günnemann, Stephan
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-A648-0
%U urn:nbn:de:bsz:291-scidok-69269
%R 10.22028/D291-26780
%F OTHER: hdl:20.500.11880/26793
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P 166 p.
%V phd
%9 phd
%X 	One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6926/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Paper

S. Mukherjee, S. Guennemann, and G. Weikum

“Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02669.

Abstract

Online review communities are dynamic as users join and leave, adopt new
vocabulary, and adapt to evolving trends. Recent work has shown that
recommender systems benefit from explicit consideration of user experience.
However, prior work assumes a fixed number of discrete experience levels,
whereas in reality users gain experience and mature continuously over time.
This paper presents a new model that captures the continuous evolution of user
experience, and the resulting language model in reviews and other posts. Our
model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.

BibTeX

@online{Mukherjee2017,
TITLE = {Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion},
AUTHOR = {Mukherjee, Subhabrata and Guennemann, Stephan and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02669},
DOI = {10.1145/2939672.2939780},
EPRINT = {1705.02669},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Guennemann, Stephan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80BE-3
%R 10.1145/2939672.2939780
%U http://arxiv.org/abs/1705.02669
%D 2017
%X   Online review communities are dynamic as users join and leave, adopt new
vocabulary, and adapt to evolving trends. Recent work has shown that
recommender systems benefit from explicit consideration of user experience.
However, prior work assumes a fixed number of discrete experience levels,
whereas in reality users gain experience and mature continuously over time.
This paper presents a new model that captures the continuous evolution of user
experience, and the resulting language model in reviews and other posts. Our
model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Paper

S. Mukherjee, S. Dutta, and G. Weikum

“Credible Review Detection with Limited Information using Consistency Analysis,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02668.

Abstract

Online reviews provide viewpoints on the strengths and shortcomings of
products/services, influencing potential customers' purchasing decisions.
However, the proliferation of non-credible reviews -- either fake (promoting/
demoting an item), incompetent (involving irrelevant aspects), or biased --
entails the problem of identifying credible reviews. Prior works involve
classifiers harnessing rich information about items/users -- which might not be
readily available in several domains -- that provide only limited
interpretability as to why a review is deemed non-credible. This paper presents
a novel approach to address the above issues. We utilize latent topic models
leveraging review texts, item ratings, and timestamps to derive consistency
features without relying on item/user histories, unavailable for "long-tail"
items/users. We develop models, for computing review credibility scores to
provide interpretable evidence for non-credible reviews, that are also
transferable to other domains -- addressing the scarcity of labeled data.
Experiments on real-world datasets demonstrate improvements over
state-of-the-art baselines.

BibTeX

@online{Mukherjee2017b,
TITLE = {Credible Review Detection with Limited Information using Consistency Analysis},
AUTHOR = {Mukherjee, Subhabrata and Dutta, Sourav and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02668},
EPRINT = {1705.02668},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Dutta, Sourav
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Credible Review Detection with Limited Information using Consistency Analysis
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80C1-A
%U http://arxiv.org/abs/1705.02668
%D 2017
%X   Online reviews provide viewpoints on the strengths and shortcomings of
products/services, influencing potential customers' purchasing decisions.
However, the proliferation of non-credible reviews -- either fake (promoting/
demoting an item), incompetent (involving irrelevant aspects), or biased --
entails the problem of identifying credible reviews. Prior works involve
classifiers harnessing rich information about items/users -- which might not be
readily available in several domains -- that provide only limited
interpretability as to why a review is deemed non-credible. This paper presents
a novel approach to address the above issues. We utilize latent topic models
leveraging review texts, item ratings, and timestamps to derive consistency
features without relying on item/user histories, unavailable for "long-tail"
items/users. We develop models, for computing review credibility scores to
provide interpretable evidence for non-credible reviews, that are also
transferable to other domains -- addressing the scarcity of labeled data.
Experiments on real-world datasets demonstrate improvements over
state-of-the-art baselines.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Paper

S. Mukherjee and G. Weikum

“People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02667.

Abstract

Media seems to have become more partisan, often providing a biased coverage
of news catering to the interest of specific groups. It is therefore essential
to identify credible information content that provides an objective narrative
of an event. News communities such as digg, reddit, or newstrust offer
recommendations, reviews, quality ratings, and further insights on journalistic
works. However, there is a complex interaction between different factors in
such online communities: fairness and style of reporting, language clarity and
objectivity, topical perspectives (like political viewpoint), expertise and
bias of community members, and more. This paper presents a model to
systematically analyze the different interactions in a news community between
users, news, and sources. We develop a probabilistic graphical model that
leverages this joint interaction to identify 1) highly credible news articles,
2) trustworthy news sources, and 3) expert users who perform the role of
"citizen journalists" in the community. Our method extends CRF models to
incorporate real-valued ratings, as some communities have very fine-grained
scales that cannot be easily discretized without losing information. To the
best of our knowledge, this paper is the first full-fledged analysis of
credibility, trust, and expertise in news communities.

BibTeX

@online{Mukerjee_arXiv1705.02667,
TITLE = {People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities},
AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02667},
EPRINT = {1705.02667},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80F7-0
%U http://arxiv.org/abs/1705.02667
%D 2017
%X   Media seems to have become more partisan, often providing a biased coverage
of news catering to the interest of specific groups. It is therefore essential
to identify credible information content that provides an objective narrative
of an event. News communities such as digg, reddit, or newstrust offer
recommendations, reviews, quality ratings, and further insights on journalistic
works. However, there is a complex interaction between different factors in
such online communities: fairness and style of reporting, language clarity and
objectivity, topical perspectives (like political viewpoint), expertise and
bias of community members, and more. This paper presents a model to
systematically analyze the different interactions in a news community between
users, news, and sources. We develop a probabilistic graphical model that
leverages this joint interaction to identify 1) highly credible news articles,
2) trustworthy news sources, and 3) expert users who perform the role of
"citizen journalists" in the community. Our method extends CRF models to
incorporate real-valued ratings, as some communities have very fine-grained
scales that cannot be easily discretized without losing information. To the
best of our knowledge, this paper is the first full-fledged analysis of
credibility, trust, and expertise in news communities.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Paper

S. Mukherjee, H. Lamba, and G. Weikum

“Item Recommendation with Evolving User Preferences and Experience,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02519.

Abstract

Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user's experience level and how
this is expressed in the user's writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user's maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user's
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time -- with solely user reviews and ratings as observables
over time. The facets of a user's interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.

BibTeX

@online{Mukherjee2017d,
TITLE = {Item Recommendation with Evolving User Preferences and Experience},
AUTHOR = {Mukherjee, Subhabrata and Lamba, Hemank and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02519},
DOI = {10.1109/ICDM.2015.111},
EPRINT = {1705.02519},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Lamba, Hemank
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Item Recommendation with Evolving User Preferences and Experience
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8103-C
%R 10.1109/ICDM.2015.111
%U http://arxiv.org/abs/1705.02519
%D 2017
%X   Current recommender systems exploit user and item similarities by
collaborative filtering. Some advanced methods also consider the temporal
evolution of item ratings as a global background process. However, all prior
methods disregard the individual evolution of a user's experience level and how
this is expressed in the user's writing in a review community. In this paper,
we model the joint evolution of user experience, interest in specific item
facets, writing style, and rating behavior. This way we can generate individual
recommendations that take into account the user's maturity level (e.g.,
recommending art movies rather than blockbusters for a cinematography expert).
As only item ratings and review texts are observables, we capture the user's
experience and interests in a latent model learned from her reviews, vocabulary
and writing style. We develop a generative HMM-LDA model to trace user
evolution, where the Hidden Markov Model (HMM) traces her latent experience
progressing over time -- with solely user reviews and ratings as observables
over time. The facets of a user's interest are drawn from a Latent Dirichlet
Allocation (LDA) model derived from her reviews, as a function of her (again
latent) experience level. In experiments with five real-world datasets, we show
that our model improves the rating prediction over state-of-the-art baselines,
by a substantial margin. We also show, in a use-case study, that our model
performs well in the assessment of user experience levels.

%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Paper

S. Mukherjee, K. Popat, and G. Weikum

“Exploring Latent Semantic Factors to Find Useful Product Reviews,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02518.

Abstract

Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain exploiting a variety of
syntactic and community-level features, we delve deep into the semantics of
reviews as to what makes them useful, providing interpretable explanation for
the same. We identify a set of consistency and semantic factors, all from the
text, ratings, and timestamps of user-generated reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors like the expertise of its author, his
judgment about the fine-grained facets of the underlying product, and his
writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.

BibTeX

@online{Mukjherjee2017e,
TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews},
AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1705.02518},
EPRINT = {1705.02518},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.},
}

Endnote

%0 Report
%A Mukherjee, Subhabrata
%A Popat, Kashyap
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exploring Latent Semantic Factors to Find Useful Product Reviews
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-811C-5
%U http://arxiv.org/abs/1705.02518
%D 2017
%X   Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain exploiting a variety of
syntactic and community-level features, we delve deep into the semantics of
reviews as to what makes them useful, providing interpretable explanation for
the same. We identify a set of consistency and semantic factors, all from the
text, ratings, and timestamps of user-generated reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors like the expertise of its author, his
judgment about the fine-grained facets of the underlying product, and his
writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.
%K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML

Conference paper

S. Mukherjee, K. Popat, and G. Weikum

“Exploring Latent Semantic Factors to Find Useful Product Reviews,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.

BibTeX

@inproceedings{MukherjeeSDM2017,
TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews},
AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-61197-497-3},
DOI = {10.1137/1.9781611974973.54},
PUBLISHER = {SIAM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)},
PAGES = {480--488},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Popat, Kashyap
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exploring Latent Semantic Factors to Find Useful Product Reviews
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4CD4-6
%R 10.1137/1.9781611974973.54
%D 2017
%B 17th SIAM International Conference on Data Mining
%Z date of event: 2017-04-27 - 2017-04-29
%C Houston, TX, USA
%B Proceedings of the Seventeenth SIAM International Conference on Data Mining
%P 480 - 488
%I SIAM
%@ 978-1-61197-497-3

Conference paper

S. Neumann and P. Miettinen

“Reductions for Frequency-Based Data Mining Problems,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.

BibTeX

@inproceedings{neumann17reductions,
TITLE = {Reductions for Frequency-Based Data Mining Problems},
AUTHOR = {Neumann, Stefan and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-5386-3835-4},
DOI = {10.1109/ICDM.2017.128},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)},
PAGES = {997--1002},
ADDRESS = {New Orleans, LA, USA},
}

Endnote

%0 Conference Proceedings
%A Neumann, Stefan
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reductions for Frequency-Based Data Mining Problems
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-90CE-F
%R 10.1109/ICDM.2017.128
%D 2017
%B 17th IEEE International Conference on Data Mining
%Z date of event: 2017-11-18 - 2017-11-21
%C New Orleans, LA, USA
%B 17th IEEE International Conference on Data Mining 
%P 997 - 1002
%I IEEE
%@ 978-1-5386-3835-4

Conference paper

S. Neumann, R. Gemulla, and P. Miettinen

“What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.

BibTeX

@inproceedings{neumann16what,
TITLE = {What You Will Gain By Rounding: {Theory} and Algorithms for Rounding Rank},
AUTHOR = {Neumann, Stefan and Gemulla, Rainer and Miettinen, Pauli},
LANGUAGE = {eng},
DOI = {10.1109/ICDM.2016.147},
PUBLISHER = {IEEE},
YEAR = {2016},
BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)},
EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong},
PAGES = {380--389},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Neumann, Stefan
%A Gemulla, Rainer
%A Miettinen, Pauli
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2265-0
%R 10.1109/ICDM.2016.147
%D 2017
%8 02.02.2017
%B 16th International Conference on Data Mining
%Z date of event: 2016-12-12 - 2016-12-15
%C Barcelona, Spain
%B 16th IEEE International Conference on Data Mining 
%E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong
%P 380 - 389
%I IEEE

Paper

S. Neumann and P. Miettinen

“Reductions for Frequency-Based Data Mining Problems,” 2017. [Online]. Available: http://arxiv.org/abs/1709.00900.

Abstract

Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.

BibTeX

@online{Neumann_arXiv2017,
TITLE = {Reductions for Frequency-Based Data Mining Problems},
AUTHOR = {Neumann, Stefan and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1709.00900},
EPRINT = {1709.00900},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Studying the computational complexity of problems is one of the -- if not the -- fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.},
}

Endnote

%0 Report
%A Neumann, Stefan
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reductions for Frequency-Based Data Mining Problems
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-0654-C
%U http://arxiv.org/abs/1709.00900
%D 2017
%X   Studying the computational complexity of problems is one of the - if not the
- fundamental questions in computer science. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.

%K Computer Science, Computational Complexity, cs.CC

Thesis

D5IMPR-CS

D. B. Nguyen

“Joint Models for Information and Knowledge Extraction,” Universität des Saarlandes, Saarbrücken, 2017.

Abstract

Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.

BibTeX

@phdthesis{Nguyenphd2017,
TITLE = {Joint Models for Information and Knowledge Extraction},
AUTHOR = {Nguyen, Dat Ba},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-ds-269433},
DOI = {10.22028/D291-26943},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
ABSTRACT = {Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: \mbox{$\bullet$} High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. \mbox{$\bullet$} Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. \mbox{$\bullet$} On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.},
}

Endnote

%0 Thesis
%A Nguyen, Dat Ba
%Y Weikum, Gerhard
%A referee: Theobald, Martin
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Joint Models for Information and Knowledge Extraction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-890F-9
%U urn:nbn:de:bsz:291-scidok-ds-269433
%R 10.22028/D291-26943
%F OTHER: hdl:20.500.11880/26895
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P 89 p.
%V phd
%9 phd
%X Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26895

Conference paper

D. B. Nguyen, M. Theobald, and G. Weikum

“J-REED: Joint Relation Extraction and Entity Disambiguation,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.

BibTeX

@inproceedings{Nguyen_CIKM2017,
TITLE = {J-{REED}: {Joint Relation Extraction and Entity Disambiguation}},
AUTHOR = {Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4918-5},
DOI = {10.1145/3132847.3133090},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management},
PAGES = {2227--2230},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Dat Ba
%A Theobald, Martin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T J-REED: Joint Relation Extraction and Entity Disambiguation
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3B9D-E
%R 10.1145/3132847.3133090
%D 2017
%B 26th ACM International Conference on Information and Knowledge Management 
%Z date of event: 2017-11-06 - 2017-11-10
%C Singapore, Singapore
%B CIKM'17
%P 2227 - 2230
%I ACM
%@ 978-1-4503-4918-5

Article

D. B. Nguyen, A. Abujabal, N. K. Tran, M. Theobald, and G. Weikum

“Query-Driven On-The-Fly Knowledge Base Construction,” Proceedings of the VLDB Endowment (Proc. VLDB 2018), vol. 11, no. 1, 2017.

BibTeX

@article{Nguyen2017_PVLDB,
TITLE = {Query-Driven On-The-Fly Knowledge Base Construction},
AUTHOR = {Nguyen, Dat Ba and Abujabal, Abdalghani and Tran, Nam Khanh and Theobald, Martin and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.14778/3136610.31366},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2017},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {11},
NUMBER = {1},
PAGES = {66--79},
BOOKTITLE = {Proceedings of the 44th International Conference on Very Large Data Bases (VLDB 2018)},
EDITOR = {Bhowmick, Sourav and Torres, Ricardo},
}

Endnote

%0 Journal Article
%A Nguyen, Dat Ba
%A Abujabal, Abdalghani
%A Tran, Nam Khanh
%A Theobald, Martin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Query-Driven On-The-Fly Knowledge Base Construction
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3B51-3
%R 10.14778/3136610.31366
%7 2017
%D 2017
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 11
%N 1
%& 66
%P 66 - 79
%I ACM
%C New York, NY
%B Proceedings of the 44th International Conference on  Very Large Data Bases
%O VLDB 2018 Rio de Janeiro, Brazil, August 27-31, 2018

Conference paper

A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti

“Indoor Localization Accuracy Estimation from Fingerprint Data,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.

BibTeX

@inproceedings{mdm17-spate,
TITLE = {Indoor Localization Accuracy Estimation from Fingerprint Data},
AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-1-5386-3932-0},
DOI = {10.1109/MDM.2017.34},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)},
PAGES = {196--205},
ADDRESS = {Daejeon, South Korea},
}

Endnote

%0 Conference Proceedings
%A Nikitin, Artyom
%A Laoudias, Christos
%A Chatzimilioudis, Georgios
%A Karras, Panagiotis
%A Zeinalipour-Yazti, Demetrios
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Indoor Localization Accuracy Estimation from Fingerprint Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-0832-6
%R 10.1109/MDM.2017.34
%D 2017
%B 18th IEEE International Conference on Mobile Data Management
%Z date of event: 2017-05-29 - 2017-06-01
%C Daejeon, South Korea
%B 18th IEEE International Conference on Mobile Data Management
%P 196 - 205
%I IEEE
%@ 978-1-5386-3932-0

Conference paper

A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti

“ACCES: Offline Accuracy Estimation for Fingerprint-based Localization,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.

BibTeX

@inproceedings{mdm17-spate-demo,
TITLE = {{ACCES}: Offline Accuracy Estimation for Fingerprint-based Localization},
AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios},
LANGUAGE = {eng},
ISBN = {978-1-5386-3932-0},
DOI = {10.1109/MDM.2017.61},
PUBLISHER = {IEEE Computer Society},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)},
PAGES = {358--359},
ADDRESS = {Daejeon, South Korea},
}

Endnote

%0 Conference Proceedings
%A Nikitin, Artyom
%A Laoudias, Christos
%A Chatzimilioudis, Georgios
%A Karras, Panagiotis
%A Zeinalipour-Yazti, Demetrios
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ACCES: Offline Accuracy Estimation for Fingerprint-based Localization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-082D-3
%R 10.1109/MDM.2017.61
%D 2017
%B 18th IEEE International Conference on Mobile Data Management
%Z date of event: 2017-05-29 - 2017-06-01
%C Daejeon, South Korea
%B 18th IEEE International Conference on Mobile Data Management
%P 358 - 359
%I IEEE Computer Society
%@ 978-1-5386-3932-0

Conference paper

S. Paramonov, D. Stepanova, and P. Miettinen

“Hybrid ASP-based Approach to Pattern Mining,” in Rules and Reasoning (RuleML+RR 2017), London, UK, 2017, vol. 10364.

BibTeX

@inproceedings{StepanovaRR2017,
TITLE = {Hybrid {ASP}-based Approach to Pattern Mining},
AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-3-319-61251-5},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Rules and Reasoning (RuleML+RR 2017)},
PAGES = {199--214},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10364},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Paramonov, Sergey
%A Stepanova, Daria
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Hybrid ASP-based Approach to Pattern Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8450-8
%D 2017
%B International Joint Conference on Rules and Reasoning
%Z date of event: 2017-07-12 - 2017-07-15
%C London, UK
%B Rules and Reasoning
%P 199 - 214
%I Springer
%@ 978-3-319-61251-5
%B Lecture Notes in Computer Science
%V 10364

Conference paper

T. Pellissier Tanon, D. Stepanova, S. Razniewski, P. Mirza, and G. Weikum

“Completeness-Aware Rule Learning from Knowledge Graphs,” in The Semantic Web -- ISWC 2017, Vienna, Austria, 2017.

BibTeX

@inproceedings{StepanovaISWC2017,
TITLE = {Completeness-Aware Rule Learning from Knowledge Graphs},
AUTHOR = {Pellissier Tanon, Thomas and Stepanova, Daria and Razniewski, Simon and Mirza, Paramita and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-68287-7},
DOI = {10.1007/978-3-319-68288-4_30},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {The Semantic Web -- ISWC 2017},
EDITOR = {d'Amato, Claudia and Fernandez, Miriam and Tamma, Valentina and Lecue, Freddy and Cudr{\'e}-Mauroux, Philippe and Sequeda, Juan and Lange, Christoph and Hefflin, Jeff},
PAGES = {507--525},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {10587},
ADDRESS = {Vienna, Austria},
}

Endnote

%0 Conference Proceedings
%A Pellissier Tanon, Thomas
%A Stepanova, Daria
%A Razniewski, Simon
%A Mirza, Paramita
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Completeness-Aware Rule Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-55D9-3
%R 10.1007/978-3-319-68288-4_30
%D 2017
%B 16th International Semantic Web Conference
%Z date of event: 2017-10-21 - 2017-10-25
%C Vienna, Austria
%B The Semantic Web -- ISWC 2017
%E d'Amato, Claudia; Fernandez, Miriam; Tamma, Valentina; Lecue, Freddy; Cudré-Mauroux, Philippe; Sequeda, Juan; Lange, Christoph; Hefflin, Jeff
%P 507 - 525
%I Springer
%@ 978-3-319-68287-7
%B Lecture Notes in Computer Science
%N 10587
%U https://iswc2017.ai.wu.ac.at/wp-content/uploads/papers/MainProceedings/324.pdf

Conference paper

R. Pienta, M. Kahng, Z. Lin, J. Vreeken, P. Talukdar, J. Abello, G. Parameswaran, and D. H. Chau

“FACETS: Adaptive Local Exploration of Large Graphs,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.

BibTeX

@inproceedings{pienta:17:facets,
TITLE = {{FACETS}: {A}daptive Local Exploration of Large Graphs},
AUTHOR = {Pienta, Robert and Kahng, Minsuk and Lin, Zhang and Vreeken, Jilles and Talukdar, Partha and Abello, James and Parameswaran, Ganesh and Chau, Duen Horng},
LANGUAGE = {eng},
ISBN = {978-1-611974-87-4},
DOI = {10.1137/1.9781611974973.67},
PUBLISHER = {SIAM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)},
EDITOR = {Chawla, Nitesh and Wang, Wei},
PAGES = {597--605},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Pienta, Robert
%A Kahng, Minsuk
%A Lin, Zhang
%A Vreeken, Jilles
%A Talukdar, Partha
%A Abello, James
%A Parameswaran, Ganesh
%A Chau, Duen Horng
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T FACETS: Adaptive Local Exploration of Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4BEA-D
%R 10.1137/1.9781611974973.67
%D 2017
%B 17th SIAM International Conference on Data Mining
%Z date of event: 2017-04-27 - 2017-04-29
%C Houston, TX, USA
%B Proceedings of the Seventeenth SIAM International Conference on Data Mining
%E Chawla, Nitesh; Wang, Wei
%P 597 - 605
%I SIAM
%@ 978-1-611974-87-4

Article

E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum

“On Measuring Bias in Online Information,” ACM SIGMOD Record, vol. 46, no. 4, 2017.

Abstract

Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.

BibTeX

@article{Pitoura2017a,
TITLE = {On Measuring Bias in Online Information},
AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1145/3186549.3186553},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.},
JOURNAL = {ACM SIGMOD Record},
VOLUME = {46},
NUMBER = {4},
PAGES = {16--21},
}

Endnote

%0 Journal Article
%A Pitoura, Evaggelia
%A Tsaparas, Panayiotis
%A Flouris, Giorgos
%A Fundulaki, Irini
%A Papadakos, Panagiotis
%A Abiteboul, Serge
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On Measuring Bias in Online Information
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-EA0F-9
%R 10.1145/3186549.3186553
%7 2017
%D 2017
%X   Bias in online information has recently become a pressing issue, with search
engines, social networks and recommendation services being accused of
exhibiting some form of bias. In this vision paper, we make the case for a
systematic approach towards measuring bias. To this end, we discuss formal
measures for quantifying the various types of bias, we outline the system
components necessary for realizing them, and we highlight the related research
challenges and open problems.

%K Computer Science, Databases, cs.DB,Computer Science, Computers and Society, cs.CY
%J ACM SIGMOD Record
%V 46
%N 4
%& 16
%P 16 - 21
%I ACM
%C New York, NY

Paper

E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum

“On Measuring Bias in Online Information,” 2017. [Online]. Available: http://arxiv.org/abs/1704.05730.

Abstract

Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.

BibTeX

@online{Pitoura2017,
TITLE = {On Measuring Bias in Online Information},
AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1704.05730},
EPRINT = {1704.05730},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.},
}

Endnote

%0 Report
%A Pitoura, Evaggelia
%A Tsaparas, Panayiotis
%A Flouris, Giorgos
%A Fundulaki, Irini
%A Papadakos, Panagiotis
%A Abiteboul, Serge
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On Measuring Bias in Online Information
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8123-4
%U http://arxiv.org/abs/1704.05730
%D 2017
%X   Bias in online information has recently become a pressing issue, with search
engines, social networks and recommendation services being accused of
exhibiting some form of bias. In this vision paper, we make the case for a
systematic approach towards measuring bias. To this end, we discuss formal
measures for quantifying the various types of bias, we outline the system
components necessary for realizing them, and we highlight the related research
challenges and open problems.

%K Computer Science, Databases, cs.DB,Computer Science, Computers and Society, cs.CY

Conference paper

K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum

“Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media,” in WWW’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{PopatWWW2017a,
TITLE = {Where the Truth Lies: {E}xplaining the Credibility of Emerging Claims on the {W}eb and Social Media},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3055133},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {1003--1012},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4CD8-D
%R 10.1145/3041021.3055133
%D 2017
%B 26th International Conference on World Wide Web 
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 1003 - 1012
%I ACM
%@ 978-1-4503-4914-7

Conference paper

K. Popat

“Assessing the Credibility of Claims on the Web,” in WWW’17 Companion, Perth, Australia, 2017.

BibTeX

@inproceedings{PopatWWW2017b,
TITLE = {Assessing the Credibility of Claims on the {Web}},
AUTHOR = {Popat, Kashyap},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3053379},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {735--739},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Assessing the Credibility of Claims on the Web
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-90CC-2
%R 10.1145/3041021.3053379
%D 2017
%B 26th International Conference on World Wide Web
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 735 - 739
%I ACM
%@ 978-1-4503-4914-7

Conference paper

Y. Ran, B. He, K. Hui, J. Xu, and L. Sun

“A Document-Based Neural Relevance Model for Effective Clinical Decision Support,” in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017), Kansas City, MO, USA, 2017.


BibTeX

@inproceedings{RanBIBM2017,
TITLE = {A Document-Based Neural Relevance Model for Effective Clinical Decision Support},
AUTHOR = {Ran, Yanhua and He, Ben and Hui, Kai and Xu, Jungang and Sun, Le},
LANGUAGE = {eng},
ISBN = {978-1-5090-3050-7},
DOI = {10.1109/BIBM.2017.8217757},
PUBLISHER = {IEEE},
YEAR = {2017},
BOOKTITLE = {2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)},
EDITOR = {Hu, Xiaohua and Shyu, Chi-Ren and Bromberg, Yana and Gao, Jean and Gong, Yang and Korkin, Dmitry and Yoo, Illhoi and Zheng, Jane Huiru},
PAGES = {798--804},
ADDRESS = {Kansas City, MO, USA},
}

Endnote

%0 Conference Proceedings
%A Ran, Yanhua
%A He, Ben
%A Hui, Kai
%A Xu, Jungang
%A Sun, Le
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T A Document-Based Neural Relevance Model for Effective Clinical Decision Support
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-EA3D-5
%R 10.1109/BIBM.2017.8217757
%D 2017
%B IEEE International Conference on Bioinformatics and Biomedicine 
%Z date of event: 2017-11-13 - 2017-11-16
%C Kansas City, MO, USA
%B 2017 IEEE International Conference on Bioinformatics and Biomedicine 
%E Hu, Xiaohua; Shyu, Chi-Ren; Bromberg, Yana; Gao, Jean; Gong, Yang; Korkin, Dmitry; Yoo, Illhoi ; Zheng, Jane Huiru
%P 798 - 804
%I IEEE
%@ 978-1-5090-3050-7

Conference paper

S. Razniewski, V. Balaraman, and W. Nutt

“Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties,” in Advanced Data Mining and Applications (ADMA 2017), Singapore, 2017.


BibTeX

@inproceedings{Razniewski_ADMA2017,
TITLE = {Doctoral Advisor or Medical Condition: {T}owards Entity-Specific Rankings of Knowledge Base Properties},
AUTHOR = {Razniewski, Simon and Balaraman, Vevake and Nutt, Werner},
LANGUAGE = {eng},
ISBN = {978-3-319-69178-7},
DOI = {10.1007/978-3-319-69179-4_37},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Advanced Data Mining and Applications (ADMA 2017)},
EDITOR = {Cong, Gao and Peng, Wen-Chih and Zhang, Wei Emma and Li, Chengliang and Sun, Aixin},
PAGES = {526--540},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {10604},
ADDRESS = {Singapore},
}

Endnote

%0 Conference Proceedings
%A Razniewski, Simon
%A Balaraman, Vevake
%A Nutt, Werner
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-2C05-A
%R 10.1007/978-3-319-69179-4_37
%D 2017
%B 13th International Conference on Advanced Data Mining and Applications
%Z date of event: 2017-11-05 - 2017-11-06
%C Singapore
%B Advanced Data Mining and Applications
%E Cong, Gao; Peng, Wen-Chih; Zhang, Wei Emma; Li, Chengliang; Sun, Aixin
%P 526 - 540
%I Springer
%@ 978-3-319-69178-7
%B Lecture Notes in Artificial Intelligence
%N 10604

Conference paper

N. Reiter, E. Gius, J. Strötgen, and M. Willand

“A Shared Task for a Shared Goal: Systematic Annotation of Literary,” in Digital Humanities 2017 (DH 2017), Montréal, Canada, 2017.


BibTeX

@inproceedings{StroetgenDH2017,
TITLE = {A Shared Task for a Shared Goal: {S}ystematic Annotation of Literary},
AUTHOR = {Reiter, Nils and Gius, Evelyn and Str{\"o}tgen, Jannik and Willand, Marcus},
LANGUAGE = {eng},
URL = {https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf},
YEAR = {2017},
BOOKTITLE = {Digital Humanities 2017 (DH 2017)},
EDITOR = {Lewis, Rihan},
EID = {192},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Reiter, Nils
%A Gius, Evelyn
%A Strötgen, Jannik
%A Willand, Marcus
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T A Shared Task for a Shared Goal: Systematic Annotation of Literary
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-7BDC-3
%D 2017
%B Digital Humanities
%Z date of event: 2017-08-08 - 2017-08-11
%C Montréal, Canada
%B Digital Humanities 2017
%E Lewis, Rihan
%Z sequence number: 192

Conference paper

M. Ringsquandl, E. Kharlamov, D. Stepanova, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kroeger

“On Event-driven Knowledge Graph Completion in Digital Factories,” in IEEE International Conference on Big Data, Boston, MA, US, 2017.


BibTeX

@inproceedings{RingsquandlBD2018,
TITLE = {On Event-driven Knowledge Graph Completion in Digital Factories},
AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kroeger, Peer},
LANGUAGE = {eng},
ISBN = {978-1-5386-2715-0},
DOI = {10.1109/BigData.2017.8258105},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {IEEE International Conference on Big Data},
EDITOR = {Nie, Jian-Yun and Obradovic, Zoran and Suzumura, Toyotaro and Ghosh, Rumi and Nambiar, Raghunath and Wang, Chonggang and Zang, Hui and Baeza-Yates, Ricardo and Hu, Xiaohua and Kepner, Jeremy and Cuzzocrea, Alfredo and Tang, Jian and Toyoda, Masashi},
PAGES = {1676--1681},
ADDRESS = {Boston, MA, US},
}

Endnote

%0 Conference Proceedings
%A Ringsquandl, Martin
%A Kharlamov, Evgeny
%A Stepanova, Daria
%A Lamparter, Steffen
%A Lepratti, Raffaello
%A Horrocks, Ian
%A Kroeger, Peer
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T On Event-driven Knowledge Graph Completion in Digital Factories
%G eng
%U http://hdl.handle.net/21.11116/0000-0001-3824-8
%R 10.1109/BigData.2017.8258105
%D 2017
%B IEEE International Conference on Big Data
%Z date of event: 2017-12-11 - 2017-12-14
%C Boston, MA, US
%B IEEE International Conference on Big Data 
%E Nie, Jian-Yun; Obradovic, Zoran; Suzumura, Toyotaro; Ghosh, Rumi; Nambiar, Raghunath; Wang, Chonggang; Zang, Hui; Baeza-Yates, Ricardo; Hu, Xiaohua; Kepner, Jeremy; Cuzzocrea, Alfredo; Tang, Jian; Toyoda, Masashi
%P 1676 - 1681
%I IEEE
%@ 978-1-5386-2715-0

Conference paper

R. Bertens, J. Vreeken, and A. Siebes

“Efficiently Discovering Unexpected Pattern-Co-Occurrences,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.


BibTeX

@inproceedings{RoelSDM2017,
TITLE = {Efficiently Discovering Unexpected Pattern-Co-Occurrences},
AUTHOR = {Bertens, Roel and Vreeken, Jilles and Siebes, Arno},
LANGUAGE = {eng},
ISBN = {978-1-611974-87-4},
DOI = {10.1137/1.9781611974973.15},
PUBLISHER = {SIAM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)},
EDITOR = {Chawla, Nitesh and Wang, Wei},
PAGES = {126--134},
ADDRESS = {Houston, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Bertens, Roel
%A Vreeken, Jilles
%A Siebes, Arno
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Efficiently Discovering Unexpected Pattern-Co-Occurrences
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-066E-3
%R 10.1137/1.9781611974973.15
%D 2017
%B 17th SIAM International Conference on Data Mining
%Z date of event: 2017-04-27 - 2017-04-29
%C Houston, TX, USA
%B Proceedings of the Seventeenth SIAM International Conference on Data Mining
%E Chawla, Nitesh; Wang, Wei
%P 126 - 134
%I SIAM
%@ 978-1-611974-87-4

Article

D2D5

A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. Pal, H. Larochelle, A. Courville, and B. Schiele

“Movie Description,” International Journal of Computer Vision, vol. 123, no. 1, 2017.


Abstract

Audio Description (AD) provides linguistic descriptions of movies and allows
visually impaired people to follow a movie along with their peers. Such
descriptions are by design mainly visual and thus naturally form an interesting
data source for computer vision and computational linguistics. In this work we
propose a novel dataset which contains transcribed ADs, which are temporally
aligned to full length movies. In addition we also collected and aligned movie
scripts used in prior work and compare the two sources of descriptions. In
total the Large Scale Movie Description Challenge (LSMDC) contains a parallel
corpus of 118,114 sentences and video clips from 202 movies. First we
characterize the dataset by benchmarking different approaches for generating
video descriptions. Comparing ADs to scripts, we find that ADs are indeed more
visual and describe precisely what is shown rather than what should happen
according to the scripts created prior to movie production. Furthermore, we
present and compare the results of several teams who participated in a
challenge organized in the context of the workshop "Describing and
Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at
ICCV 2015.

BibTeX

@article{RohrbachMovie,
TITLE = {Movie Description},
AUTHOR = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Christopher and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
LANGUAGE = {eng},
DOI = {10.1007/s11263-016-0987-1},
PUBLISHER = {Springer},
ADDRESS = {London},
YEAR = {2017},
DATE = {2017},
ABSTRACT = {Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video \& The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.},
JOURNAL = {International Journal of Computer Vision},
VOLUME = {123},
NUMBER = {1},
PAGES = {94--120},
}

Endnote

%0 Journal Article
%A Rohrbach, Anna
%A Torabi, Atousa
%A Rohrbach, Marcus
%A Tandon, Niket
%A Pal, Christopher
%A Larochelle, Hugo
%A Courville, Aaron
%A Schiele, Bernt
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Movie Description
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-FD03-C
%R 10.1007/s11263-016-0987-1
%7 2017-01-25
%D 2017
%X   Audio Description (AD) provides linguistic descriptions of movies and allows
visually impaired people to follow a movie along with their peers. Such
descriptions are by design mainly visual and thus naturally form an interesting
data source for computer vision and computational linguistics. In this work we
propose a novel dataset which contains transcribed ADs, which are temporally
aligned to full length movies. In addition we also collected and aligned movie
scripts used in prior work and compare the two sources of descriptions. In
total the Large Scale Movie Description Challenge (LSMDC) contains a parallel
corpus of 118,114 sentences and video clips from 202 movies. First we
characterize the dataset by benchmarking different approaches for generating
video descriptions. Comparing ADs to scripts, we find that ADs are indeed more
visual and describe precisely what is shown rather than what should happen
according to the scripts created prior to movie production. Furthermore, we
present and compare the results of several teams who participated in a
challenge organized in the context of the workshop "Describing and
Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at
ICCV 2015.

%K Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Computation and Language, cs.CL
%J International Journal of Computer Vision
%O IJCV
%V 123
%N 1
%& 94
%P 94 - 120
%I Springer
%C London

Conference paper

R. Saha Roy, A. Singh, P. Chawla, S. Saxena, and A. R. Sinha

“Automatic Assignment of Topical Icons to Documents for Faster File Navigation,” in 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto, Japan, 2017.


BibTeX

@inproceedings{Roy_ICDAR2017,
TITLE = {Automatic Assignment of Topical Icons to Documents for Faster File Navigation},
AUTHOR = {Saha Roy, Rishiraj and Singh, Abhijeet and Chawla, Prashant and Saxena, Shubham and Sinha, Atanu R.},
LANGUAGE = {eng},
ISSN = {2379-2140},
DOI = {10.1109/ICDAR.2017.220},
PUBLISHER = {IEEE},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017)},
PAGES = {1338--1345},
ADDRESS = {Kyoto, Japan},
}

Endnote

%0 Conference Proceedings
%A Saha Roy, Rishiraj
%A Singh, Abhijeet
%A Chawla, Prashant
%A Saxena, Shubham
%A Sinha, Atanu R.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Automatic Assignment of Topical Icons to Documents for Faster File Navigation
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-A109-E
%R 10.1109/ICDAR.2017.220
%D 2017
%B 14th IAPR International Conference on Document Analysis and Recognition 
%Z date of event: 2017-11-13 - 2017-11-15
%C Kyoto, Japan
%B 14th IAPR International Conference on Document Analysis and Recognition 
%P 1338 - 1345
%I IEEE
%@ 2379-2140

Conference paper

V. Setty, A. Anand, A. Mishra, and A. Anand

“Modeling Event Importance for Ranking Daily News Events,” in WSDM’17, 10th ACM International Conference on Web Search and Data Mining, Cambridge, UK, 2017.


BibTeX

@inproceedings{Setii2017,
TITLE = {Modeling Event Importance for Ranking Daily News Events},
AUTHOR = {Setty, Vinay and Anand, Abhijit and Mishra, Arunav and Anand, Avishek},
LANGUAGE = {eng},
ISBN = {978-1-4503-4675-7},
DOI = {10.1145/3018661.3018728},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WSDM'17, 10th ACM International Conference on Web Search and Data Mining},
PAGES = {231--240},
ADDRESS = {Cambridge, UK},
}

Endnote

%0 Conference Proceedings
%A Setty, Vinay
%A Anand, Abhijit
%A Mishra, Arunav
%A Anand, Avishek
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Modeling Event Importance for Ranking Daily News Events
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-26D5-9
%R 10.1145/3018661.3018728
%D 2017
%B 10th ACM International Conference on Web Search and Data Mining
%Z date of event: 2017-02-06 - 2017-02-10
%C Cambridge, UK
%B WSDM'17
%P 231 - 240
%I ACM
%@ 978-1-4503-4675-7

Paper

D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum

“KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition,” 2017. [Online]. Available: http://arxiv.org/abs/1709.03544.


Abstract

KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy and compare KnowNER with state-of-the-art NER approaches across
three languages (i.e., English, German and Spanish) performing amongst
state-of-the-art systems in all of them.

BibTeX

@online{Seyler_arXiv2017,
TITLE = {{KnowNER}: Incremental Multilingual {Knowledge} in {Named Entity Recognition}},
AUTHOR = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1709.03544},
EPRINT = {1709.03544},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish) performing amongst state-of-the-art systems in all of them.},
}

Endnote

%0 Report
%A Seyler, Dominic
%A Dembelova, Tatiana
%A Del Corro, Luciano
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-0693-D
%U http://arxiv.org/abs/1709.03544
%D 2017
%X   KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy and compare KnowNER with state-of-the-art NER approaches across
three languages (i.e., English, German and Spanish) performing amongst
state-of-the-art systems in all of them.

%K Computer Science, Computation and Language, cs.CL

Conference paper

D. Seyler, M. Yahya, and K. Berberich

“Knowledge Questions from Knowledge Graphs,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.


BibTeX

@inproceedings{SeylerICTIR2017,
TITLE = {Knowledge Questions from Knowledge Graphs},
AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4490-6},
DOI = {10.1145/3121050.3121073},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval},
PAGES = {11--18},
ADDRESS = {Amsterdam, The Netherlands},
}

Endnote

%0 Conference Proceedings
%A Seyler, Dominic
%A Yahya, Mohamed
%A Berberich, Klaus
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Questions from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-0647-A
%R 10.1145/3121050.3121073
%D 2017
%B 7th International Conference on the Theory of Information Retrieval
%Z date of event: 2017-10-01 - 2017-10-04
%C Amsterdam, The Netherlands
%B ICTIR'17
%P 11 - 18
%I ACM
%@ 978-1-4503-4490-6

Article

L. Soldaini, A. Yates, and N. Goharian

“Learning to Reformulate Long Queries for Clinical Decision Support,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.


BibTeX

@article{Soldaini2017,
TITLE = {Learning to Reformulate Long Queries for Clinical Decision Support},
AUTHOR = {Soldaini, Luca and Yates, Andrew and Goharian, Nazli},
LANGUAGE = {eng},
ISSN = {2330-1635},
DOI = {10.1002/asi.23924},
PUBLISHER = {Wiley},
ADDRESS = {Chichester, UK},
YEAR = {2017},
JOURNAL = {Journal of the Association for Information Science and Technology},
VOLUME = {68},
NUMBER = {11},
PAGES = {2602--2619},
}

Endnote

%0 Journal Article
%A Soldaini, Luca
%A Yates, Andrew
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Learning to Reformulate Long Queries for Clinical Decision Support
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-2723-C
%R 10.1002/asi.23924
%7 2017-09-14
%D 2017
%8 14.09.2017
%J Journal of the Association for Information Science and Technology
%O asis&t
%V 68
%N 11
%& 2602
%P 2602 - 2619
%I Wiley
%C Chichester, UK
%@ 2330-1635

Conference paper

L. Soldaini, A. Yates, and N. Goharian

“Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.


BibTeX

@inproceedings{Soldaini_CIKM2017,
TITLE = {Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model},
AUTHOR = {Soldaini, Luca and Yates, Andrew and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {978-1-4503-4918-5},
DOI = {10.1145/3132847.3133149},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management},
PAGES = {2307--2310},
ADDRESS = {Singapore, Singapore},
}

Endnote

%0 Conference Proceedings
%A Soldaini, Luca
%A Yates, Andrew
%A Goharian, Nazli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-02F8-4
%R 10.1145/3132847.3133149
%D 2017
%B 26th ACM International Conference on Information and Knowledge Management 
%Z date of event: 2017-11-06 - 2017-11-10
%C Singapore, Singapore
%B CIKM'17
%P 2307 - 2310
%I ACM
%@ 978-1-4503-4918-5

Conference paper

J. Stoyanovich, B. Howe, S. Abiteboul, G. Miklau, A. Sahuguet, and G. Weikum

“Fides: Towards a Platform for Responsible Data Science,” in 29th International Conference on Scientific and Statistical Database Management (SSDBM 2017), Chicago, IL, USA, 2017.


BibTeX

@inproceedings{StoyanovichSSDBM2017,
TITLE = {Fides: {T}owards a Platform for Responsible Data Science},
AUTHOR = {Stoyanovich, Julia and Howe, Bill and Abiteboul, Serge and Miklau, Gerome and Sahuguet, Arnaud and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-5282-6},
DOI = {10.1145/3085504.3085530},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {29th International Conference on Scientific and Statistical Database Management (SSDBM 2017)},
EID = {26},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%A Stoyanovich, Julia
%A Howe, Bill
%A Abiteboul, Serge
%A Miklau, Gerome
%A Sahuguet, Arnaud
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fides: Towards a Platform for Responsible Data Science
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-80BA-B
%R 10.1145/3085504.3085530
%D 2017
%B 29th International Conference on Scientific and Statistical Database Management
%Z date of event: 2017-06-27 - 2017-06-29
%C Chicago, IL, USA
%B 29th International Conference on Scientific and Statistical Database Management
%Z sequence number: 26
%I ACM
%@ 978-1-4503-5282-6

Conference paper

N. Tandon, G. de Melo, and G. Weikum

“WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.


BibTeX

@inproceedings{TandonACL2017,
TITLE = {{WebChild} 2.0: {F}ine-Grained Commonsense Knowledge Distillation},
AUTHOR = {Tandon, Niket and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-76-0},
DOI = {10.18653/v1/P17-4020},
PUBLISHER = {ACL},
YEAR = {2017},
BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)},
PAGES = {115--120},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A de Melo, Gerard
%A Weikum, Gerhard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-FAC3-A
%R 10.18653/v1/P17-4020
%D 2017
%B The 55th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2017-07-30 - 2017-08-04
%C Vancouver, Canada
%B The 55th Annual Meeting of the Association for Computational Linguistics
%P 115 - 120
%I ACL
%@ 978-1-945626-76-0

Article

C. Teflioudi and R. Gemulla

“Exact and Approximate Maximum Inner Product Search with LEMP,” ACM Transactions on Database Systems, vol. 42, no. 1, 2017.


BibTeX

@article{Teflioudi:2016:EAM:3015779.2996452,
TITLE = {Exact and Approximate Maximum Inner Product Search with {LEMP}},
AUTHOR = {Teflioudi, Christina and Gemulla, Rainer},
LANGUAGE = {eng},
ISSN = {0362-5915},
DOI = {10.1145/2996452},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2017},
DATE = {2017},
JOURNAL = {ACM Transactions on Database Systems},
VOLUME = {42},
NUMBER = {1},
EID = {5},
}

Endnote

%0 Journal Article
%A Teflioudi, Christina
%A Gemulla, Rainer
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Exact and Approximate Maximum Inner Product Search with LEMP
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-349C-B
%R 10.1145/2996452
%7 2016
%D 2017
%J ACM Transactions on Database Systems
%O TODS
%V 42
%N 1
%Z sequence number: 5
%I ACM
%C New York, NY
%@ 0362-5915

Thesis

E. N. Toosi

“A New Efficient and Scalable Algorithm for Boolean Matrix Factorization,” Universität des Saarlandes, Saarbrücken, 2017.


BibTeX

@mastersthesis{ToosiMsc2017,
TITLE = {A New Efficient and Scalable Algorithm for {Boolean} Matrix Factorization},
AUTHOR = {Toosi, Ehsan Nadjaran},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
}

Endnote

%0 Thesis
%A Toosi, Ehsan Nadjaran
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A New Efficient and Scalable Algorithm for Boolean Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-90D5-E
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P X, 70 p.
%V master
%9 master

Conference paper

H. D. Tran, D. Stepanova, M. Gad-Elrab, F. A. Lisi, and G. Weikum

“Towards Nonmonotonic Relational Learning from Knowledge Graphs,” in Inductive Logic Programming (ILP 2016), London, UK, 2017.


BibTeX

@inproceedings{TranILP2016,
TITLE = {Towards Nonmonotonic Relational Learning from Knowledge Graphs},
AUTHOR = {Tran, Hai Dang and Stepanova, Daria and Gad-Elrab, Mohamed and Lisi, Francesca A. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-63341-1},
DOI = {10.1007/978-3-319-63342-8_8},
PUBLISHER = {Springer},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {Inductive Logic Programming (ILP 2016)},
EDITOR = {Cussens, James and Russo, Alessandra},
PAGES = {94--107},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {10326},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Tran, Hai Dang
%A Stepanova, Daria
%A Gad-Elrab, Mohamed
%A Lisi, Francesca A.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Nonmonotonic Relational Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2DB1-E
%R 10.1007/978-3-319-63342-8_8
%D 2017
%B 26th International Conference on Inductive Logic Programming
%Z date of event: 2016-09-04 - 2016-09-06
%C London, UK
%B Inductive Logic Programming
%E Cussens, James; Russo, Alessandra
%P 94 - 107
%I Springer
%@ 978-3-319-63341-1
%B Lecture Notes in Artificial Intelligence
%N 10326

Thesis

H. D. Tran

“An Approach to Nonmonotonic Relational Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2017.


BibTeX

@mastersthesis{TranMSc2017,
TITLE = {An Approach to Nonmonotonic Relational Learning from Knowledge Graphs},
AUTHOR = {Tran, Hai Dang},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
}

Endnote

%0 Thesis
%A Tran, Hai Dang
%Y Stepanova, Daria
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T An Approach to Nonmonotonic Relational Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-845A-3
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P XV, 48 p.
%V master
%9 master

Conference paper

G. Weikum

“What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe,” in WWW’17 Companion, Perth, Australia, 2017.


BibTeX

@inproceedings{WeikumWWW2017,
TITLE = {What Computers Should Know, Shouldn{\textquoteright}t Know, and Shouldn{\textquoteright}t Believe},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4914-7},
DOI = {10.1145/3041021.3051120},
PUBLISHER = {ACM},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {WWW'17 Companion},
PAGES = {1559--1560},
ADDRESS = {Perth, Australia},
}

Endnote

%0 Conference Proceedings
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-7DA0-5
%R 10.1145/3041021.3051120
%D 2017
%B 26th International Conference on World Wide Web 
%Z date of event: 2017-04-03 - 2017-04-07
%C Perth, Australia
%B WWW'17 Companion
%P 1559 - 1560
%I ACM
%@ 978-1-4503-4914-7

Paper

A. Yates, A. Cohan, and N. Goharian

“Depression and Self-Harm Risk Assessment in Online Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1709.01848.

Abstract

Users suffering from mental health conditions often turn to online resources
for support, including specialized online support communities or general
communities such as Twitter and Reddit. In this work, we present a neural
framework for supporting and studying users in both types of communities. We
propose methods for identifying posts in support communities that may indicate
a risk of self-harm, and demonstrate that our approach outperforms strong
previously proposed methods for identifying such posts. Self-harm is closely
related to depression, which makes identifying depressed users on general
forums a crucial related task. We introduce a large-scale general forum dataset
("RSDD") consisting of users with self-reported depression diagnoses matched
with control users. We show how our method can be applied to effectively
identify depressed users from their use of language alone. We demonstrate that
our method outperforms strong baselines on this general forum dataset.

BibTeX

@online{Yates_arXiv2017b,
TITLE = {Depression and Self-Harm Risk Assessment in Online Forums},
AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1709.01848},
EPRINT = {1709.01848},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.},
}

Endnote

%0 Report
%A Yates, Andrew
%A Cohan, Arman
%A Goharian, Nazli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Depression and Self-Harm Risk Assessment in Online Forums
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06C8-6
%U http://arxiv.org/abs/1709.01848
%D 2017
%X   Users suffering from mental health conditions often turn to online resources
for support, including specialized online support communities or general
communities such as Twitter and Reddit. In this work, we present a neural
framework for supporting and studying users in both types of communities. We
propose methods for identifying posts in support communities that may indicate
a risk of self-harm, and demonstrate that our approach outperforms strong
previously proposed methods for identifying such posts. Self-harm is closely
related to depression, which makes identifying depressed users on general
forums a crucial related task. We introduce a large-scale general forum dataset
("RSDD") consisting of users with self-reported depression diagnoses matched
with control users. We show how our method can be applied to effectively
identify depressed users from their use of language alone. We demonstrate that
our method outperforms strong baselines on this general forum dataset.

%K Computer Science, Computation and Language, cs.CL

Conference paper

A. Yates, A. Cohan, and N. Goharian

“Depression and Self-Harm Risk Assessment in Online Forums,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.

BibTeX

@inproceedings{YatesENMLP2017,
TITLE = {Depression and Self-Harm Risk Assessment in Online Forums},
AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli},
LANGUAGE = {eng},
ISBN = {978-1-945626-83-8},
URL = {https://aclanthology.info/pdf/D/D17/D17-1321.pdf},
PUBLISHER = {ACL},
YEAR = {2017},
DATE = {2017},
BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)},
PAGES = {2958--2968},
ADDRESS = {Copenhagen, Denmark},
}

Endnote

%0 Conference Proceedings
%A Yates, Andrew
%A Cohan, Arman
%A Goharian, Nazli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Depression and Self-Harm Risk Assessment in Online Forums
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06A0-D
%U https://aclanthology.info/pdf/D/D17/D17-1321.pdf
%D 2017
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2017-09-09 - 2017-09-11
%C Copenhagen, Denmark
%B The Conference on Empirical Methods in Natural Language Processing
%P 2958 - 2968
%I ACL
%@ 978-1-945626-83-8

Paper

A. Yates and K. Hui

“DE-PACRR: Exploring Layers Inside the PACRR Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.08746.

Abstract

Recent neural IR models have demonstrated deep learning's utility in ad-hoc
information retrieval. However, deep models have a reputation for being black
boxes, and the roles of a neural IR model's components may not be obvious at
first glance. In this work, we attempt to shed light on the inner workings of a
recently proposed neural IR model, namely the PACRR model, by visualizing the
output of intermediate layers and by investigating the relationship between
intermediate weights and the ultimate relevance score produced. We highlight
several insights, hoping that such insights will be generally applicable.

BibTeX

@online{Yates_arXiv2017,
TITLE = {{DE}-{PACRR}: Exploring Layers Inside the {PACRR} Model},
AUTHOR = {Yates, Andrew and Hui, Kai},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1706.08746},
EPRINT = {1706.08746},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.},
}

Endnote

%0 Report
%A Yates, Andrew
%A Hui, Kai
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DE-PACRR: Exploring Layers Inside the PACRR Model
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002E-06BE-D
%U http://arxiv.org/abs/1706.08746
%D 2017
%X   Recent neural IR models have demonstrated deep learning's utility in ad-hoc
information retrieval. However, deep models have a reputation for being black
boxes, and the roles of a neural IR model's components may not be obvious at
first glance. In this work, we attempt to shed light on the inner workings of a
recently proposed neural IR model, namely the PACRR model, by visualizing the
output of intermediate layers and by investigating the relationship between
intermediate weights and the ultimate relevance score produced. We highlight
several insights, hoping that such insights will be generally applicable.

%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL

Article

D. Zeinalipour-Yazti and C. Laoudias

“The Anatomy of the Anyplace Indoor Navigation Service,” SIGSPATIAL Special, vol. 9, no. 2, 2017.

BibTeX

@article{Zeinalipour-Yazti:2017:AAI:3151123.3151125,
TITLE = {The Anatomy of the Anyplace Indoor Navigation Service},
AUTHOR = {Zeinalipour-Yazti, Demetrios and Laoudias, Christos},
LANGUAGE = {eng},
ISSN = {1946-7729},
DOI = {10.1145/3151123.3151125},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2017},
DATE = {2017},
JOURNAL = {SIGSPATIAL Special},
VOLUME = {9},
NUMBER = {2},
PAGES = {3--10},
}

Endnote

%0 Journal Article
%A Zeinalipour-Yazti, Demetrios
%A Laoudias, Christos
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T The Anatomy of the Anyplace Indoor Navigation Service
%G eng
%U http://hdl.handle.net/21.11116/0000-0002-CA02-8
%R 10.1145/3151123.3151125
%7 2017
%D 2017
%J SIGSPATIAL Special
%V 9
%N 2
%& 3
%P 3 - 10
%I ACM
%C New York, NY
%@ 1946-7729

Paper

Y. Zhang, M. Humbert, B. Surma, P. Manoharan, J. Vreeken, and M. Backes

“CTRL+Z: Recovering Anonymized Social Graphs,” 2017. [Online]. Available: http://arxiv.org/abs/1711.05441.

Abstract

Social graphs derived from online social interactions contain a wealth of
information that is nowadays extensively used by both industry and academia.
However, due to the sensitivity of information contained in such social graphs,
they need to be properly anonymized before release. Most of the graph
anonymization techniques that have been proposed to sanitize social graph data
rely on the perturbation of the original graph's structure, more specifically
of its edge set. In this paper, we identify a fundamental weakness of these
edge-based anonymization mechanisms and exploit it to recover most of the
original graph structure.

First, we propose a method to quantify an edge's plausibility in a given
graph by relying on graph embedding. Our experiments on three real-life social
network datasets under two widely known graph anonymization mechanisms
demonstrate that this method can very effectively detect fake edges with AUC
values above 0.95 in most cases. Second, by relying on Gaussian mixture models
and maximum a posteriori probability estimation, we derive an optimal decision
rule to detect whether an edge is fake based on the observed graph data. We
further demonstrate that this approach concretely jeopardizes the privacy
guarantees provided by the considered graph anonymization mechanisms. To
mitigate this vulnerability, we propose a method to generate fake edges as
plausible as possible given the graph structure and incorporate it into the
existing anonymization mechanisms. Our evaluation demonstrates that the
enhanced mechanisms not only decrease the chances of graph recovery (with AUC
dropping by up to 35%), but also provide even better graph utility than
existing anonymization methods.

BibTeX

@online{Zhang1711.05441,
TITLE = {{CTRL}+Z: Recovering Anonymized Social Graphs},
AUTHOR = {Zhang, Yang and Humbert, Mathias and Surma, Bartlomiej and Manoharan, Praveen and Vreeken, Jilles and Backes, Michael},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1711.05441},
EPRINT = {1711.05441},
EPRINTTYPE = {arXiv},
YEAR = {2017},
ABSTRACT = {Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.},
}

Endnote

%0 Report
%A Zhang, Yang
%A Humbert, Mathias
%A Surma, Bartlomiej
%A Manoharan, Praveen
%A Vreeken, Jilles
%A Backes, Michael
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CTRL+Z: Recovering Anonymized Social Graphs
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-6463-0
%U http://arxiv.org/abs/1711.05441
%D 2017
%X   Social graphs derived from online social interactions contain a wealth of
information that is nowadays extensively used by both industry and academia.
However, due to the sensitivity of information contained in such social graphs,
they need to be properly anonymized before release. Most of the graph
anonymization techniques that have been proposed to sanitize social graph data
rely on the perturbation of the original graph's structure, more specifically
of its edge set. In this paper, we identify a fundamental weakness of these
edge-based anonymization mechanisms and exploit it to recover most of the
original graph structure.
  First, we propose a method to quantify an edge's plausibility in a given
graph by relying on graph embedding. Our experiments on three real-life social
network datasets under two widely known graph anonymization mechanisms
demonstrate that this method can very effectively detect fake edges with AUC
values above 0.95 in most cases. Second, by relying on Gaussian mixture models
and maximum a posteriori probability estimation, we derive an optimal decision
rule to detect whether an edge is fake based on the observed graph data. We
further demonstrate that this approach concretely jeopardizes the privacy
guarantees provided by the considered graph anonymization mechanisms. To
mitigate this vulnerability, we propose a method to generate fake edges as
plausible as possible given the graph structure and incorporate it into the
existing anonymization mechanisms. Our evaluation demonstrates that the
enhanced mechanisms not only decrease the chances of graph recovery (with AUC
dropping by up to 35%), but also provide even better graph utility than
existing anonymization methods.

%K Computer Science, Cryptography and Security, cs.CR,cs.SI

Thesis

D. Ziegler

“Answer Type Prediction for Question Answering over Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2017.

BibTeX

@mastersthesis{ZieglerMSc2017,
TITLE = {Answer Type Prediction for Question Answering over Knowledge Bases},
AUTHOR = {Ziegler, David},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2017},
DATE = {2017},
}

Endnote

%0 Thesis
%A Ziegler, David
%Y Abujabal, Abdalghani
%A referee: Saha Roy, Rishiraj
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Answer Type Prediction for Question Answering over Knowledge Bases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002D-8F38-A
%I Universität des Saarlandes
%C Saarbrücken
%D 2017
%P X, 48 p.
%V master
%9 master

Conference paper

D. Ziegler, A. Abujabal, R. Saha Roy, and G. Weikum

“Efficiency-aware Answering of Compositional Questions using Answer Type Prediction,” in The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, 2017.

BibTeX

@inproceedings{ZieglerIJCNLP2017,
TITLE = {Efficiency-aware Answering of Compositional Questions using Answer Type Prediction},
AUTHOR = {Ziegler, David and Abujabal, Abdalghani and Saha Roy, Rishiraj and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-948087-01-8},
URL = {http://aclweb.org/anthology/I17-2038},
PUBLISHER = {Asian Federation of Natural Language Processing},
YEAR = {2017},
BOOKTITLE = {The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
PAGES = {222--227},
ADDRESS = {Taipei, Taiwan},
}

Endnote

%0 Conference Proceedings
%A Ziegler, David
%A Abujabal, Abdalghani
%A Saha Roy, Rishiraj
%A Weikum, Gerhard
%+ Algorithms and Complexity, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficiency-aware Answering of Compositional Questions using Answer Type Prediction
%G eng
%U http://hdl.handle.net/21.11116/0000-0000-3B5F-5
%U http://aclweb.org/anthology/I17-2038
%D 2017
%B 8th International Joint Conference on Natural Language Processing
%Z date of event: 2017-11-27 - 2017-12-01
%C Taipei, Taiwan
%B The 8th International Joint Conference on Natural Language Processing
%P 222 - 227
%I Asian Federation of Natural Language Processing 
%@ 978-1-948087-01-8

2016

Proceedings

S. Abiteboul, G. Miklau, J. Stoyanovich, and G. Weikum

Eds., Data, Responsibly, no. 7. Schloss Dagstuhl, 2016.

BibTeX

@proceedings{AbiteboulDagstuhl2016,
TITLE = {Data, Responsibly (Dagstuhl Seminar 16291)},
EDITOR = {Abiteboul, Serge and Miklau, Gerome and Stoyanovich, Julia and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2192-5283},
URL = {urn:nbn:de:0030-drops-67644},
DOI = {10.4230/DagRep.6.7.42},
PUBLISHER = {Schloss Dagstuhl},
YEAR = {2016},
PAGES = {30 p.},
SERIES = {Dagstuhl Reports},
VOLUME = {6},
ISSUE = {7},
ADDRESS = {Wadern, Germany},
}

Endnote

%0 Conference Proceedings
%E Abiteboul, Serge
%E Miklau, Gerome
%E Stoyanovich, Julia
%E Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Data, Responsibly
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-500A-2
%R 10.4230/DagRep.6.7.42
%U urn:nbn:de:0030-drops-67644
%I Schloss Dagstuhl
%D 2016
%B Dagstuhl Seminar 16291 "Data, Responsibly"
%Z date of event: 2016-07-17 - 2016-07-22
%C Wadern, Germany
%P 30 p.
%K Data responsibly, Big data, Machine bias, Data analysis, Data management, Data mining, Fairness, Diversity, Accountability, Transparency, Personal
%S Dagstuhl Reports
%V 6
%P 42 - 71
%@ 2192-5283
%U http://drops.dagstuhl.de/opus/volltexte/2016/6764/
%U http://drops.dagstuhl.de/doku/urheberrecht1.html

Article

K. Athukorala, D. Głowacka, G. Jacucci, A. Oulasvirta, and J. Vreeken

“Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks,” Journal of the Association for Information Science and Technology, vol. 67, no. 11, 2016.

BibTeX

@article{VreekenSearch2015,
TITLE = {Is Exploratory Search Different? {A} Comparison of Information Search Behavior for Exploratory and Lookup Tasks},
AUTHOR = {Athukorala, Kumaripaba and G{\l}owacka, Dorota and Jacucci, Giulio and Oulasvirta, Antti and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {2330-1643},
DOI = {10.1002/asi.23617},
PUBLISHER = {Wiley},
ADDRESS = {Chichester},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Journal of the Association for Information Science and Technology},
VOLUME = {67},
NUMBER = {11},
PAGES = {2635--2651},
}

Endnote

%0 Journal Article
%A Athukorala, Kumaripaba
%A Głowacka, Dorota
%A Jacucci, Giulio
%A Oulasvirta, Antti
%A Vreeken, Jilles
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-E6A7-D
%R 10.1002/asi.23617
%7 2015-10-22
%D 2016
%J Journal of the Association for Information Science and Technology
%V 67
%N 11
%& 2635
%P 2635 - 2651
%I Wiley
%C Chichester
%@ 2330-1643

Thesis

A. H. Baradaranshahroudi

“Fast Computation of Highest Correlated Segments in Multivariate Time-Series,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{BaradaranshahroudiMSc2016,
TITLE = {Fast Computation of Highest Correlated Segments in Multivariate Time-Series},
AUTHOR = {Baradaranshahroudi, Amir Hossein},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Baradaranshahroudi, Amir Hossein
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fast Computation of Highest Correlated Segments in Multivariate Time-Series
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5FB1-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

Proceedings

B. Berendt, B. Bringmann, E. Fromont, G. Garriga, P. Miettinen, N. Tatti, and V. Tresp

Eds., Machine Learning and Knowledge Discovery in Databases. Springer, 2016.

BibTeX

@proceedings{ProceedingsECML2016III,
TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)},
EDITOR = {Berendt, Bettina and Bringmann, Bj{\"o}rn and Fromont, Elisa and Garriga, Gemma and Miettinen, Pauli and Tatti, Nikolai and Tresp, Volker},
LANGUAGE = {eng},
ISBN = {978-3-319-46130-4},
DOI = {10.1007/978-3-319-46131-1},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
PAGES = {XXII, 307 p.},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9853},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%E Berendt, Bettina
%E Bringmann, Björn
%E Fromont, Elisa
%E Garriga, Gemma
%E Miettinen, Pauli
%E Tatti, Nikolai
%E Tresp, Volker
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part III
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A68E-5
%R 10.1007/978-3-319-46131-1
%@ 978-3-319-46130-4
%I Springer
%D 2016
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2016-09-19 - 2016-09-23
%C Riva del Garda, Italy
%P XXII, 307 p.
%S Lecture Notes in Artificial Intelligence
%V 9853

Conference paper

R. Bertens, J. Vreeken, and A. Siebes

“Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{BertensKDD2016,
TITLE = {Keeping it Short and Simple: {S}ummarising Complex Event Sequences with Multivariate Patterns},
AUTHOR = {Bertens, Roel and Vreeken, Jilles and Siebes, Arno},
LANGUAGE = {eng},
ISBN = {978-1-4503-4232-2},
DOI = {10.1145/2939672.2939761},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
PAGES = {735--744},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Bertens, Roel
%A Vreeken, Jilles
%A Siebes, Arno
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A92D-B
%R 10.1145/2939672.2939761
%D 2016
%B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2016-08-13 - 2016-08-17
%C San Francisco, CA, USA
%B KDD'16
%P 735 - 744
%I ACM
%@ 978-1-4503-4232-2

Thesis

A. Bhattacharyya

“Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{BhattacharyyaMSc2016,
TITLE = {Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns},
AUTHOR = {Bhattacharyya, Apratim},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Bhattacharyya, Apratim
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F37-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

Conference paper

J. A. Biega, K. P. Gummadi, I. Mele, D. Milchevski, C. Tryfonopoulos, and G. Weikum

“R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.

BibTeX

@inproceedings{BiegaSIGIR2016,
TITLE = {R-Susceptibility: {A}n {IR}-Centric Approach to Assessing Privacy Risks for Users in Online Communities},
AUTHOR = {Biega, Joanna Asia and Gummadi, Krishna P. and Mele, Ida and Milchevski, Dragan and Tryfonopoulos, Christos and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4069-4},
DOI = {10.1145/2911451.2911533},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {365--374},
ADDRESS = {Pisa, Italy},
}

Endnote

%0 Conference Proceedings
%A Biega, Joanna Asia
%A Gummadi, Krishna P.
%A Mele, Ida
%A Milchevski, Dragan
%A Tryfonopoulos, Christos
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A921-3
%R 10.1145/2911451.2911533
%D 2016
%B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2016-07-17 - 2016-07-21
%C Pisa, Italy
%B SIGIR'16
%P 365 - 374
%I ACM
%@ 978-1-4503-4069-4

Conference paper

T. Bögel, E. Gius, J. Jacke, and J. Strötgen

“From Order to Order Switch: Mediating between Complexity and Reproducibility in the Context of Automated Literary Annotation,” in Digital Humanities 2016 (DH 2016), Krakow, Poland, 2016.

BibTeX

@inproceedings{BoegelDH2016,
TITLE = {From Order to Order Switch: {M}ediating between Complexity and Reproducibility in the Context of Automated Literary Annotation},
AUTHOR = {B{\"o}gel, Thomas and Gius, Evelyn and Jacke, Janina and Str{\"o}tgen, Jannik},
LANGUAGE = {eng},
URL = {http://dh2016.adho.org/abstracts/275},
PUBLISHER = {Jagiellonian University \& Pedagogical University},
YEAR = {2016},
BOOKTITLE = {Digital Humanities 2016 (DH 2016)},
PAGES = {379--382},
ADDRESS = {Krakow, Poland},
}

Endnote

%0 Conference Proceedings
%A Bögel, Thomas
%A Gius, Evelyn
%A Jacke, Janina
%A Strötgen, Jannik
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T From Order to Order Switch: Mediating between Complexity and Reproducibility in the Context of Automated Literary Annotation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-0E96-0
%D 2016
%B Digital Humanities
%Z date of event: 2016-07-11 - 2016-07-16
%C Krakow, Poland
%B Digital Humanities 2016
%P 379 - 382
%I Jagiellonian University & Pedagogical University
%U http://dh2016.adho.org/abstracts/275

Conference paper

N. Boldyrev, M. Spaniol, and G. Weikum

“ACROSS: A Framework for Multi-Cultural Interlinking of Web Taxonomies,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.

BibTeX

@inproceedings{BoldryevWebSci2016,
TITLE = {{ACROSS}: {A} Framework for Multi-Cultural Interlinking of {W}eb Taxonomies},
AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4208-7},
DOI = {10.1145/2908131.2908164},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WebSci'16, ACM Web Science Conference},
PAGES = {127--136},
ADDRESS = {Hannover, Germany},
}

Endnote

%0 Conference Proceedings
%A Boldyrev, Natalia
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ACROSS: A Framework for Multi-Cultural Interlinking of Web Taxonomies
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-01B6-E
%R 10.1145/2908131.2908164
%D 2016
%B ACM Web Science Conference
%Z date of event: 2016-05-22 - 2016-05-25
%C Hannover, Germany
%B WebSci'16
%P 127 - 136
%I ACM
%@ 978-1-4503-4208-7

Proceedings

P. Chau, J. Vreeken, M. van Leeuwen, D. Shahaf, and C. Faloutsos

Eds., Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics. IDEA’16, 2016.

BibTeX

@proceedings{ChauIDEA2016,
TITLE = {Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics (IDEA 2016)},
EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Shahaf, Dafna and Faloutsos, Christos},
LANGUAGE = {eng},
PUBLISHER = {IDEA'16},
YEAR = {2016},
PAGES = {137 p.},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%E Chau, Polo
%E Vreeken, Jilles
%E van Leeuwen, Matthijs
%E Shahaf, Dafna
%E Faloutsos, Christos
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2439-4
%I IDEA'16
%D 2016
%B ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2016-08-14 - 2016-08-14
%C San Francisco, CA, USA
%P 137 p.

Conference paper

J. Chen, N. Tandon, C. D. Hariman, and G. de Melo

“WebBrain: Joint Neural Learning of Large-Scale Commonsense Knowledge,” in The Semantic Web -- ISWC 2016, Kobe, Japan, 2016.

BibTeX

@inproceedings{ChenISWC2016,
TITLE = {{WebBrain}: {J}oint Neural Learning of Large-Scale Commonsense Knowledge},
AUTHOR = {Chen, Jiaqiang and Tandon, Niket and Hariman, Charles Darwis and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-3-319-46522-7},
DOI = {10.1007/978-3-319-46523-4_7},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {The Semantic Web -- ISWC 2016},
EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda},
PAGES = {102--118},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9981},
ADDRESS = {Kobe, Japan},
}

Endnote

%0 Conference Proceedings
%A Chen, Jiaqiang
%A Tandon, Niket
%A Hariman, Charles Darwis
%A de Melo, Gerard
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T WebBrain: Joint Neural Learning of Large-Scale Commonsense Knowledge
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2DED-A
%R 10.1007/978-3-319-46523-4_7
%D 2016
%B 15th International Semantic Web Conference
%Z date of event: 2016-10-17 - 2016-10-21
%C Kobe, Japan
%B The Semantic Web -- ISWC 2016
%E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda
%P 102 - 118
%I Springer
%@ 978-3-319-46522-7
%B Lecture Notes in Computer Science
%N 9981

Thesis

D5IMPR-CS

C. X. Chu

“Mining How-to Task Knowledge from Online Communities,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{ChuMSc2016,
TITLE = {Mining How-to Task Knowledge from Online Communities},
AUTHOR = {Chu, Cuong Xuan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Chu, Cuong Xuan
%Y Weikum, Gerhard
%A referee: Vreeken, Jilles
%A referee: Tandon, Niket
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining How-to Task Knowledge from Online Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-491D-B
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P 66 p.
%V master
%9 master

Thesis

D5IMPR-CS

L. Del Corro

“Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@phdthesis{delcorrophd15,
TITLE = {Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text},
AUTHOR = {Del Corro, Luciano},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-63465},
DOI = {10.22028/D291-26641},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Del Corro, Luciano
%Y Gemulla, Rainer
%A referee: Ponzetto, Simone Paolo
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-B3DB-3
%R 10.22028/D291-26641
%U urn:nbn:de:bsz:291-scidok-63465
%F OTHER: hdl:20.500.11880/26697
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P xiv, 101 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2016/6346/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Article

G. de Melo and N. Tandon

“Seeing is Believing: The Quest for Multimodal Knowledge,” ACM SIGWEB Newsletter, no. Spring, 2016.

BibTeX

@article{DemeloTandon:SIGWEB2016,
TITLE = {Seeing is Believing: {T}he Quest for Multimodal Knowledge},
AUTHOR = {de Melo, Gerard and Tandon, Niket},
LANGUAGE = {eng},
DOI = {10.1145/2903513.2903517},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2016},
JOURNAL = {ACM SIGWEB Newsletter},
NUMBER = {Spring},
EID = {4},
}

Endnote

%0 Journal Article
%A de Melo, Gerard
%A Tandon, Niket
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Seeing is Believing: The Quest for Multimodal Knowledge
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-54BB-3
%R 10.1145/2903513.2903517
%7 2016
%D 2016
%J ACM SIGWEB Newsletter
%N Spring
%Z sequence number: 4
%I ACM
%C New York, NY

Conference paper

L. Derczynski, J. Strötgen, D. Maynard, M. A. Greenwood, and M. Jung

“GATE-Time: Extraction of Temporal Expressions and Events,” in Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016.

BibTeX

@inproceedings{DERCZYNSKI16.915,
TITLE = {{GATE}-Time: {E}xtraction of Temporal Expressions and Events},
AUTHOR = {Derczynski, Leon and Str{\"o}tgen, Jannik and Maynard, Diana and Greenwood, Mark A. and Jung, Manuel},
LANGUAGE = {eng},
ISBN = {978-2-9517408-9-1},
PUBLISHER = {European Language Resources Association (ELRA)},
YEAR = {2016},
BOOKTITLE = {Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
EDITOR = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Goggi, Sara and Grobelnik, Marko and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asunci{\'o}n and Odijk, Jan and Piperidis, Stelios},
PAGES = {3702--3708},
EID = {915},
ADDRESS = {Portoro{\v z}, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Derczynski, Leon
%A Strötgen, Jannik
%A Maynard, Diana
%A Greenwood, Mark A.
%A Jung, Manuel
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T GATE-Time: Extraction of Temporal Expressions and Events
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-4139-8
%D 2016
%B 10th Language Resources and Evaluation Conference
%Z date of event: 2016-05-23 - 2016-05-28
%C Portorož, Slovenia
%B Tenth International Conference on Language Resources and Evaluation
%E Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asunción; Odijk, Jan; Piperidis, Stelios
%P 3702 - 3708
%Z sequence number: 915
%I European Language Resources Association (ELRA)
%@ 978-2-9517408-9-1

Thesis

H. Dombrowski

“Boolean Tensor Decomposition based on the Walk'n'Merge Algorithm,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{DombrowskiMaster2016,
TITLE = {Boolean Tensor Decomposition based on the Walk'n'Merge Algorithm},
AUTHOR = {Dombrowski, Helge},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Dombrowski, Helge
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Boolean Tensor Decomposition based on the Walk'n'Merge Algorithm
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2280-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

Conference paper

X. Du, O. Emebo, A. Varde, N. Tandon, S. Nag Chowdhury, and G. Weikum

“Air Quality Assessment from Social Media and Structured Data: Pollutants and Health Impacts in Urban Planning,” in Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW 2016), Helsinki, Finland, 2016.

BibTeX

@inproceedings{DuICDEW2016,
TITLE = {Air Quality Assessment from Social Media and Structured Data: {P}ollutants and Health Impacts in Urban Planning},
AUTHOR = {Du, Xu and Emebo, Onyeka and Varde, Aparna and Tandon, Niket and Nag Chowdhury, Sreyasi and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1109/ICDEW.2016.7495616},
PUBLISHER = {IEEE},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW 2016)},
PAGES = {54--59},
ADDRESS = {Helsinki, Finland},
}

Endnote

%0 Conference Proceedings
%A Du, Xu
%A Emebo, Onyeka
%A Varde, Aparna
%A Tandon, Niket
%A Nag Chowdhury, Sreyasi
%A Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Air Quality Assessment from Social Media and Structured Data: Pollutants and Health Impacts in Urban Planning
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-01AE-2
%R 10.1109/ICDEW.2016.7495616
%D 2016
%B IEEE 32nd International Conference on Data Engineering Workshops
%Z date of event: 2016-05-16 - 2016-05-20
%C Helsinki, Finland
%B Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops
%P 54 - 59
%I IEEE

Conference paper

P. Ernst, A. Siu, D. Milchevski, J. Hoffart, and G. Weikum

“DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences,” in Proceedings of ACL-2016 System Demonstrations, Berlin, Germany, 2016.

BibTeX

@inproceedings{ernst-EtAl:2016:P16-4,
TITLE = {{DeepLife}: {A}n Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences},
AUTHOR = {Ernst, Patrick and Siu, Amy and Milchevski, Dragan and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-0},
DOI = {10.18653/v1/P16-4004},
PUBLISHER = {ACL},
YEAR = {2016},
BOOKTITLE = {Proceedings of ACL-2016 System Demonstrations},
EDITOR = {Pradhan, Sameer and Apidianaki, Marianna},
PAGES = {19--24},
ADDRESS = {Berlin, Germany},
}

Endnote

%0 Conference Proceedings
%A Ernst, Patrick
%A Siu, Amy
%A Milchevski, Dragan
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-24CA-F
%R 10.18653/v1/P16-4004
%D 2016
%B The 54th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2016-08-07 - 2016-08-12
%C Berlin, Germany
%B Proceedings of ACL-2016 System Demonstrations 
%E Pradhan, Sameer; Apidianaki, Marianna
%P 19 - 24
%I ACL
%@ 978-1-945626-0

Proceedings

P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken

Eds., Machine Learning and Knowledge Discovery in Databases. Springer, 2016.

BibTeX

@proceedings{ProceedingsECML2016,
TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)},
EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles},
LANGUAGE = {eng},
DOI = {10.1007/978-3-319-46227-1},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
PAGES = {XXVIII, 825 p.},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9852},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%E Frasconi, Paolo
%E Landwehr, Niels
%E Manco, Giuseppe
%E Vreeken, Jilles
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part II
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A688-2
%R 10.1007/978-3-319-46227-1
%I Springer
%D 2016
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2016-09-19 - 2016-09-23
%C Riva del Garda, Italy
%P XXVIII, 825 p.
%S Lecture Notes in Artificial Intelligence
%V 9852

Proceedings

P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken

Eds., Machine Learning and Knowledge Discovery in Databases. Springer, 2016.

BibTeX

@proceedings{ProceedingsECML2016I,
TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)},
EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-319-46127-4},
DOI = {10.1007/978-3-319-46128-1},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
PAGES = {XXXVI, 817 p.},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9851},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%E Frasconi, Paolo
%E Landwehr, Niels
%E Manco, Giuseppe
%E Vreeken, Jilles
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part I
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A68A-D
%R 10.1007/978-3-319-46128-1
%@ 978-3-319-46127-4
%I Springer
%D 2016
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2016-09-19 - 2016-09-23
%C Riva del Garda, Italy
%P XXXVI, 817 p.
%S Lecture Notes in Artificial Intelligence
%V 9851

Conference paper

M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum

“Exception-Enriched Rule Learning from Knowledge Graphs,” in The Semantic Web -- ISWC 2016, Kobe, Japan, 2016.

BibTeX

@inproceedings{Gad-ElrabISWC2016,
TITLE = {Exception-Enriched Rule Learning from Knowledge Graphs},
AUTHOR = {Gad-Elrab, Mohamed H. and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-46522-7},
DOI = {10.1007/978-3-319-46523-4_15},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {The Semantic Web -- ISWC 2016},
EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda},
PAGES = {234--251},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9981},
ADDRESS = {Kobe, Japan},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed H.
%A Stepanova, Daria
%A Urbani, Jacopo
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exception-Enriched Rule Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A91F-B
%R 10.1007/978-3-319-46523-4_15
%D 2016
%B 15th International Semantic Web Conference
%Z date of event: 2016-10-17 - 2016-10-21
%C Kobe, Japan
%B The Semantic Web -- ISWC 2016
%E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda
%P 234 - 251
%I Springer
%@ 978-3-319-46522-7
%B Lecture Notes in Computer Science
%N 9981

Conference paper

M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum

“Exception-Enriched Rule Learning from Knowledge Graphs,” in KI 2016: Advances in Artificial Intelligence, Klagenfurt, Austria, 2016.

BibTeX

@inproceedings{Gad-ElrabKI2016,
TITLE = {Exception-Enriched Rule Learning from Knowledge Graphs},
AUTHOR = {Gad-Elrab, Mohamed H. and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-46072-7},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {KI 2016: Advances in Artificial Intelligence},
EDITOR = {Friedrich, Gerhard and Helmert, Malte and Wotawa, Franz},
PAGES = {211--217},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9904},
ADDRESS = {Klagenfurt, Austria},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed H.
%A Stepanova, Daria
%A Urbani, Jacopo
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exception-Enriched Rule Learning from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-22E9-A
%D 2016
%B 39th Annual German Conference on AI
%Z date of event: 2016-09-26 - 2016-09-30
%C Klagenfurt, Austria
%B KI 2016: Advances in Artificial Intelligence
%E Friedrich, Gerhard; Helmert, Malte; Wotawa, Franz
%P 211 - 217
%I Springer
%@ 978-3-319-46072-7
%B Lecture Notes in Artificial Intelligence
%N 9904

Thesis

M. Gandhi

“Towards Summarising Large Transaction Databases,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{GandhiMSc2016,
TITLE = {Towards Summarising Large Transaction Databases},
AUTHOR = {Gandhi, Manan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Gandhi, Manan
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Summarising Large Transaction Databases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F61-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P X, 55 p.
%V master
%9 master

Thesis

K. Grosse

“An Approach for Ontological Pattern-based Summarization,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{GrosseMSc2016,
TITLE = {An Approach for Ontological Pattern-based Summarization},
AUTHOR = {Grosse, Kathrin},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Grosse, Kathrin
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T An Approach for Ontological Pattern-based Summarization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F5F-C
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P X, 84 p.
%V master
%9 master

Conference paper

A. Grycner and G. Weikum

“POLY: Mining Relational Paraphrases from Multilingual Sentences,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, TX, USA, 2016.

BibTeX

@inproceedings{GrycnerENMLP2016,
TITLE = {{POLY}: {M}ining Relational Paraphrases from Multilingual Sentences},
AUTHOR = {Grycner, Adam and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-25-8},
URL = {https://aclweb.org/anthology/D16-1236},
PUBLISHER = {ACL},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)},
PAGES = {2183--2192},
ADDRESS = {Austin, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Grycner, Adam
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T POLY: Mining Relational Paraphrases from Multilingual Sentences
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-158D-0
%U https://aclweb.org/anthology/D16-1236
%D 2016
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2016-11-01 - 2016-11-05
%C Austin, TX, USA
%B Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
%P 2183 - 2192
%I ACL
%@ 978-1-945626-25-8

Report

D. Gupta and K. Berberich

“Diversifying Search Results Using Time,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2016-5-001, 2016.

Abstract

Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates a priori.

In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method indeed is able to present search results diversified along time.

BibTeX

@techreport{GuptaReport2016-5-001,
TITLE = {Diversifying Search Results Using Time},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2016-5-001},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
ABSTRACT = {Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates a priori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method indeed is able to present search results diversified along time.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Diversifying Search Results Using Time
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-0AA4-C
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2016
%P 51 p.
%X Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates a priori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method indeed is able to present search results diversified along time.
%B Research Report
%@ 0946-011X

Conference paper

D. Gupta, J. Strötgen, and K. Berberich

“DIGITALHISTORIAN: Search & Analytics Using Annotations,” in HistoInformatics 2016, The 3rd HistoInformatics Workshop on Computational History, Krakow, Poland, 2016.

BibTeX

@inproceedings{Gupta,
TITLE = {{DIGITALHISTORIAN}: {S}earch \& Analytics Using Annotations},
AUTHOR = {Gupta, Dhruv and Str{\"o}tgen, Jannik and Berberich, Klaus},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-1632-7},
PUBLISHER = {CEUR-WS.org},
YEAR = {2016},
BOOKTITLE = {HistoInformatics 2016, The 3rd HistoInformatics Workshop on Computational History},
EDITOR = {D{\"u}ring, Marten and Jatowt, Adam and Preiser-Kappeller, Johannes and van Den Bosch, Antal},
PAGES = {5--10},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1632},
ADDRESS = {Krakow, Poland},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Strötgen, Jannik
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T DIGITALHISTORIAN: Search & Analytics Using Annotations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-0885-2
%D 2016
%B The 3rd HistoInformatics Workshop on Computational History
%Z date of event: 2016-07-11 - 2016-07-11
%C Krakow, Poland
%B HistoInformatics 2016
%E Düring, Marten; Jatowt, Adam; Preiser-Kappeller, Johannes; van Den Bosch, Antal
%P 5 - 10
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1632
%@ 1613-0073
%U http://ceur-ws.org/Vol-1632/paper_1.pdf

Paper

D. Gupta

“Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics,” 2016. [Online]. Available: http://arxiv.org/abs/1603.00260.

Abstract

In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.

BibTeX

@online{Gupta1603.00260,
TITLE = {Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics},
AUTHOR = {Gupta, Dhruv},
URL = {http://arxiv.org/abs/1603.00260},
DOI = {10.1145/2835776.2855083},
EPRINT = {1603.00260},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.},
}

Endnote

%0 Report
%A Gupta, Dhruv
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics
%U http://hdl.handle.net/11858/00-001M-0000-002C-2224-4
%R 10.1145/2835776.2855083
%U http://arxiv.org/abs/1603.00260
%D 2016
%X In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.
%K Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Conference paper

D. Gupta and K. Berberich

“Diversifying Search Results Using Time: An Information Retrieval Method for Historians,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.

BibTeX

@inproceedings{GuptaECIR2016,
TITLE = {Diversifying Search Results Using Time: An Information Retrieval Method for Historians},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-30670-4},
DOI = {10.1007/978-3-319-30671-1_69},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)},
EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestri, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria},
PAGES = {789--795},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9626},
ADDRESS = {Padova, Italy},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Diversifying Search Results Using Time: An Information Retrieval Method for Historians
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-7514-F
%R 10.1007/978-3-319-30671-1_69
%D 2016
%B 38th European Conference on Information Retrieval
%Z date of event: 2016-03-20 - 2016-03-23
%C Padova, Italy
%B Advances in Information Retrieval
%E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestre, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria
%P 789 - 795
%I Springer
%@ 978-3-319-30670-4
%B Lecture Notes in Computer Science
%N 9626

Conference paper

D. Gupta and K. Berberich

“A Probabilistic Framework for Time-Sensitive Search,” in Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan, 2016.

BibTeX

@inproceedings{GuptaNTCIR12,
TITLE = {A Probabilistic Framework for Time-Sensitive Search},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-4-86049-071-3},
URL = {http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/NTCIR/toc_ntcir.html},
PUBLISHER = {National Institute of Informatics},
YEAR = {2016},
BOOKTITLE = {Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies},
DEBUG = {author: Yamamoto, Shuhei},
EDITOR = {Kando, Noriko and Kishida, Kazuaki and Kato, Makoto P.},
PAGES = {225--232},
ADDRESS = {Tokyo, Japan},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Probabilistic Framework for Time-Sensitive Search
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2238-7
%U http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/NTCIR/toc_ntcir.html
%D 2016
%B 12th NTCIR Conference on Evaluation of Information Access Technologies
%Z date of event: 2016-06-07 - 2016-06-10
%C Tokyo, Japan
%B Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies
%E Kando, Noriko; Kishida, Kazuaki; Kato, Makoto P.; Yamamoto, Shuhei
%P 225 - 232
%I National Institute of Informatics
%@ 978-4-86049-071-3

Conference paper

D. Gupta

“Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics,” in WSDM’16, 9th ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 2016, pp. 705–705.

BibTeX

@inproceedings{GuptaWSDM2016,
TITLE = {Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search \& Analytics},
AUTHOR = {Gupta, Dhruv},
LANGUAGE = {eng},
ISBN = {978-1-4503-3716-8},
DOI = {10.1145/2835776.2855083},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WSDM'16, 9th ACM International Conference on Web Search and Data Mining},
PAGES = {705--705},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-7526-7
%R 10.1145/2835776.2855083
%D 2016
%B 9th ACM International Conference on Web Search and Data Mining
%Z date of event: 2016-02-22 - 2016-02-25
%C San Francisco, CA, USA
%B WSDM'16
%P 705 - 705
%I ACM
%@ 978-1-4503-3716-8

Conference paper

D. Gupta, J. Strötgen, and K. Berberich

“EventMiner: Mining Events from Annotated Documents,” in ICTIR’2016, ACM International Conference on the Theory of Information Retrieval, Newark, DE, USA, 2016.

BibTeX

@inproceedings{GuptaICTIR2016,
TITLE = {{EventMiner}: {M}ining Events from Annotated Documents},
AUTHOR = {Gupta, Dhruv and Str{\"o}tgen, Jannik and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4497-5},
DOI = {10.1145/2970398.2970411},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {ICTIR'2016, ACM International Conference on the Theory of Information Retrieval},
PAGES = {261--270},
ADDRESS = {Newark, DE, USA},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Strötgen, Jannik
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EventMiner: Mining Events from Annotated Documents
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-B262-0
%R 10.1145/2970398.2970411
%D 2016
%B ACM International Conference on the Theory of Information Retrieval
%Z date of event: 2016-09-12 - 2016-09-16
%C Newark, DE, USA
%B ICTIR'2016
%P 261 - 270
%I ACM
%@ 978-1-4503-4497-5

Conference paper

S. Gurajada and M. Theobald

“Distributed Set Reachability,” in SIGMOD’16, ACM SIGMOD International Conference on Management of Data, San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{GurajadaSIGMOD2016,
TITLE = {Distributed Set Reachability},
AUTHOR = {Gurajada, Sairam and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-1-4503-3531-7},
DOI = {10.1145/2882903.2915226},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {SIGMOD'16, ACM SIGMOD International Conference on Management of Data},
PAGES = {1247--1261},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Gurajada, Sairam
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Distributed Set Reachability
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-220F-5
%R 10.1145/2882903.2915226
%D 2016
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2016-06-26 - 2016-07-01
%C San Francisco, CA, USA
%B SIGMOD'16
%P 1247 - 1261
%I ACM
%@ 978-1-4503-3531-7

Paper

S. Gurajada and M. Theobald

“Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1,” 2016. [Online]. Available: http://arxiv.org/abs/1609.05293.

Abstract

We propose an efficient and scalable architecture for processing generalized
graph-pattern queries as they are specified by the current W3C recommendation
of the SPARQL 1.1 "Query Language" component. Specifically, the class of
queries we consider consists of sets of SPARQL triple patterns with labeled
property paths. From a relational perspective, this class resolves to
conjunctive queries of relational joins with additional graph-reachability
predicates. For the scalable, i.e., distributed, processing of this kind of
queries over very large RDF collections, we develop a suitable partitioning and
indexing scheme, which allows us to shard the RDF triples over an entire
cluster of compute nodes and to process an incoming SPARQL query over all of
the relevant graph partitions (and thus compute nodes) in parallel. Unlike most
prior works in this field, we specifically aim at the unified optimization and
distributed processing of queries consisting of both relational joins and
graph-reachability predicates. All communication among the compute nodes is
established via a proprietary, asynchronous communication protocol based on the
Message Passing Interface.

BibTeX

@online{Gurajada1609.05293,
TITLE = {Distributed Processing of Generalized Graph-Pattern Queries in {SPARQL} 1.1},
AUTHOR = {Gurajada, Sairam and Theobald, Martin},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1609.05293},
EPRINT = {1609.05293},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of this kind of queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.},
}

Endnote

%0 Report
%A Gurajada, Sairam
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2212-C
%U http://arxiv.org/abs/1609.05293
%D 2016
%X We propose an efficient and scalable architecture for processing generalized
graph-pattern queries as they are specified by the current W3C recommendation
of the SPARQL 1.1 "Query Language" component. Specifically, the class of
queries we consider consists of sets of SPARQL triple patterns with labeled
property paths. From a relational perspective, this class resolves to
conjunctive queries of relational joins with additional graph-reachability
predicates. For the scalable, i.e., distributed, processing of this kind of
queries over very large RDF collections, we develop a suitable partitioning and
indexing scheme, which allows us to shard the RDF triples over an entire
cluster of compute nodes and to process an incoming SPARQL query over all of
the relevant graph partitions (and thus compute nodes) in parallel. Unlike most
prior works in this field, we specifically aim at the unified optimization and
distributed processing of queries consisting of both relational joins and
graph-reachability predicates. All communication among the compute nodes is
established via a proprietary, asynchronous communication protocol based on the
Message Passing Interface.

%K Computer Science, Databases, cs.DB

Thesis

M. Halbe

“Skim: Alternative Candidate Selections for Slim through Sketching,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{HalbeBcS2016,
TITLE = {Skim: Alternative Candidate Selections for Slim through Sketching},
AUTHOR = {Halbe, Magnus},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Halbe, Magnus
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Skim: Alternative Candidate Selections for Slim through Sketching
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F44-6
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P X, 52 p.
%V bachelor
%9 bachelor

Conference paper

Y. He, K. Chakrabarti, T. Cheng, and T. Tylenda

“Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora,” in WWW’16, 25th International Conference on World Wide Web, Montréal, Canada, 2016.

BibTeX

@inproceedings{He_WWW2016,
TITLE = {Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora},
AUTHOR = {He, Yeye and Chakrabarti, Kaushik and Cheng, Tao and Tylenda, Tomasz},
LANGUAGE = {eng},
ISBN = {978-1-4503-4143-1},
DOI = {10.1145/2872427.2874816},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16, 25th International Conference on World Wide Web},
PAGES = {1429--1439},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A He, Yeye
%A Chakrabarti, Kaushik
%A Cheng, Tao
%A Tylenda, Tomasz
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-312D-5
%R 10.1145/2872427.2874816
%D 2016
%B 25th International Conference on World Wide Web 
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16
%P 1429 - 1439
%I ACM
%@ 978-1-4503-4143-1

Conference paper

J. Hoffart, D. Milchevski, G. Weikum, A. Anand, and J. Singh

“The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities,” in WWW’16 Companion, Montréal, Canada, 2016.

BibTeX

@inproceedings{HoffartWWW2016,
TITLE = {The Knowledge Awakens: {K}eeping Knowledge Bases Fresh with Emerging Entities},
AUTHOR = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard and Anand, Avishek and Singh, Jaspreet},
LANGUAGE = {eng},
ISBN = {978-1-4503-4144-8},
DOI = {10.1145/2872518.2890537},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16 Companion},
PAGES = {203--206},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Milchevski, Dragan
%A Weikum, Gerhard
%A Anand, Avishek
%A Singh, Jaspreet
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-01BB-4
%R 10.1145/2872518.2890537
%D 2016
%B 25th International Conference on World Wide Web
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16 Companion
%P 203 - 206
%I ACM
%@ 978-1-4503-4144-8

Conference paper

K. Hui and K. Berberich

“Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations,” in WWW’16 Companion, Montréal, Canada, 2016.

BibTeX

@inproceedings{HuiWWW2016,
TITLE = {Cluster Hypothesis in Low-Cost {IR} Evaluation with Different Document Representations},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4144-8},
DOI = {10.1145/2872518.2889370},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16 Companion},
PAGES = {47--48},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-08E3-C
%R 10.1145/2872518.2889370
%D 2016
%B 25th International Conference on World Wide Web
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16 Companion
%P 47 - 48
%I ACM
%@ 978-1-4503-4144-8

Conference paper

Y. Ibrahim, M. Riedewald, and G. Weikum

“Making Sense of Entities and Quantities in Web Tables,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.

BibTeX

@inproceedings{Ibrahim:CIKM2016,
TITLE = {Making Sense of Entities and Quantities in {Web} Tables},
AUTHOR = {Ibrahim, Yusra and Riedewald, Mirek and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4073-1},
DOI = {10.1145/2983323.2983772},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management},
PAGES = {1703--1712},
ADDRESS = {Indianapolis, IN, USA},
}

Endnote

%0 Conference Proceedings
%A Ibrahim, Yusra
%A Riedewald, Mirek
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Making Sense of Entities and Quantities in Web Tables
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4852-E
%R 10.1145/2983323.2983772
%D 2016
%B 25th ACM Conference on Information and Knowledge Management
%Z date of event: 2016-10-24 - 2016-10-28
%C Indianapolis, IN, USA
%B CIKM'16
%P 1703 - 1712
%I ACM
%@ 978-1-4503-4073-1

Thesis

J. Kalofolias

“Maximum Entropy Models for Redescription Mining,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{KalofoliasMSc2016,
TITLE = {Maximum Entropy Models for Redescription Mining},
AUTHOR = {Kalofolias, Janis},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Kalofolias, Janis
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Maximum Entropy Models for Redescription Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-54C0-6
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P III, 51 p.
%V master
%9 master

Conference paper

S. Karaev and P. Miettinen

“Capricorn: An Algorithm for Subtropical Matrix Factorization,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.

Abstract

Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.

BibTeX

@inproceedings{karaev16capricorn,
TITLE = {Capricorn: {An} Algorithm for Subtropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-61197-434-8},
DOI = {10.1137/1.9781611974348.79},
PUBLISHER = {SIAM},
YEAR = {2016},
DATE = {2016},
ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.},
BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)},
EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner},
PAGES = {702--710},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Capricorn: An Algorithm for Subtropical Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-542F-3
%R 10.1137/1.9781611974348.79
%D 2016
%B 16th SIAM International Conference on Data Mining
%Z date of event: 2016-05-05 - 2016-05-07
%C Miami, FL, USA
%X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its  performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
%B Proceedings of the Sixteenth SIAM International Conference on Data Mining
%E Chawla Venkatasubramanian, Sanjay; Meira, Wagner
%P 702 - 710
%I SIAM
%@ 978-1-61197-434-8

Conference paper

S. Karaev and P. Miettinen

“Cancer: Another Algorithm for Subtropical Matrix Factorization,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016.

BibTeX

@inproceedings{KaraevECML2016,
TITLE = {Cancer: {A}nother Algorithm for Subtropical Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-3-319-46226-4},
DOI = {10.1007/978-3-319-46227-1_36},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)},
EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles},
PAGES = {576--592},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9852},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cancer: Another Algorithm for Subtropical Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A926-A
%R 10.1007/978-3-319-46227-1_36
%D 2016
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2016-09-19 - 2016-09-23
%C Riva del Garda, Italy
%B Machine Learning and Knowledge Discovery in Databases
%E Frasconi, Paolo; Landwehr, Niels; Manco, Giuseppe; Vreeken, Jilles
%P 576 - 592
%I Springer
%@ 978-3-319-46226-4
%B Lecture Notes in Artificial Intelligence
%N 9852

Article

M. Krötzsch and G. Weikum

“Editorial,” Journal of Web Semantics, vol. 37/38, 2016.

BibTeX

@article{Kroetzsch2016,
TITLE = {Editorial},
AUTHOR = {Kr{\"o}tzsch, Markus and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1570-8268},
DOI = {10.1016/j.websem.2016.04.002},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Journal of Web Semantics},
VOLUME = {37/38},
PAGES = {53--54},
}

Endnote

%0 Journal Article
%A Krötzsch, Markus
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Editorial
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-EB8D-B
%R 10.1016/j.websem.2016.04.002
%7 2016
%D 2016
%J Journal of Web Semantics
%O Web Semantics: Science, Services and Agents on the World Wide Web
%V 37/38
%& 53
%P 53 - 54
%I Elsevier
%C Amsterdam
%@ 1570-8268

Conference paper

E. Kuzey, J. Strötgen, V. Setty, and G. Weikum

“Temponym Tagging: Temporal Scopes for Textual Phrases,” in WWW’16 Companion, Montréal, Canada, 2016.

BibTeX

@inproceedings{Kuzey:2016:TTT:2872518.2889289,
TITLE = {Temponym Tagging: {T}emporal Scopes for Textual Phrases},
AUTHOR = {Kuzey, Erdal and Str{\"o}tgen, Jannik and Setty, Vinay and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4144-8},
DOI = {10.1145/2872518.2889289},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16 Companion},
PAGES = {841--842},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Kuzey, Erdal
%A Strötgen, Jannik
%A Setty, Vinay
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Temponym Tagging: Temporal Scopes for Textual Phrases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-4134-1
%R 10.1145/2872518.2889289
%D 2016
%B 25th International Conference on World Wide Web
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16 Companion
%P 841 - 842
%I ACM
%@ 978-1-4503-4144-8

Conference paper

E. Kuzey, V. Setty, J. Strötgen, and G. Weikum

“As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes,” in WWW’16, 25th International Conference on World Wide Web, Montréal, Canada, 2016.

BibTeX

@inproceedings{Kuzey_WWW2016,
TITLE = {As Time Goes By: {C}omprehensive Tagging of Textual Phrases with Temporal Scopes},
AUTHOR = {Kuzey, Erdal and Setty, Vinay and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4143-1},
DOI = {10.1145/2872427.2883055},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16, 25th International Conference on World Wide Web},
PAGES = {915--925},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Kuzey, Erdal
%A Setty, Vinay
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-310D-D
%R 10.1145/2872427.2883055
%D 2016
%B 25th International Conference on World Wide Web 
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16
%P 915 - 925
%I ACM
%@ 978-1-4503-4143-1

Paper

S. Metzler, S. Günnemann, and P. Miettinen

“Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques,” 2016. [Online]. Available: http://arxiv.org/abs/1602.04650.

Abstract

Cliques (or quasi-cliques) are frequently used to model communities: a set of
nodes where each pair is (equally) likely to be connected. However, when
observing real-world communities, we see that most communities have more
structure than that. In particular, the nodes can be ordered in such a way that
(almost) all edges in the community lie below a hyperbola. In this paper we
present three new models for communities that capture this phenomenon. Our
models explain the structure of the communities differently, but we also prove
that they are identical in their expressive power. Our models fit to real-world
data much better than traditional block models, and allow for more in-depth
understanding of the structure of the data.

BibTeX

@online{Metzler_arXiv2016,
TITLE = {Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques},
AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1602.04650},
EPRINT = {1602.04650},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {Cliques (or quasi-cliques) are frequently used to model communities: a set of nodes where each pair is (equally) likely to be connected. However, when observing real-world communities, we see that most communities have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that capture this phenomenon. Our models explain the structure of the communities differently, but we also prove that they are identical in their expressive power. Our models fit to real-world data much better than traditional block models, and allow for more in-depth understanding of the structure of the data.},
}

Endnote

%0 Report
%A Metzler, Saskia
%A Günnemann, Stephan
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-08E5-8
%U http://arxiv.org/abs/1602.04650
%D 2016
%X Cliques (or quasi-cliques) are frequently used to model communities: a set of
nodes where each pair is (equally) likely to be connected. However, when
observing real-world communities, we see that most communities have more
structure than that. In particular, the nodes can be ordered in such a way that
(almost) all edges in the community lie below a hyperbola. In this paper we
present three new models for communities that capture this phenomenon. Our
models explain the structure of the communities differently, but we also prove
that they are identical in their expressive power. Our models fit to real-world
data much better than traditional block models, and allow for more in-depth
understanding of the structure of the data.

%K cs.SI, Physics, Physics and Society, physics.soc-ph

Conference paper

P. Mirza and S. Tonelli

“On the Contribution of Word Embeddings to Temporal Relation Classification,” in Proceedings of COLING 2016: Technical Papers, Osaka, Japan, 2016.

BibTeX

@inproceedings{mirza-tonelli:2016:COLING2,
TITLE = {On the Contribution of Word Embeddings to Temporal Relation Classification},
AUTHOR = {Mirza, Paramita and Tonelli, Sara},
LANGUAGE = {eng},
ISBN = {978-4-87974-702-0},
PUBLISHER = {ACL},
YEAR = {2016},
BOOKTITLE = {Proceedings of COLING 2016: Technical Papers},
PAGES = {2818--2828},
ADDRESS = {Osaka, Japan},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Tonelli, Sara
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T On the Contribution of Word Embeddings to Temporal Relation Classification : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-23BB-A
%D 2016
%B The 26th International Conference on Computational Linguistics
%Z date of event: 2016-12-11 - 2016-12-16
%C Osaka, Japan
%B Proceedings of COLING 2016: Technical Papers
%P 2818 - 2828
%I ACL
%@ 978-4-87974-702-0

Conference paper

P. Mirza, S. Razniewski, and W. Nutt

“Expanding Wikidata’s Parenthood Information by 178%, or How To Mine Relation Cardinalities,” in Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC-P&D 2016), Kobe, Japan, 2016.

BibTeX

@inproceedings{DBLP:conf/semweb/MirzaRN16,
TITLE = {Expanding {W}ikidata's Parenthood Information by 178{\%}, or How To Mine Relation Cardinalities},
AUTHOR = {Mirza, Paramita and Razniewski, Simon and Nutt, Werner},
LANGUAGE = {eng},
URL = {urn:nbn:de:0074-1690-5},
PUBLISHER = {CEUR-WS.org},
YEAR = {2016},
BOOKTITLE = {Proceedings of the ISWC 2016 Posters \& Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC-P\&D 2016)},
EDITOR = {Kawamura, Takahiro and Paulheim, Heiko},
EID = {4},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1690},
ADDRESS = {Kobe, Japan},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Razniewski, Simon
%A Nutt, Werner
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Expanding Wikidata's Parenthood Information by 178%, or How To Mine Relation Cardinalities : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-23C1-9
%D 2016
%B ISWC 2016 Posters & Demonstrations Track
%Z date of event: 2016-10-19 - 2016-10-19
%C Kobe, Japan
%B Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference
%E Kawamura, Takahiro; Paulheim, Heiko
%Z sequence number: 4
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1690

Conference paper

P. Mirza and S. Tonelli

“CATENA: CAusal and TEmporal relation extraction from NAtural language texts,” in Proceedings of COLING 2016: Technical Papers, Osaka, Japan, 2016.

BibTeX

@inproceedings{mirza-tonelli:2016:COLING1,
TITLE = {{CATENA}: {CAusal} and {TEmporal} relation extraction from {NAtural} language texts},
AUTHOR = {Mirza, Paramita and Tonelli, Sara},
LANGUAGE = {eng},
ISBN = {978-4-87974-702-0},
PUBLISHER = {ACL},
YEAR = {2016},
BOOKTITLE = {Proceedings of COLING 2016: Technical Papers},
PAGES = {64--75},
ADDRESS = {Osaka, Japan},
}

Endnote

%0 Conference Proceedings
%A Mirza, Paramita
%A Tonelli, Sara
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CATENA: CAusal and TEmporal relation extraction from NAtural language texts : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-23B8-0
%D 2016
%B The 26th International Conference on Computational Linguistics
%Z date of event: 2016-12-11 - 2016-12-16
%C Osaka, Japan
%B Proceedings of COLING 2016: Technical Papers
%P 64 - 75
%I ACL
%@ 978-4-87974-702-0

Conference paper

A. Mishra and K. Berberich

“Estimating Time Models for News Article Excerpts,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.

BibTeX

@inproceedings{DBLP:conf/cikm/MishraB16,
TITLE = {Estimating Time Models for News Article Excerpts},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4073-1},
DOI = {10.1145/2983323.2983802},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management},
PAGES = {781--790},
ADDRESS = {Indianapolis, IN, USA},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Estimating Time Models for News Article Excerpts : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-20CF-3
%R 10.1145/2983323.2983802
%D 2016
%B 25th ACM Conference on Information and Knowledge Management
%Z date of event: 2016-10-24 - 2016-10-28
%C Indianapolis, IN, USA
%B CIKM'16
%P 781 - 790
%I ACM
%@ 978-1-4503-4073-1

Conference paper

A. Mishra and K. Berberich

“Event Digest: A Holistic View on Past Events,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.

BibTeX

@inproceedings{MishraSIGIR2016,
TITLE = {Event Digest: {A} Holistic View on Past Events},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-4069-4},
DOI = {10.1145/2911451.2911526},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {493--502},
ADDRESS = {Pisa, Italy},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Event Digest: A Holistic View on Past Events : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-0895-D
%R 10.1145/2911451.2911526
%D 2016
%B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2016-07-17 - 2016-07-21
%C Pisa, Italy
%B SIGIR'16
%P 493 - 502
%I ACM
%@ 978-1-4503-4069-4

Conference paper

A. Mishra and K. Berberich

“Leveraging Semantic Annotations to Link Wikipedia and News Archives,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.

BibTeX

@inproceedings{MishraECIR2016,
TITLE = {Leveraging Semantic Annotations to Link {W}ikipedia and News Archives},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-30670-4},
DOI = {10.1007/978-3-319-30671-1_3},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)},
EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestri, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria},
PAGES = {30--42},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9626},
ADDRESS = {Padova, Italy},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Leveraging Semantic Annotations to Link Wikipedia and News Archives : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-48DC-F
%R 10.1007/978-3-319-30671-1_3
%D 2016
%B 38th European Conference on Information Retrieval
%Z date of event: 2016-03-20 - 2016-03-23
%C Padova, Italy
%B Advances in Information Retrieval
%E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestri, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria
%P 30 - 42
%I Springer
%@ 978-3-319-30670-4
%B Lecture Notes in Computer Science
%N 9626

Report

A. Mishra and K. Berberich

“Leveraging Semantic Annotations to Link Wikipedia and News Archives,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2016-5-002, 2016.

Abstract

The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them.

To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.

BibTeX

@techreport{MishraBerberich16,
TITLE = {Leveraging Semantic Annotations to Link Wikipedia and News Archives},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2016-5-002},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
ABSTRACT = {The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.},
TYPE = {Research Reports},
}

Endnote

%0 Report
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Leveraging Semantic Annotations to Link Wikipedia and News Archives : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-5FF0-A
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2016
%P 21 p.
%X The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. 
To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.
%B Research Reports
%@ 0946-011X

Conference paper

S. Mukherjee, S. Günnemann, and G. Weikum

“Continuous Experience-aware Language Model,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{MukherjeeKDD2016,
TITLE = {Continuous Experience-aware Language Model},
AUTHOR = {Mukherjee, Subhabrata and G{\"u}nnemann, Stephan and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4232-2},
DOI = {10.1145/2939672.2939780},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
PAGES = {1075--1084},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Günnemann, Stephan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Continuous Experience-aware Language Model : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A678-6
%R 10.1145/2939672.2939780
%D 2016
%B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2016-08-13 - 2016-08-17
%C San Francisco, CA, USA
%B KDD'16
%P 1075 - 1084
%I ACM
%@ 978-1-4503-4232-2

Conference paper

S. Mukherjee, S. Dutta, and G. Weikum

“Credible Review Detection with Limited Information Using Consistency Features,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016.

BibTeX

@inproceedings{MukherjeeECML2016,
TITLE = {Credible Review Detection with Limited Information Using Consistency Features},
AUTHOR = {Mukherjee, Subhabrata and Dutta, Sourav and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-46226-4},
DOI = {10.1007/978-3-319-46227-1_13},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)},
EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles},
PAGES = {195--213},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9852},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Dutta, Sourav
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Credible Review Detection with Limited Information Using Consistency Features : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A67C-D
%R 10.1007/978-3-319-46227-1_13
%D 2016
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2016-09-19 - 2016-09-23
%C Riva del Garda, Italy
%B Machine Learning and Knowledge Discovery in Databases
%E Frasconi, Paolo; Landwehr, Niels; Manco, Giuseppe; Vreeken, Jilles
%P 195 - 213
%I Springer
%@ 978-3-319-46226-4
%B Lecture Notes in Artificial Intelligence
%N 9852

Conference paper

N. Mukuze and P. Miettinen

“Interactive Constrained Boolean Matrix Factorization,” in Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2016), San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{mukuze16interactive,
TITLE = {Interactive Constrained {B}oolean Matrix Factorization},
AUTHOR = {Mukuze, Nelson and Miettinen, Pauli},
LANGUAGE = {eng},
YEAR = {2016},
BOOKTITLE = {Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2016)},
EDITOR = {Chau, Duen Horng and Vreeken, Jilles and van Leeuwen, Matthijs and Shahaf, Dafna and Faloutsos, Christos},
PAGES = {96--104},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Mukuze, Nelson
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Interactive Constrained Boolean Matrix Factorization : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-226C-2
%D 2016
%B ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2016-08-14 - 2016-08-14
%C San Francisco, CA, USA
%B Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics
%E Chau, Duen Horng; Vreeken, Jilles; van Leeuwen, Matthijs; Shahaf, Dafna; Faloutsos, Christos
%P 96 - 104

Thesis

N. Mukuze

“Interactive Boolean Matrix Factorization,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{MukuzeMSc2016,
TITLE = {Interactive Boolean Matrix Factorization},
AUTHOR = {Mukuze, Nelson},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Mukuze, Nelson
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Interactive Boolean Matrix Factorization : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-54C8-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P III, 68 p.
%V master
%9 master

Conference paper

S. Nag Chowdhury, N. Tandon, and G. Weikum

“Know2Look: Commonsense Knowledge for Visual Search,” in AKBC 2016, 5th Workshop on Automated Knowledge Base Construction, San Diego, CA, USA, 2016.

BibTeX

@inproceedings{DBLP:conf/akbc/ChowdhuryTW16,
TITLE = {{Know2Look}: {C}ommonsense Knowledge for Visual Search},
AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://www.akbc.ws/2016/papers/11_Paper.pdf},
PUBLISHER = {AKBC Board},
YEAR = {2016},
BOOKTITLE = {AKBC 2016, 5th Workshop on Automated Knowledge Base Construction},
PAGES = {57--62},
ADDRESS = {San Diego, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%A Tandon, Niket
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Know2Look: Commonsense Knowledge for Visual Search : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A633-2
%U http://www.akbc.ws/2016/papers/11_Paper.pdf
%D 2016
%B 5th Workshop on Automated Knowledge Base Construction
%Z date of event: 2016-06-17 - 2016-06-17
%C San Diego, CA, USA
%B AKBC 2016
%P 57 - 62
%I AKBC Board

Conference paper

S. Nag Chowdhury

“Commonsense for Making Sense of Data,” in Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016), New Delhi, India, 2016.

BibTeX

@inproceedings{NagChowdhuryVLDB2016,
TITLE = {Commonsense for Making Sense of Data},
AUTHOR = {Nag Chowdhury, Sreyasi},
LANGUAGE = {eng},
URL = {urn:nbn:de:0074-1671-7},
PUBLISHER = {CEUR-WS.org},
YEAR = {2016},
BOOKTITLE = {Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016)},
EDITOR = {Grust, Torsten and Karlapalem, Kamal and Pavlo, Andy},
EID = {8},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1671},
ADDRESS = {New Delhi, India},
}

Endnote

%0 Conference Proceedings
%A Nag Chowdhury, Sreyasi
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Commonsense for Making Sense of Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-22E4-3
%U urn:nbn:de:0074-1671-7
%D 2016
%B VLDB 2016 PhD Workshop
%Z date of event: 2016-09-09 - 2016-09-09
%C New Delhi, India
%B Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016)
%E Grust, Torsten; Karlapalem, Kamal; Pavlo, Andy
%Z sequence number: 8
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1671

Article

D. B. Nguyen, M. Theobald, and G. Weikum

“J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features,” Transactions of the Association for Computational Linguistics, vol. 4, 2016.

BibTeX

@article{Nguyen2016,
TITLE = {{J}-{NERD}: {J}oint {N}amed {E}ntity {R}ecognition and {D}isambiguation with Rich Linguistic Features},
AUTHOR = {Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2307-387X},
YEAR = {2016},
JOURNAL = {Transactions of the Association for Computational Linguistics},
VOLUME = {4},
PAGES = {215--229},
}

Endnote

%0 Journal Article
%A Nguyen, Dat Ba
%A Theobald, Martin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-0199-1
%7 2016
%D 2016
%J Transactions of the Association for Computational Linguistics
%O TACL
%V 4
%& 215
%P 215 - 229
%@ 2307-387X
%U https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/698

Conference paper

H.-V. Nguyen, P. Mandros, and J. Vreeken

“Universal Dependency Analysis,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.

BibTeX

@inproceedings{MandrosSDM2016,
TITLE = {Universal Dependency Analysis},
AUTHOR = {Nguyen, Hoang-Vu and Mandros, Panagiotis and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-434-8},
DOI = {10.1137/1.9781611974348.89},
PUBLISHER = {SIAM},
YEAR = {2016},
DATE = {2016},
ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.},
BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)},
EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner},
PAGES = {792--800},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Hoang-Vu
%A Mandros, Panagiotis
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Universal Dependency Analysis : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A935-8
%R 10.1137/1.9781611974348.89
%D 2016
%B 16th SIAM International Conference on Data Mining
%Z date of event: 2016-05-05 - 2016-05-07
%C Miami, FL, USA
%X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its  performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
%B Proceedings of the Sixteenth SIAM International Conference on Data Mining
%E Chawla Venkatasubramanian, Sanjay; Meira, Wagner
%P 792 - 800
%I SIAM
%@ 978-1-61197-434-8

Conference paper

H.-V. Nguyen and J. Vreeken

“Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.

BibTeX

@inproceedings{VreekenSDM2016,
TITLE = {Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series},
AUTHOR = {Nguyen, Hoang-Vu and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-434-8},
DOI = {10.1137/1.9781611974348.93},
PUBLISHER = {SIAM},
YEAR = {2016},
DATE = {2016},
ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.},
BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)},
EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner},
PAGES = {828--836},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Hoang-Vu
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A937-4
%R 10.1137/1.9781611974348.93
%D 2016
%B 16th SIAM International Conference on Data Mining
%Z date of event: 2016-05-05 - 2016-05-07
%C Miami, FL, USA
%X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its  performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
%B Proceedings of the Sixteenth SIAM International Conference on Data Mining
%E Chawla Venkatasubramanian, Sanjay; Meira, Wagner
%P 828 - 836
%I SIAM
%@ 978-1-61197-434-8

Conference paper

H.-V. Nguyen and J. Vreeken

“Flexibly Mining Better Subgroups,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.

BibTeX

@inproceedings{NguyenSDM2016,
TITLE = {Flexibly Mining Better Subgroups},
AUTHOR = {Nguyen, Hoang-Vu and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-434-8},
DOI = {10.1137/1.9781611974348.66},
PUBLISHER = {SIAM},
YEAR = {2016},
DATE = {2016},
ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.},
BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)},
EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner},
PAGES = {585--593},
ADDRESS = {Miami, FL, USA},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Hoang-Vu
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Flexibly Mining Better Subgroups : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A933-C
%R 10.1137/1.9781611974348.66
%D 2016
%B 16th SIAM International Conference on Data Mining
%Z date of event: 2016-05-05 - 2016-05-07
%C Miami, FL, USA
%X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its  performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
%B Proceedings of the Sixteenth SIAM International Conference on Data Mining
%E Chawla Venkatasubramanian, Sanjay; Meira, Wagner
%P 585 - 593
%I SIAM
%@ 978-1-61197-434-8

Conference paper

K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum

“Credibility Assessment of Textual Claims on the Web,” in CIKM’16, 25th ACM International Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.

BibTeX

@inproceedings{PopatCIKM2016,
TITLE = {Credibility Assessment of Textual Claims on the {Web}},
AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4073-1},
DOI = {10.1145/2983323.2983661},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {CIKM'16, 25th ACM International Conference on Information and Knowledge Management},
PAGES = {2173--2178},
ADDRESS = {Indianapolis, IN, USA},
}

Endnote

%0 Conference Proceedings
%A Popat, Kashyap
%A Mukherjee, Subhabrata
%A Strötgen, Jannik
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Credibility Assessment of Textual Claims on the Web : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-B260-3
%R 10.1145/2983323.2983661
%D 2016
%B 25th ACM International Conference on Information and Knowledge Management
%Z date of event: 2016-10-24 - 2016-10-28
%C Indianapolis, IN, USA
%B CIKM'16
%P 2173 - 2178
%I ACM
%@ 978-1-4503-4073-1

Conference paper

T. Rebele, F. Suchanek, J. Hoffart, J. Biega, E. Kuzey, and G. Weikum

“YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames,” in The Semantic Web – ISWC 2016, Kobe, Japan, 2016.

BibTeX

@inproceedings{RebeleISWC2016,
TITLE = {{YAGO}: A Multilingual Knowledge Base from {W}ikipedia, {W}ordnet, and {G}eonames},
AUTHOR = {Rebele, Thomas and Suchanek, Fabian and Hoffart, Johannes and Biega, Joanna and Kuzey, Erdal and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-46546-3},
DOI = {10.1007/978-3-319-46547-0_19},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {The Semantic Web -- ISWC 2016},
EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda},
PAGES = {177--185},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9982},
ADDRESS = {Kobe, Japan},
}

Endnote

%0 Conference Proceedings
%A Rebele, Thomas
%A Suchanek, Fabian
%A Hoffart, Johannes
%A Biega, Joanna
%A Kuzey, Erdal
%A Weikum, Gerhard
%+ Télécom ParisTech
Télécom ParisTech
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A69A-9
%R 10.1007/978-3-319-46547-0_19
%D 2016
%B 15th International Semantic Web Conference
%Z date of event: 2016-10-17 - 2016-10-21
%C Kobe, Japan
%B The Semantic Web -- ISWC 2016
%E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda
%P 177 - 185
%I Springer
%@ 978-3-319-46546-3
%B Lecture Notes in Computer Science
%N 9982

Conference paper

P. Rozenshtein, A. Gionis, B. A. Prakash, and J. Vreeken

“Reconstructing an Epidemic Over Time,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{RozenshteinKDD2016,
TITLE = {Reconstructing an Epidemic Over Time},
AUTHOR = {Rozenshtein, Polina and Gionis, Aristides and Prakash, B. Aditya and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-4503-4232-2},
DOI = {10.1145/2939672.2939865},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
PAGES = {1835--1844},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Rozenshtein, Polina
%A Gionis, Aristides
%A Prakash, B. Aditya
%A Vreeken, Jilles
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reconstructing an Epidemic Over Time : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A92F-7
%R 10.1145/2939672.2939865
%D 2016
%B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2016-08-13 - 2016-08-17
%C San Francisco, CA, USA
%B KDD'16
%P 1835 - 1844
%I ACM
%@ 978-1-4503-4232-2

Conference paper

R. Saha Roy, A. Suresh, N. Ganguly, and M. Choudhury

“Improving Document Ranking for Long Queries with Nested Query Segmentation,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.

BibTeX

@inproceedings{RoyECIR2016,
TITLE = {Improving Document Ranking for Long Queries with Nested Query Segmentation},
AUTHOR = {Saha Roy, Rishiraj and Suresh, Anusha and Ganguly, Niloy and Choudhury, Monojit},
LANGUAGE = {eng},
ISBN = {978-3-319-30670-4},
DOI = {10.1007/978-3-319-30671-1_67},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)},
EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestre, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria},
PAGES = {775--781},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9626},
ADDRESS = {Padova, Italy},
}

Endnote

%0 Conference Proceedings
%A Saha Roy, Rishiraj
%A Suresh, Anusha
%A Ganguly, Niloy
%A Choudhury, Monojit
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Improving Document Ranking for Long Queries with Nested Query Segmentation : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-48DF-9
%R 10.1007/978-3-319-30671-1_67
%D 2016
%B 38th European Conference on Information Retrieval
%Z date of event: 2016-03-20 - 2016-03-23
%C Padova, Italy
%B Advances in Information Retrieval
%E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestre, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria
%P 775 - 781
%I Springer
%@ 978-3-319-30670-4
%B Lecture Notes in Computer Science
%N 9626

Article

R. Saha Roy, S. Agarwal, N. Ganguly, and M. Choudhury

“Syntactic Complexity of Web Search Queries through the Lenses of Language Models, Networks and Users,” Information Processing and Management, vol. 52, no. 5, 2016.

BibTeX

@article{Roy2016,
TITLE = {Syntactic complexity of {W}eb search queries through the lenses of language models, networks and users},
AUTHOR = {Saha Roy, Rishiraj and Agarwal, Smith and Ganguly, Niloy and Choudhury, Monojit},
LANGUAGE = {eng},
ISSN = {0306-4573},
DOI = {10.1016/j.ipm.2016.04.002},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Information Processing and Management},
VOLUME = {52},
NUMBER = {5},
PAGES = {923--948},
}

Endnote

%0 Journal Article
%A Saha Roy, Rishiraj
%A Agarwal, Smith
%A Ganguly, Niloy
%A Choudhury, Monojit
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Syntactic Complexity of Web Search Queries through the Lenses of Language Models, Networks and Users : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-7EBE-B
%R 10.1016/j.ipm.2016.04.002
%7 2016
%D 2016
%J Information Processing and Management
%V 52
%N 5
%& 923
%P 923 - 948
%I Elsevier
%C Amsterdam
%@ 0306-4573

Thesis

M. Salyaeva

“Summarising and Recommending with Skipisodes,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{SalyaevaMSc2016,
TITLE = {Summarising and Recommending with Skipisodes},
AUTHOR = {Salyaeva, Margarita},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Salyaeva, Margarita
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Summarising and Recommending with Skipisodes : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F46-2
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

Conference paper

A. Schmidt, J. Hoffart, D. Milchevski, and G. Weikum

“Context-Sensitive Auto-Completion for Searching with Entities and Categories,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.

BibTeX

@inproceedings{SchmidtIGIR2016,
TITLE = {Context-Sensitive Auto-Completion for Searching with Entities and Categories},
AUTHOR = {Schmidt, Andreas and Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4069-4},
DOI = {10.1145/2911451.2911461},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {1097--1100},
ADDRESS = {Pisa, Italy},
}

Endnote

%0 Conference Proceedings
%A Schmidt, Andreas
%A Hoffart, Johannes
%A Milchevski, Dragan
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Context-Sensitive Auto-Completion for Searching with Entities and Categories : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A924-E
%R 10.1145/2911451.2911461
%D 2016
%B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2016-07-17 - 2016-07-21
%C Pisa, Italy
%B SIGIR'16
%P 1097 - 1100
%I ACM
%@ 978-1-4503-4069-4

Conference paper

S. Seufert, P. Ernst, S. J. Bedathur, S. K. Kondreddi, K. Berberich, and G. Weikum

“Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs,” in WWW’16 Companion, Montréal, Canada, 2016.

BibTeX

@inproceedings{SeufertWWW2016,
TITLE = {Instant {E}spresso: {I}nteractive Analysis of Relationships in Knowledge Graphs},
AUTHOR = {Seufert, Stephan and Ernst, Patrick and Bedathur, Srikanta J. and Kondreddi, Sarath Kumar and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4144-8},
DOI = {10.1145/2872518.2890528},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WWW'16 Companion},
PAGES = {251--254},
ADDRESS = {Montr{\'e}al, Canada},
}

Endnote

%0 Conference Proceedings
%A Seufert, Stephan
%A Ernst, Patrick
%A Bedathur, Srikanta J.
%A Kondreddi, Sarath Kumar
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-01BD-F
%R 10.1145/2872518.2890528
%D 2016
%B 25th International Conference on World Wide Web
%Z date of event: 2016-05-11 - 2016-05-15
%C Montréal, Canada
%B WWW'16 Companion
%P 251 - 254
%I ACM
%@ 978-1-4503-4144-8

Conference paper

S. Seufert, K. Berberich, S. J. Bedathur, S. K. Kondreddi, P. Ernst, and G. Weikum

“ESPRESSO: Explaining Relationships between Entity Sets,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.

BibTeX

@inproceedings{DBLP:conf/cikm/SeufertBBKEW16,
TITLE = {{ESPRESSO}: {E}xplaining Relationships between Entity Sets},
AUTHOR = {Seufert, Stephan and Berberich, Klaus and Bedathur, Srikanta J. and Kondreddi, Sarath Kumar and Ernst, Patrick and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-4073-1},
DOI = {10.1145/2983323.2983778},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management},
PAGES = {1311--1320},
ADDRESS = {Indianapolis, IN, USA},
}

Endnote

%0 Conference Proceedings
%A Seufert, Stephan
%A Berberich, Klaus
%A Bedathur, Srikanta J.
%A Kondreddi, Sarath Kumar
%A Ernst, Patrick
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ESPRESSO: Explaining Relationships between Entity Sets : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-20D3-8
%R 10.1145/2983323.2983778
%D 2016
%B 25th ACM Conference on Information and Knowledge Management
%Z date of event: 2016-10-24 - 2016-10-28
%C Indianapolis, IN, USA
%B CIKM'16
%P 1311 - 1320
%I ACM
%@ 978-1-4503-4073-1

Conference paper

D. Seyler, M. Yahya, K. Berberich, and O. Alonso

“Automated Question Generation for Quality Control in Human Computation Tasks,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.

BibTeX

@inproceedings{SeylerWebSci2016,
TITLE = {Automated Question Generation for Quality Control in Human Computation Tasks},
AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus and Alonso, Omar},
LANGUAGE = {eng},
ISBN = {978-1-4503-4208-7},
DOI = {10.1145/2908131.2908210},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WebSci'16, ACM Web Science Conference},
PAGES = {360--362},
ADDRESS = {Hannover, Germany},
}

Endnote

%0 Conference Proceedings
%A Seyler, Dominic
%A Yahya, Mohamed
%A Berberich, Klaus
%A Alonso, Omar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automated Question Generation for Quality Control in Human Computation Tasks : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-08DF-7
%R 10.1145/2908131.2908210
%D 2016
%B ACM Web Science Conference
%Z date of event: 2016-05-22 - 2016-05-25
%C Hannover, Germany
%B WebSci'16
%P 360 - 362
%I ACM
%@ 978-1-4503-4208-7

Paper

D. Seyler, M. Yahya, and K. Berberich

“Knowledge Questions from Knowledge Graphs,” 2016. [Online]. Available: http://arxiv.org/abs/1610.09935.

Abstract

We address the novel problem of automatically generating quiz-style knowledge
questions from a knowledge graph such as DBpedia. Questions of this kind have
ample applications, for instance, to educate users about or to evaluate their
knowledge in a specific domain. To solve the problem, we propose an end-to-end
approach. The approach first selects a named entity from the knowledge graph as
an answer. It then generates a structured triple-pattern query, which yields
the answer as its sole result. If a multiple-choice question is desired, the
approach selects alternative answer options. Finally, our approach uses a
template-based method to verbalize the structured query and yield a natural
language question. A key challenge is estimating how difficult the generated
question is to human users. To do this, we make use of historical data from the
Jeopardy! quiz show and a semantically annotated Web-scale document collection,
engineer suitable features, and train a logistic regression classifier to
predict question difficulty. Experiments demonstrate the viability of our
overall approach.

BibTeX

@online{Seyler1610.09935,
TITLE = {Knowledge Questions from Knowledge Graphs},
AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1610.09935},
EPRINT = {1610.09935},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {We address the novel problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose an end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach.},
}

Endnote

%0 Report
%A Seyler, Dominic
%A Yahya, Mohamed
%A Berberich, Klaus
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Questions from Knowledge Graphs : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CB5-F
%U http://arxiv.org/abs/1610.09935
%D 2016
%X We address the novel problem of automatically generating quiz-style knowledge
questions from a knowledge graph such as DBpedia. Questions of this kind have
ample applications, for instance, to educate users about or to evaluate their
knowledge in a specific domain. To solve the problem, we propose an end-to-end
approach. The approach first selects a named entity from the knowledge graph as
an answer. It then generates a structured triple-pattern query, which yields
the answer as its sole result. If a multiple-choice question is desired, the
approach selects alternative answer options. Finally, our approach uses a
template-based method to verbalize the structured query and yield a natural
language question. A key challenge is estimating how difficult the generated
question is to human users. To do this, we make use of historical data from the
Jeopardy! quiz show and a semantically annotated Web-scale document collection,
engineer suitable features, and train a logistic regression classifier to
predict question difficulty. Experiments demonstrate the viability of our
overall approach.

%K Computer Science, Computation and Language, cs.CL

Thesis

A. Shah

“Recognizing Visual Activities,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{ShahMSc2016,
TITLE = {Recognizing Visual Activities},
AUTHOR = {Shah, Ali},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Shah, Ali
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Recognizing Visual Activities : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-439D-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P 56 p.
%V master
%9 master

Conference paper

J. Singh, J. Hoffart, and A. Anand

“Discovering Entities with Just a Little Help from You,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.

BibTeX

@inproceedings{Singh:2016:DEJ:2983323.2983798,
TITLE = {Discovering Entities with Just a Little Help from You},
AUTHOR = {Singh, Jaspreet and Hoffart, Johannes and Anand, Avishek},
LANGUAGE = {eng},
ISBN = {978-1-4503-4073-1},
DOI = {10.1145/2983323.2983798},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management},
PAGES = {1331--1340},
ADDRESS = {Indianapolis, IN, USA},
}

Endnote

%0 Conference Proceedings
%A Singh, Jaspreet
%A Hoffart, Johannes
%A Anand, Avishek
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Discovering Entities with Just a Little Help from You : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CC2-2
%R 10.1145/2983323.2983798
%D 2016
%B 25th ACM Conference on Information and Knowledge Management
%Z date of event: 2016-10-24 - 2016-10-28
%C Indianapolis, IN, USA
%B CIKM'16
%P 1331 - 1340
%I ACM
%@ 978-1-4503-4073-1

Conference paper

A. Siu, P. Ernst, and G. Weikum

“Disambiguation of Entities in MEDLINE Abstracts by Combining MeSH Terms with Knowledge,” in Proceedings of the 15th Workshop on Biomedical Natural Language Processing (BioNLP 2016), Berlin, Germany, 2016.

BibTeX

@inproceedings{Siu16,
TITLE = {Disambiguation of entities in {MEDLINE} abstracts by combining {MeSH} terms with knowledge},
AUTHOR = {Siu, Amy and Ernst, Patrick and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-945626-12-8},
PUBLISHER = {ACL},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Proceedings of the 15th Workshop on Biomedical Natural Language Processing (BioNLP 2016)},
PAGES = {72--76},
ADDRESS = {Berlin, Germany},
}

Endnote

%0 Conference Proceedings
%A Siu, Amy
%A Ernst, Patrick
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Disambiguation of Entities in MEDLINE Abstracts by Combining MeSH Terms with Knowledge : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2040-3
%D 2016
%B 15th Workshop on Biomedical Natural Language Processing
%Z date of event: 2016-08-12 - 2016-08-12
%C Berlin, Germany
%B Proceedings of the 15th Workshop on Biomedical Natural Language Processing
%P 72 - 76
%I ACL
%@ 978-1-945626-12-8
%U http://aclweb.org/anthology/W/W16/W16-2909.pdf

Thesis

D. Spanier

“An Incremental Approach to Distilling Named Events from News Streams,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{SpanierMSc2016,
TITLE = {An Incremental Approach to Distilling Named Events from News Streams},
AUTHOR = {Spanier, Daniel},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Spanier, Daniel
%Y Weikum, Gerhard
%A referee: Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T An Incremental Approach to Distilling Named Events from News Streams : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4913-0
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P XI, 58 p.
%V master
%9 master

Book

J. Strötgen and M. Gertz

Domain-Sensitive Temporal Tagging. San Rafael, CA: Morgan & Claypool Publishers, 2016.

BibTeX

@book{StroetgenBook2016,
TITLE = {Domain-Sensitive Temporal Tagging},
AUTHOR = {Str{\"o}tgen, Jannik and Gertz, Michael},
LANGUAGE = {eng},
ISSN = {1947-4040},
ISBN = {9781627054591; 9781627054997},
DOI = {10.2200/S00721ED1V01Y201606HLT036},
PUBLISHER = {Morgan \& Claypool Publishers},
ADDRESS = {San Rafael, CA},
YEAR = {2016},
DATE = {2016},
PAGES = {151 p.},
SERIES = {Synthesis Lectures on Human Language Technologies},
}

Endnote

%0 Book
%A Strötgen, Jannik
%A Gertz, Michael
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Domain-Sensitive Temporal Tagging : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-1777-9
%@ 9781627054591
%@ 9781627054997
%R 10.2200/S00721ED1V01Y201606HLT036
%I Morgan & Claypool Publishers
%C San Rafael, CA
%D 2016
%P 151 p.
%B Synthesis Lectures on Human Language Technologies
%@ 1947-4040

Book chapter / section

J. Strötgen

“Domänen-sensitives Temporal Tagging für Event-zentriertes Information Retrieval,” in Ausgezeichnete Informatikdissertationen 2015, Bonn: GI, 2016.

BibTeX

@incollection{StrotgenLNI_Diss16,
TITLE = {{Dom{\"a}nen-sensitives Temporal Tagging f{\"u}r Event-zentriertes Information Retrieval}},
AUTHOR = {Str{\"o}tgen, Jannik},
LANGUAGE = {deu},
ISBN = {978-3-88579-975-7},
PUBLISHER = {GI},
ADDRESS = {Bonn},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Ausgezeichnete Informatikdissertationen 2015},
EDITOR = {H{\"o}lldobler, Steffen},
PAGES = {279--288},
SERIES = {Lecture Notes in Informatics -- Dissertations},
VOLUME = {16},
}

Endnote

%0 Book Section
%A Strötgen, Jannik
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Domänen-sensitives Temporal Tagging für Event-zentriertes Information Retrieval : 
%G deu
%U http://hdl.handle.net/11858/00-001M-0000-002B-B26A-F
%D 2016
%B Ausgezeichnete Informatikdissertationen 2015
%E Hölldobler, Steffen
%P 279 - 288
%I GI
%C Bonn
%@ 978-3-88579-975-7
%S Lecture Notes in Informatics - Dissertations
%N 16

Conference paper

A. Talaika, J. Biega, A. Amarilli, and F. M. Suchanek

“IBEX: Harvesting Entities from the Web Using Unique Identifiers,” in Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015), Melbourne, Australia, 2016.

BibTeX

@inproceedings{Talaika2016,
TITLE = {{IBEX}: {H}arvesting Entities from the {Web} Using Unique Identifiers},
AUTHOR = {Talaika, Aliaksandr and Biega, Joanna and Amarilli, Antoine and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISBN = {978-1-4503-3627-7},
DOI = {10.1145/2767109.2767116},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2016},
BOOKTITLE = {Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015)},
EDITOR = {Stoyanovich, Julia and Suchanek, Fabian M.},
PAGES = {13--19},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Talaika, Aliaksandr
%A Biega, Joanna
%A Amarilli, Antoine
%A Suchanek, Fabian M.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Télécom ParisTech
Télécom ParisTech
%T IBEX: Harvesting Entities from the Web Using Unique Identifiers : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-AF0D-5
%R 10.1145/2767109.2767116
%D 2016
%B 18th International Workshop on the Web and Databases
%Z date of event: 2015-05-31 - 2015-05-31
%C Melbourne, Australia
%B Proceedings of the 18th International Workshop on Web and Databases
%E Stoyanovich, Julia; Suchanek, Fabian M.
%P 13 - 19
%I ACM
%@ 978-1-4503-3627-7

Conference paper

D5, D2

N. Tandon, C. D. Hariman, J. Urbani, A. Rohrbach, M. Rohrbach, and G. Weikum

“Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 2016.

BibTeX

@inproceedings{TandonAAAI2016,
TITLE = {Commonsense in Parts: Mining Part-Whole Relations from the {Web} and Image Tags},
AUTHOR = {Tandon, Niket and Hariman, Charles Darwis and Urbani, Jacopo and Rohrbach, Anna and Rohrbach, Marcus and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-57735-760-5},
PUBLISHER = {AAAI Press},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
PAGES = {243--250},
ADDRESS = {Phoenix, AZ, USA},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A Hariman, Charles Darwis
%A Urbani, Jacopo
%A Rohrbach, Anna
%A Rohrbach, Marcus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-ABFE-1
%D 2016
%B Thirtieth AAAI Conference on Artificial Intelligence
%Z date of event: 2016-02-12 - 2016-02-17
%C Phoenix, AZ, USA
%B Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
%P 243 - 250
%I AAAI Press
%@ 978-1-57735-760-5
%U http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12337/11590

Thesis

D5, IMPR-CS

N. Tandon

“Commonsense Knowledge Acquisition and Applications,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@phdthesis{TandonPhD2016,
TITLE = {Commonsense Knowledge Acquisition and Applications},
AUTHOR = {Tandon, Niket},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-66291},
DOI = {10.22028/D291-26668},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Tandon, Niket
%Y Weikum, Gerhard
%A referee: Lieberman, Henry
%A referee: Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Commonsense Knowledge Acquisition and Applications
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-78F6-A
%U urn:nbn:de:bsz:291-scidok-66291
%R 10.22028/D291-26668
%F OTHER: hdl:20.500.11880/26724
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P XIV, 154 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2016/6629/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

IMPR-CS, D5

C. Teflioudi

“Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@phdthesis{Teflioudiphd2016,
TITLE = {Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search},
AUTHOR = {Teflioudi, Christina},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-64699},
DOI = {10.22028/D291-26651},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Teflioudi, Christina
%Y Gemulla, Rainer
%A referee: Weikum, Gerhard
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-43FA-2
%U urn:nbn:de:bsz:291-scidok-64699
%R 10.22028/D291-26651
%F OTHER: urn:nbn:de:bsz:291-scidok-64699
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P xi, 110 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2016/6469/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

Y. S. Uddin, V. Setty, Y. Zhao, R. Vitenberg, and N. Venkatasubramanian

“RichNote: Adaptive Selection and Delivery of Rich Media Notifications to Mobile Users,” in ICDCS 2016, 36th International Conference on Distributed Computing Systems, Nara, Japan, 2016.

BibTeX

@inproceedings{UddinICDCS2016,
TITLE = {{RichNote}: {A}daptive Selection and Delivery of Rich Media Notifications to Mobile Users},
AUTHOR = {Uddin, Yusuf Sarwar and Setty, Vinay and Zhao, Ye and Vitenberg, Roman and Venkatasubramanian, Nalini},
LANGUAGE = {eng},
ISSN = {1063-6927},
DOI = {10.1109/ICDCS.2016.107},
PUBLISHER = {IEEE},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {ICDCS 2016, 36th International Conference on Distributed Computing Systems},
PAGES = {159--168},
ADDRESS = {Nara, Japan},
}

Endnote

%0 Conference Proceedings
%A Uddin, Yusuf Sarwar
%A Setty, Vinay
%A Zhao, Ye
%A Vitenberg, Roman
%A Venkatasubramanian, Nalini
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T RichNote: Adaptive Selection and Delivery of Rich Media Notifications to Mobile Users
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-9AFE-C
%R 10.1109/ICDCS.2016.107
%D 2016
%B 36th International Conference on Distributed Computing Systems 
%Z date of event: 2016-06-27 - 2016-06-30
%C Nara, Japan
%B ICDCS 2016
%P 159 - 168
%I IEEE

Paper

J. Urbani, S. Dutta, S. Gurajada, and G. Weikum

“KOGNAC: Efficient Encoding of Large Knowledge Graphs,” 2016. [Online]. Available: http://arxiv.org/abs/1604.04795.

Abstract

Many Web applications require efficient querying of large Knowledge Graphs
(KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve
SPARQL querying with a judicious combination of statistical and semantic
techniques. In KOGNAC, frequent terms are detected with a frequency
approximation algorithm and encoded to maximise compression. Infrequent terms
are semantically grouped into ontological classes and encoded to increase data
locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines,
and observed that it significantly improves SPARQL querying on KGs with up to
1B edges.

BibTeX

@online{Urbani2016,
TITLE = {{KOGNAC}: Efficient Encoding of Large Knowledge Graphs},
AUTHOR = {Urbani, Jacopo and Dutta, Sourav and Gurajada, Sairam and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1604.04795},
EPRINT = {1604.04795},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {Many Web applications require efficient querying of large Knowledge Graphs (KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve SPARQL querying with a judicious combination of statistical and semantic techniques. In KOGNAC, frequent terms are detected with a frequency approximation algorithm and encoded to maximise compression. Infrequent terms are semantically grouped into ontological classes and encoded to increase data locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.},
}

Endnote

%0 Report
%A Urbani, Jacopo
%A Dutta, Sourav
%A Gurajada, Sairam
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KOGNAC: Efficient Encoding of Large Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-01C1-3
%U http://arxiv.org/abs/1604.04795
%D 2016
%X   Many Web applications require efficient querying of large Knowledge Graphs
(KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve
SPARQL querying with a judicious combination of statistical and semantic
techniques. In KOGNAC, frequent terms are detected with a frequency
approximation algorithm and encoded to maximise compression. Infrequent terms
are semantically grouped into ontological classes and encoded to increase data
locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines,
and observed that it significantly improves SPARQL querying on KGs with up to
1B edges.

%K Computer Science, Artificial Intelligence, cs.AI

Conference paper

J. Urbani, S. Dutta, S. Gurajada, and G. Weikum

“KOGNAC: Efficient Encoding of Large Knowledge Graphs,” in Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, NY, USA, 2016.

BibTeX

@inproceedings{UrbaniIJCAI2016,
TITLE = {{KOGNAC}: {E}fficient Encoding of Large Knowledge Graphs},
AUTHOR = {Urbani, Jacopo and Dutta, Sourav and Gurajada, Sairam and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-57735-771-1},
URL = {http://www.ijcai.org/Proceedings/16/Papers/548.pdf},
PUBLISHER = {AAAI},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016)},
EDITOR = {Kambhampati, Subbarao},
PAGES = {3896--3902},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Urbani, Jacopo
%A Dutta, Sourav
%A Gurajada, Sairam
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KOGNAC: Efficient Encoding of Large Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A641-2
%U http://www.ijcai.org/Proceedings/16/Papers/548.pdf
%D 2016
%B 25th International Joint Conference on Artificial Intelligence
%Z date of event: 2016-07-09 - 2016-07-15
%C New York, NY, USA
%B Twenty-Fifth International Joint Conference on Artificial Intelligence
%E Kambhampati, Subbarao
%P 3896 - 3902
%I AAAI
%@ 978-1-57735-771-1

Conference paper

Y. Wang, Z. Ren, M. Theobald, M. Dylla, and G. de Melo

“Summary Generation for Temporal Extractions,” in Database and Expert Systems Applications (DEXA 2016), Porto, Portugal, 2016.

BibTeX

@inproceedings{Wang_DEXA2016,
TITLE = {Summary Generation for Temporal Extractions},
AUTHOR = {Wang, Yafang and Ren, Zhaochun and Theobald, Martin and Dylla, Maximilian and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-3-319-44402-4},
DOI = {10.1007/978-3-319-44403-1_23},
PUBLISHER = {Springer},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {Database and Expert Systems Applications (DEXA 2016)},
EDITOR = {Hartmann, Sven and Ma, Hui},
PAGES = {370--386},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9827},
ADDRESS = {Porto, Portugal},
}

Endnote

%0 Conference Proceedings
%A Wang, Yafang
%A Ren, Zhaochun
%A Theobald, Martin
%A Dylla, Maximilian
%A de Melo, Gerard
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Summary Generation for Temporal Extractions
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2DE2-F
%R 10.1007/978-3-319-44403-1_23
%D 2016
%B 27th International Conference on Database and Expert Systems Applications
%Z date of event: 2016-09-05 - 2016-09-08
%C Porto, Portugal
%B Database and Expert Systems Applications
%E Hartmann, Sven; Ma, Hui
%P 370 - 386
%I Springer
%@ 978-3-319-44402-4
%B Lecture Notes in Computer Science
%N 9827

Article

G. Weikum, J. Hoffart, and F. Suchanek

“Ten Years of Knowledge Harvesting: Lessons and Challenges,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 39, no. 3, 2016.

BibTeX

@article{Weikum_Hoffart_Suchanek2016,
TITLE = {Ten Years of Knowledge Harvesting: {L}essons and Challenges},
AUTHOR = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian},
LANGUAGE = {eng},
URL = {http://sites.computer.org/debull/A16sept/p41.pdf},
PUBLISHER = {IEEE Computer Society},
ADDRESS = {Los Alamitos, CA},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Bulletin of the IEEE Computer Society Technical Committee on Data Engineering},
VOLUME = {39},
NUMBER = {3},
PAGES = {41--50},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%A Hoffart, Johannes
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Télécom ParisTech
%T Ten Years of Knowledge Harvesting: Lessons and Challenges
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A618-F
%U http://sites.computer.org/debull/A16sept/p41.pdf
%7 2016
%D 2016
%J Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
%V 39
%N 3
%& 41
%P 41 - 50
%I IEEE Computer Society
%C Los Alamitos, CA

Article

G. Weikum

“Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut für Informatik,” Datenbank Spektrum, vol. 16, no. 1, 2016.

BibTeX

@article{WeikumDBSpektrum2016,
TITLE = {{Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut f{\"u}r Informatik}},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {deu},
DOI = {10.1007/s13222-016-0211-z},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Datenbank Spektrum},
VOLUME = {16},
NUMBER = {1},
PAGES = {77--82},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut für Informatik
%G deu
%U http://hdl.handle.net/11858/00-001M-0000-002B-0194-B
%R 10.1007/s13222-016-0211-z
%7 2016
%D 2016
%J Datenbank Spektrum
%V 16
%N 1
%& 77
%P 77 - 82
%I Springer
%C Berlin

Thesis

B. A. Wójciak

“Spaghetti: Finding Storylines in Large Collections of Documents,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{WojciakMSc2016,
TITLE = {Spaghetti: Finding Storylines in Large Collections of Documents},
AUTHOR = {W{\'o}jciak, Beata Anna},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Wójciak, Beata Anna
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Spaghetti: Finding Storylines in Large Collections of Documents
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5F3F-3
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

Paper

H. Wu, M. Sun, J. Vreeken, N. Tatti, C. North, and N. Ramakrishnan

“Interactive and Iterative Discovery of Entity Network Subgraphs,” 2016. [Online]. Available: http://arxiv.org/abs/1608.03889.

Abstract

Graph mining to extract interesting components has been studied in various
guises, e.g., communities, dense subgraphs, cliques. However, most existing
works are based on notions of frequency and connectivity and do not capture
subjective interestingness from a user's viewpoint. Furthermore, existing
approaches to mine graphs are not interactive and cannot incorporate user
feedbacks in any natural manner. In this paper, we address these gaps by
proposing a graph maximum entropy model to discover surprising connected
subgraph patterns from entity graphs. This model is embedded in an interactive
visualization framework to enable human-in-the-loop, model-guided data
exploration. Using case studies on real datasets, we demonstrate how
interactions between users and the maximum entropy model lead to faster and
explainable conclusions.

BibTeX

@online{Wu1608.03889,
TITLE = {Interactive and Iterative Discovery of Entity Network Subgraphs},
AUTHOR = {Wu, Hao and Sun, Maoyuan and Vreeken, Jilles and Tatti, Nikolaj and North, Chris and Ramakrishnan, Naren},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1608.03889},
EPRINT = {1608.03889},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedbacks in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions.},
}

Endnote

%0 Report
%A Wu, Hao
%A Sun, Maoyuan
%A Vreeken, Jilles
%A Tatti, Nikolaj
%A North, Chris
%A Ramakrishnan, Naren
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Interactive and Iterative Discovery of Entity Network Subgraphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A939-F
%U http://arxiv.org/abs/1608.03889
%D 2016
%X   Graph mining to extract interesting components has been studied in various
guises, e.g., communities, dense subgraphs, cliques. However, most existing
works are based on notions of frequency and connectivity and do not capture
subjective interestingness from a user's viewpoint. Furthermore, existing
approaches to mine graphs are not interactive and cannot incorporate user
feedbacks in any natural manner. In this paper, we address these gaps by
proposing a graph maximum entropy model to discover surprising connected
subgraph patterns from entity graphs. This model is embedded in an interactive
visualization framework to enable human-in-the-loop, model-guided data
exploration. Using case studies on real datasets, we demonstrate how
interactions between users and the maximum entropy model lead to faster and
explainable conclusions.

%K cs.SI, Computer Science, Databases, cs.DB

Paper

H. Wu, Y. Ning, P. Chakraborty, J. Vreeken, N. Tatti, and N. Ramakrishnan

“Generating Realistic Synthetic Population Datasets,” 2016. [Online]. Available: http://arxiv.org/abs/1602.06844.

Abstract

Modern studies of societal phenomena rely on the availability of large
datasets capturing attributes and activities of synthetic, city-level,
populations. For instance, in epidemiology, synthetic population datasets are
necessary to study disease propagation and intervention measures before
implementation. In social science, synthetic population datasets are needed to
understand how policy decisions might affect preferences and behaviors of
individuals. In public health, synthetic population datasets are necessary to
capture diagnostic and procedural characteristics of patient records without
violating confidentialities of individuals. To generate such datasets over a
large set of categorical variables, we propose the use of the maximum entropy
principle to formalize a generative model such that in a statistically
well-founded way we can optimally utilize given prior information about the
data, and are unbiased otherwise. An efficient inference algorithm is designed
to estimate the maximum entropy model, and we demonstrate how our approach is
adept at estimating underlying data distributions. We evaluate this approach
against both simulated data and on US census datasets, and demonstrate its
feasibility using an epidemic simulation application.

BibTeX

@online{Wu_arXiv2016,
TITLE = {Generating Realistic Synthetic Population Datasets},
AUTHOR = {Wu, Hao and Ning, Yue and Chakraborty, Prithwish and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1602.06844},
EPRINT = {1602.06844},
EPRINTTYPE = {arXiv},
YEAR = {2016},
ABSTRACT = {Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and on US census datasets, and demonstrate its feasibility using an epidemic simulation application.},
}

Endnote

%0 Report
%A Wu, Hao
%A Ning, Yue
%A Chakraborty, Prithwish
%A Vreeken, Jilles
%A Tatti, Nikolaj
%A Ramakrishnan, Naren
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Generating Realistic Synthetic Population Datasets
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-08F9-B
%U http://arxiv.org/abs/1602.06844
%D 2016
%X   Modern studies of societal phenomena rely on the availability of large
datasets capturing attributes and activities of synthetic, city-level,
populations. For instance, in epidemiology, synthetic population datasets are
necessary to study disease propagation and intervention measures before
implementation. In social science, synthetic population datasets are needed to
understand how policy decisions might affect preferences and behaviors of
individuals. In public health, synthetic population datasets are necessary to
capture diagnostic and procedural characteristics of patient records without
violating confidentialities of individuals. To generate such datasets over a
large set of categorical variables, we propose the use of the maximum entropy
principle to formalize a generative model such that in a statistically
well-founded way we can optimally utilize given prior information about the
data, and are unbiased otherwise. An efficient inference algorithm is designed
to estimate the maximum entropy model, and we demonstrate how our approach is
adept at estimating underlying data distributions. We evaluate this approach
against both simulated data and on US census datasets, and demonstrate its
feasibility using an epidemic simulation application.

%K Computer Science, Databases, cs.DB

Thesis

D5, IMPR-CS

M. Yahya

“Question Answering and Query Processing for Extended Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@phdthesis{yahyaphd2016,
TITLE = {Question Answering and Query Processing for Extended Knowledge Graphs},
AUTHOR = {Yahya, Mohamed},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-64765},
DOI = {10.22028/D291-25428},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Yahya, Mohamed
%Y Weikum, Gerhard
%A referee: Schütze, Hinrich
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Question Answering and Query Processing for Extended Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-48C2-7
%R 10.22028/D291-25428
%U urn:nbn:de:bsz:291-scidok-64765
%F OTHER: hdl:20.500.11880/25484
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%P x, 160 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2016/6476/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

M. Yahya, D. Barbosa, K. Berberich, Q. Wang, and G. Weikum

“Relationship Queries on Extended Knowledge Graphs,” in WSDM’16, 9th ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 2016.

BibTeX

@inproceedings{YahyaWSDM2016,
TITLE = {Relationship Queries on Extended Knowledge Graphs},
AUTHOR = {Yahya, Mohamed and Barbosa, Denilson and Berberich, Klaus and Wang, Quiyue and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3716-8},
DOI = {10.1145/2835776.2835795},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WSDM'16, 9th ACM International Conference on Web Search and Data Mining},
PAGES = {605--614},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Barbosa, Denilson
%A Berberich, Klaus
%A Wang, Quiyue
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Relationship Queries on Extended Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-ABAA-0
%R 10.1145/2835776.2835795
%D 2016
%B 9th ACM International Conference on Web Search and Data Mining
%Z date of event: 2016-02-22 - 2016-02-25
%C San Francisco, CA, USA
%B WSDM'16
%P 605 - 614
%I ACM
%@ 978-1-4503-3716-8

Article

M. Yahya, K. Berberich, M. Ramanath, and G. Weikum

“Exploratory Querying of Extended Knowledge Graphs,” Proceedings of the VLDB Endowment (Proc. VLDB 2016), vol. 9, no. 1, 2016.

BibTeX

@article{YahyaVLDB2016,
TITLE = {Exploratory Querying of Extended Knowledge Graphs},
AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2016},
DATE = {2016},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {9},
NUMBER = {1},
PAGES = {1521--1524},
BOOKTITLE = {Proceedings of the 42nd International Conference on Very Large Data Bases (VLDB 2016)},
EDITOR = {Chaudhuri, Surajit and Haritsa, Jayant},
}

Endnote

%0 Journal Article
%A Yahya, Mohamed
%A Berberich, Klaus
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Exploratory Querying of Extended Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A61C-7
%7 2016
%D 2016
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 9
%N 1
%& 1521
%P 1521 - 1524
%I ACM
%C New York, NY
%B Proceedings of the 42nd International Conference on Very Large Data Bases
%O VLDB 2016 New Delhi, India, September 5 - 9, 2016
%U http://www.vldb.org/pvldb/vol9/p1521-yahya.pdf

Conference paper

L. Zervakis, C. Tryfonopoulos, V. Setty, S. Seufert, and S. Skiadopoulos

“Towards Publish/Subscribe Functionality on Graphs,” in Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, Bordeaux, France, 2016.

BibTeX

@inproceedings{DBLP:conf/edbt/ZervakisTSSS16,
TITLE = {Towards Publish/Subscribe Functionality on Graphs},
AUTHOR = {Zervakis, Lefteris and Tryfonopoulos, Christos and Setty, Vinay and Seufert, Stephan and Skiadopoulos, Spiros},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-1558-2},
PUBLISHER = {CEUR-WS.org},
YEAR = {2016},
BOOKTITLE = {Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference},
EDITOR = {Palpanas, Themis and Stefanidis, Kostas},
EID = {13},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1558},
ADDRESS = {Bordeaux, France},
}

Endnote

%0 Conference Proceedings
%A Zervakis, Lefteris
%A Tryfonopoulos, Christos
%A Setty, Vinay
%A Seufert, Stephan
%A Skiadopoulos, Spiros
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Towards Publish/Subscribe Functionality on Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CAE-1
%D 2016
%B 2nd International Workshop on Preservation of Evolving Big Data
%Z date of event: 2016-03-15 - 2016-03-15
%C Bordeaux, France
%B Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference
%E Palpanas, Themis; Stefanidis, Kostas
%Z sequence number: 13
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1558

Conference paper

H. Zhang and V. Setty

“Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.

BibTeX

@inproceedings{ZhangWebSci2016,
TITLE = {Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News},
AUTHOR = {Zhang, Hang and Setty, Vinay},
LANGUAGE = {eng},
ISBN = {978-1-4503-4208-7},
DOI = {10.1145/2908131.2908168},
PUBLISHER = {ACM},
YEAR = {2016},
DATE = {2016},
BOOKTITLE = {WebSci'16, ACM Web Science Conference},
PAGES = {286--290},
ADDRESS = {Hannover, Germany},
}

Endnote

%0 Conference Proceedings
%A Zhang, Hang
%A Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-020A-C
%R 10.1145/2908131.2908168
%D 2016
%B ACM Web Science Conference
%Z date of event: 2016-05-22 - 2016-05-25
%C Hannover, Germany
%B WebSci'16
%P 286 - 290
%I ACM
%@ 978-1-4503-4208-7

Thesis

H. Zhang

“Diversified Social Media Retrieval for News Stories,” Universität des Saarlandes, Saarbrücken, 2016.

BibTeX

@mastersthesis{ZhangMSc2016,
TITLE = {Diversified Social Media Retrieval for News Stories},
AUTHOR = {Zhang, Hang},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2016},
DATE = {2016},
}

Endnote

%0 Thesis
%A Zhang, Hang
%Y Neumann, Günther
%A referee: Weikum, Gerhard
%A referee: Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Diversified Social Media Retrieval for News Stories
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-48D3-E
%I Universität des Saarlandes
%C Saarbrücken
%D 2016
%V master
%9 master

2015

Conference paper

S. Abiteboul, L. Dong, O. Etzioni, D. Srivastava, G. Weikum, J. Stoyanovich, and F. M. Suchanek

“The Elephant in the Room: Getting Value from Big Data,” in Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015), Melbourne, Australia, 2015.

BibTeX

@inproceedings{AbiteboulWebDB2015,
TITLE = {The Elephant in the Room: {G}etting Value from {Big Data}},
AUTHOR = {Abiteboul, Serge and Dong, Luna and Etzioni, Oren and Srivastava, Divesh and Weikum, Gerhard and Stoyanovich, Julia and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISBN = {978-1-4503-3627-7},
DOI = {10.1145/2767109.2770014},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015)},
EDITOR = {Stoyanovich, Julia and Suchanek, Fabian M.},
PAGES = {1--5},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Abiteboul, Serge
%A Dong, Luna
%A Etzioni, Oren
%A Srivastava, Divesh
%A Weikum, Gerhard
%A Stoyanovich, Julia
%A Suchanek, Fabian M.
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Télécom ParisTech
%T The Elephant in the Room: Getting Value from Big Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0027-D3F2-F
%R 10.1145/2767109.2770014
%D 2015
%B 18th International Workshop on the Web and Databases
%Z date of event: 2015-05-31 - 2015-05-31
%C Melbourne, Australia
%B Proceedings of the 18th International Workshop on Web and Databases
%E Stoyanovich, Julia; Suchanek, Fabian M.
%P 1 - 5
%I ACM
%@ 978-1-4503-3627-7

Conference paper

A. Abujabal and K. Berberich

“Important Events in the Past, Present, and Future,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{AbjuabalWWW2015,
TITLE = {Important Events in the Past, Present, and Future},
AUTHOR = {Abujabal, Abdalghani and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2741692},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {1315--1320},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Abujabal, Abdalghani
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Important Events in the Past, Present, and Future
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-E33A-8
%R 10.1145/2740908.2741692
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-04-18 - 2015-04-22
%C Florence, Italy
%B WWW'15 Companion
%P 1315 - 1320
%I ACM
%@ 978-1-4503-3473-0

Thesis

A. Abujabal

“Mining Past, Present, and Future,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{AbujabalMaster2015,
TITLE = {Mining Past, Present, and Future},
AUTHOR = {Abujabal, Abdalghani},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Abujabal, Abdalghani
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining Past, Present, and Future : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-A974-2
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P XII, 86 p.
%V master
%9 master

Article

A. Anagnostopoulos, L. Becchetti, I. Bordino, S. Leonardi, I. Mele, and P. Sankowski

“Stochastic Query Covering for Fast Approximate Document Retrieval,” ACM Transactions on Information Systems, vol. 33, no. 3, 2015.

BibTeX

@article{Anagnostopoulos:TOIS,
TITLE = {Stochastic Query Covering for Fast Approximate Document Retrieval},
AUTHOR = {Anagnostopoulos, Aris and Becchetti, Luca and Bordino, Ilaria and Leonardi, Stefano and Mele, Ida and Sankowski, Piotr},
LANGUAGE = {eng},
ISSN = {1046-8188},
DOI = {10.1145/2699671},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2015},
DATE = {2015},
JOURNAL = {ACM Transactions on Information Systems},
VOLUME = {33},
NUMBER = {3},
PAGES = {1--35},
EID = {11},
}

Endnote

%0 Journal Article
%A Anagnostopoulos, Aris
%A Becchetti, Luca
%A Bordino, Ilaria
%A Leonardi, Stefano
%A Mele, Ida
%A Sankowski, Piotr
%+ External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Stochastic Query Covering for Fast Approximate Document Retrieval : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-B6C7-2
%R 10.1145/2699671
%7 2015
%D 2015
%J ACM Transactions on Information Systems
%O TOIS
%V 33
%N 3
%& 1
%P 1 - 35
%Z sequence number: 11
%I ACM
%C New York, NY
%@ false

Conference paper

A. Anagnostopoulos, L. Becchetti, A. Fazzone, I. Mele, and M. Riondato

“The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing,” in SIGMOD’15, ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 2015.

BibTeX

@inproceedings{Anagnostopoulos:SIGMOD2015,
TITLE = {The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing},
AUTHOR = {Anagnostopoulos, Aris and Becchetti, Luca and Fazzone, Adriano and Mele, Ida and Riondato, Matteo},
LANGUAGE = {eng},
ISBN = {978-1-4503-2758-9},
DOI = {10.1145/2723372.2723722},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {SIGMOD'15, ACM SIGMOD International Conference on Management of Data},
PAGES = {983--998},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Anagnostopoulos, Aris
%A Becchetti, Luca
%A Fazzone, Adriano
%A Mele, Ida
%A Riondato, Matteo
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-B6BE-7
%R 10.1145/2723372.2723722
%D 2015
%B ACM SIGMOD International Conference on Management of Data 
%Z date of event: 2015-05-31 - 2015-06-04
%C Melbourne, Australia
%B SIGMOD'15
%P 983 - 998
%I ACM
%@ 978-1-4503-2758-9

Proceedings

K. Balog, J. Dalton, A. Doucet, and Y. Ibrahim

Eds., ESAIR’15. ACM, 2015.

BibTeX

@proceedings{Balog:2015:2810133,
TITLE = {ESAIR'15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval},
EDITOR = {Balog, Krisztian and Dalton, Jeffrey and Doucet, Antoine and Ibrahim, Yusra},
LANGUAGE = {eng},
ISBN = {978-1-4503-3790-8},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%E Balog, Krisztian
%E Dalton, Jeffrey
%E Doucet, Antoine
%E Ibrahim, Yusra
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ESAIR'15 : Proceedings of the 2015 Workshop on Exploiting Semantic Annotations in Information Retrieval
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4869-B
%@ 978-1-4503-3790-8
%I ACM
%D 2015
%B Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval
%Z date of event: 2015-10-23 - 2015-10-23
%D 2015
%C Melbourne, Australia

Conference paper

H. R. Bazoobandi, S. de Rooij, J. Urbani, A. ten Teije, F. van Harmelen, and H. Bal

“A Compact In-Memory Dictionary for RDF Data,” in The Semantic Web (ESWC 2015), Portorož, Slovenia, 2015.

BibTeX

@inproceedings{Urbanilncs15,
TITLE = {A Compact In-Memory Dictionary for {RDF} Data},
AUTHOR = {Bazoobandi, Hamid R. and de Rooij, Steve and Urbani, Jacopo and ten Teije, Annette and van Harmelen, Frank and Bal, Henri},
LANGUAGE = {eng},
ISBN = {978-3-319-18817-1},
DOI = {10.1007/978-3-319-18818-8_13},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {The Semantic Web (ESWC 2015)},
EDITOR = {Gandon, Fabien and Sabou, Marta and Sack, Harald and d'Amato, Claudia and Cudr{\'e}-Mauroux, Philippe and Zimmermann, Antoine},
PAGES = {205--220},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9088},
ADDRESS = {Portoro{\v z}, Slovenia},
}

Endnote

%0 Conference Proceedings
%A Bazoobandi, Hamid R.
%A de Rooij, Steve
%A Urbani, Jacopo
%A ten Teije, Annette
%A van Harmelen, Frank
%A Bal, Henri
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T A Compact In-Memory Dictionary for RDF Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-F1A6-9
%R 10.1007/978-3-319-18818-8_13
%D 2015
%B 12th European Semantic Web Conference
%Z date of event: 2015-05-31 - 2015-06-04
%C Portorož, Slovenia
%B The Semantic Web
%E Gandon, Fabien; Sabou, Marta; Sack, Harald; d'Amato, Claudia; Cudré-Mauroux, Philippe; Zimmermann, Antoine
%P 205 - 220
%I Springer
%@ 978-3-319-18817-1
%B Lecture Notes in Computer Science
%N 9088

Article

K. Beedkar, K. Berberich, R. Gemulla, and I. Miliaraki

“Closing the Gap: Sequence Mining at Scale,” ACM Transactions on Database Systems, vol. 40, no. 2, 2015.

BibTeX

@article{DBLP:journals/tods/BeedkarBGM15,
TITLE = {Closing the Gap: {S}equence Mining at Scale},
AUTHOR = {Beedkar, Kaustubh and Berberich, Klaus and Gemulla, Rainer and Miliaraki, Iris},
LANGUAGE = {eng},
ISSN = {0362-5915},
DOI = {10.1145/2757217},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2015},
DATE = {2015},
JOURNAL = {ACM Transactions on Database Systems},
VOLUME = {40},
NUMBER = {2},
PAGES = {1--44},
EID = {8},
}

Endnote

%0 Journal Article
%A Beedkar, Kaustubh
%A Berberich, Klaus
%A Gemulla, Rainer
%A Miliaraki, Iris
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Closing the Gap: Sequence Mining at Scale : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-5712-1
%R 10.1145/2757217
%7 2015
%D 2015
%J ACM Transactions on Database Systems
%V 40
%N 2
%& 1
%P 1 - 44
%Z sequence number: 8
%I ACM
%C New York, NY
%@ false

Thesis

A. Biswas

“Retrieving Web User’s Personal Information,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{BiswasMSc2015,
TITLE = {Retrieving Web User's Personal Information},
AUTHOR = {Biswas, Angeeka},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Biswas, Angeeka
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Retrieving Web User's Personal Information : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-48D1-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P VIII, 30 p.
%V master
%9 master

Conference paper

K. Budhathoki and J. Vreeken

“The Difference and the Norm - Characterising Similarities and Differences Between Databases,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Porto, Portugal, 2015.

BibTeX

@inproceedings{BudhathokiECML2015,
TITLE = {The Difference and the Norm -- Characterising Similarities and Differences Between Databases},
AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-319-23524-0},
DOI = {10.1007/978-3-319-23525-7_13},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015)},
EDITOR = {Appice, Annalisa and Pereira Rodrigues, Pedro and Gama, Jo{\~a}o and Jorge, Al{\'i}pio and Soares, Carlos},
PAGES = {206--223},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9285},
ADDRESS = {Porto, Portugal},
}

Endnote

%0 Conference Proceedings
%A Budhathoki, Kailash
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T The Difference and the Norm - Characterising Similarities and Differences Between Databases  : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-2271-F
%R 10.1007/978-3-319-23525-7_13
%D 2015
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2015-09-07 - 2015-09-11
%C Porto, Portugal
%B Machine Learning and Knowledge Discovery in Databases
%E Appice, Annalisa; Pereira Rodrigues, Pedro; Gama, João; Jorge, Alípio; Soares, Carlos
%P 206 - 223
%I Springer
%@ 978-3-319-23524-0
%B Lecture Notes in Artificial Intelligence
%N 9285

Thesis

K. Budhathoki

“Correlation by Compression,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{BudhathokiMaster2015,
TITLE = {Correlation by Compression},
AUTHOR = {Budhathoki, Kailash},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Budhathoki, Kailash
%Y Vreeken, Jilles
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Correlation by Compression : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-0753-D
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P X, 56 p.
%V master
%9 master

Thesis

K. Chakrabarti

“K-Shortest Paths with Overlap Constraints,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{ChakrabartiMSc2015,
TITLE = {K-Shortest Paths with Overlap Constraints},
AUTHOR = {Chakrabarti, Kaustuv},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Chakrabarti, Kaustuv
%Y Weikum, Gerhard
%A referee: Setty, Vinay
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T K-Shortest Paths with Overlap Constraints : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-43A5-D
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 54 p.
%V master
%9 master

Proceedings

P. Chau, J. Vreeken, M. van Leeuwen, and C. Faloutsos

Eds., Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics. 2015.

BibTeX

@proceedings{chau:15:idea,
TITLE = {Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2015)},
EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Faloutsos, Christos},
LANGUAGE = {eng},
YEAR = {2015},
PAGES = {72 p.},
ADDRESS = {Sydney, Australia},
}

Endnote

%0 Conference Proceedings
%E Chau, Polo
%E Vreeken, Jilles
%E van Leeuwen, Matthijs
%E Faloutsos, Christos
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-578A-0
%D 2015
%B ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2015-08-10 - 2015-08-10
%D 2015
%C Sydney, Australia
%P 72 p.
%U http://poloclub.gatech.edu/idea2015/papers/idea15-proceedings.pdf

Thesis

IMPR-CSD5

D. Dedik

“Robust Type Classification of Out of Knowledge Base Entities,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{DedikMaster2015,
TITLE = {Robust Type Classification of Out of Knowledge Base Entities},
AUTHOR = {Dedik, Darya},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Dedik, Darya
%Y Weikum, Gerhard
%A referee: Spaniol, Marc
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Robust Type Classification of Out of Knowledge Base Entities : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0026-C0EC-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 65 p.
%V master
%9 master

Conference paper

L. Del Corro, A. Abujabal, R. Gemulla, and G. Weikum

“FINET: Context-Aware Fine-Grained Named Entity Typing,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.

BibTeX

@inproceedings{delcorro-EtAl:2015:EMNLP,
TITLE = {{FINET}: {C}ontext-Aware Fine-Grained Named Entity Typing},
AUTHOR = {Del Corro, Luciano and Abujabal, Abdalghani and Gemulla, Rainer and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-941643-32-7},
URL = {https://aclweb.org/anthology/D/D15/D15-1103},
PUBLISHER = {ACL},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)},
PAGES = {868--878},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Del Corro, Luciano
%A Abujabal, Abdalghani
%A Gemulla, Rainer
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FINET: Context-Aware Fine-Grained Named Entity Typing : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49C3-C
%U https://aclweb.org/anthology/D/D15/D15-1103
%D 2015
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2015-09-17 - 2015-09-21
%C Lisbon, Portugal
%B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
%P 868 - 878
%I ACL
%@ 978-1-941643-32-7
%U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP103.pdf

Article

S. Dutta and G. Weikum

“Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment,” Transactions of the Association for Computational Linguistics, vol. 3, 2015.

BibTeX

@article{SouTACL2015,
TITLE = {Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment},
AUTHOR = {Dutta, Sourav and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2307-387X},
PUBLISHER = {ACL},
ADDRESS = {Stroudsburg, PA},
YEAR = {2015},
DATE = {2015},
JOURNAL = {Transactions of the Association for Computational Linguistics},
VOLUME = {3},
PAGES = {15--28},
}

Endnote

%0 Journal Article
%A Dutta, Sourav
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54B7-C
%7 2015
%D 2015
%J Transactions of the Association for Computational Linguistics
%O TACL
%V 3
%& 15
%P 15 - 28
%I ACL
%C Stroudsburg, PA
%@ false

Conference paper

S. Dutta, S. Bhattacherjee, and A. Narang

“Mining Wireless Intelligence using Unsupervised Edge and Core Analytics,” in 2nd Workshop on Smarter Planet and Big Data Analytics, Goa, India.

BibTeX

@inproceedings{SouSPBDA2015,
TITLE = {Mining Wireless Intelligence using Unsupervised Edge and Core Analytics},
AUTHOR = {Dutta, Sourav and Bhattacherjee, Souvik and Narang, Ankur},
LANGUAGE = {eng},
YEAR = {2015},
PUBLREMARK = {Accepted},
BOOKTITLE = {2nd Workshop on Smarter Planet and Big Data Analytics},
ADDRESS = {Goa, India},
}

Endnote

%0 Conference Proceedings
%A Dutta, Sourav
%A Bhattacherjee, Souvik
%A Narang, Ankur
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Mining Wireless Intelligence using Unsupervised Edge and Core Analytics : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54B5-0
%D 2014
%B 2nd Workshop on Smarter Planet and Big Data Analytics
%Z date of event: 2015-01-04 - 2015-01-07
%C Goa, India
%B 2nd Workshop on Smarter Planet and Big Data Analytics

Conference paper

S. Dutta

“MIST: Top-k Approximate Sub-String Mining using Triplet Statistical Significance,” in Advances in Information Retrieval (ECIR 2015), Vienna, Austria, 2015.

BibTeX

@inproceedings{SouECIR2015,
TITLE = {{MIST}: Top-k Approximate Sub-String Mining using Triplet Statistical Significance},
AUTHOR = {Dutta, Sourav},
LANGUAGE = {eng},
ISBN = {978-3-319-16353-6},
DOI = {10.1007/978-3-319-16354-3_31},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2015)},
EDITOR = {Hanbury, Allan and Kazai, Gabriella and Rauber, Andreas and Fuhr, Norbert},
PAGES = {284--290},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9022},
ADDRESS = {Vienna, Austria},
}

Endnote

%0 Conference Proceedings
%A Dutta, Sourav
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T MIST: Top-k Approximate Sub-String Mining using Triplet Statistical Significance : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54B2-5
%R 10.1007/978-3-319-16354-3_31
%D 2015
%B 37th European Conference on Information Retrieval
%Z date of event: 2015-03-29 - 2015-04-02
%C Vienna, Austria
%B Advances in Information Retrieval
%E Hanbury, Allan; Kazai, Gabriella; Rauber, Andreas; Fuhr, Norbert
%P 284 - 290
%I Springer
%@ 978-3-319-16353-6
%B Lecture Notes in Computer Science
%N 9022

Conference paper

S. Dutta, A. Narang, and S. Bhattacherjee

“Predictive Caching Framework for Mobile Wireless Networks,” in MDM 2015, 16th International Conference on Mobile Data Management, Pittsburgh, PA, USA, 2015.

BibTeX

@inproceedings{DuttaMDM2015,
TITLE = {Predictive Caching Framework for Mobile Wireless Networks},
AUTHOR = {Dutta, Sourav and Narang, Ankur and Bhattacherjee, Souvik},
LANGUAGE = {eng},
ISBN = {978-1-4799-9972-9},
DOI = {10.1109/MDM.2015.14},
PUBLISHER = {IEEE},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {MDM 2015, 16th International Conference on Mobile Data Management},
PAGES = {179--184},
ADDRESS = {Pittsburgh, PA, USA},
}

Endnote

%0 Conference Proceedings
%A Dutta, Sourav
%A Narang, Ankur
%A Bhattacherjee, Souvik
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Predictive Caching Framework for Mobile Wireless Networks : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002B-A5AF-3
%R 10.1109/MDM.2015.14
%D 2015
%B 16th International Conference on Mobile Data Management
%Z date of event: 2015-06-15 - 2015-06-18
%C Pittsburgh, PA, USA
%B MDM 2015
%P 179 - 184
%I IEEE
%@ 978-1-4799-9972-9

Conference paper

S. Dutta and G. Weikum

“C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.

BibTeX

@inproceedings{dutta-weikum:2015:EMNLP,
TITLE = {{C3EL}: {A} Joint Model for Cross-Document Co-Reference Resolution and Entity Linking},
AUTHOR = {Dutta, Sourav and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-941643-32-7},
URL = {https://aclweb.org/anthology/D/D15/D15-1101},
PUBLISHER = {ACL},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)},
PAGES = {846--856},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Dutta, Sourav
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49C1-0
%U https://aclweb.org/anthology/D/D15/D15-1101
%D 2015
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2015-09-17 - 2015-09-21
%C Lisbon, Portugal
%B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
%P 846 - 856
%I ACL
%@ 978-1-941643-32-7
%U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP101.pdf

Article

P. Ernst, A. Siu, and G. Weikum

“KnowLife: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences,” BMC Bioinformatics, vol. 16, no. 1, 2015.

BibTeX

@article{ErnstSiuWeikum2015,
TITLE = {{KnowLife}: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences},
AUTHOR = {Ernst, Patrick and Siu, Amy and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1471-2105},
URL = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4448285&tool=pmcentrez&rendertype=abstract},
DOI = {10.1186/s12859-015-0549-5},
PUBLISHER = {BioMed Central},
ADDRESS = {London},
YEAR = {2015},
JOURNAL = {BMC Bioinformatics},
VOLUME = {16},
NUMBER = {1},
EID = {157},
}

Endnote

%0 Journal Article
%A Ernst, Patrick
%A Siu, Amy
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KnowLife: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0027-7AB7-0
%F OTHER: pmcidPMC4448285
%F OTHER: pmc-uid4448285
%F OTHER: publisher-id549
%R 10.1186/s12859-015-0549-5
%U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4448285&tool=pmcentrez&rendertype=abstract
%7 2015-05-14
%D 2015
%8 14.05.2015
%K Relation extraction
%J BMC Bioinformatics
%V 16
%N 1
%Z sequence number: 157
%I BioMed Central
%C London
%@ false

Thesis

IMPR-CSD5

M. Gad-Elrab

“AIDArabic+ Named Entity Disambiguation for Arabic Text,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{Gad-ElrabMaster2015,
TITLE = {{AIDArabic}+ Named Entity Disambiguation for Arabic Text},
AUTHOR = {Gad-Elrab, Mohamed},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Gad-Elrab, Mohamed
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AIDArabic+ Named Entity Disambiguation for Arabic Text : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-0F70-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 56 p.
%V master
%9 master

Conference paper

M. H. Gad-Elrab, M. A. Yosef, and G. Weikum

“EDRAK: Entity-Centric Data Resource for Arabic Knowledge,” in The Second Workshop on Arabic Natural Language Processing (ANLP 2015), Beijing, China, 2015.

BibTeX

@inproceedings{Gad-ElrabAnLP2015,
TITLE = {{EDRAK}: {E}ntity-Centric Data Resource for {Arabic} Knowledge},
AUTHOR = {Gad-Elrab, Mohamed H. and Yosef, Mohamed Amir and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-941643-58-7},
PUBLISHER = {ACL},
YEAR = {2015},
BOOKTITLE = {The Second Workshop on Arabic Natural Language Processing (ANLP 2015)},
PAGES = {191--200},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed H.
%A Yosef, Mohamed Amir
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EDRAK: Entity-Centric Data Resource for Arabic Knowledge : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-0773-3
%D 2015
%B The Second Workshop on Arabic Natural Language Processing
%Z date of event: 2015-07-26 - 2015-07-31
%C Beijing, China
%B The Second Workshop on Arabic Natural Language Processing
%P 191 - 200
%I ACL
%@ 978-1-941643-58-7

Conference paper

M. H. Gad-Elrab, M. A. Yosef, and G. Weikum

“Named Entity Disambiguation for Resource-poor Languages,” in ESAIR’15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval, Melbourne, Australia, 2015.

BibTeX

@inproceedings{Gad-ElrabESAIR2015,
TITLE = {Named Entity Disambiguation for Resource-poor Languages},
AUTHOR = {Gad-Elrab, Mohamed H. and Yosef, Mohamed Amir and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3790-8},
DOI = {10.1145/2810133.2810138},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {ESAIR'15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval},
EDITOR = {Alonso, Omar and Kamps, Jaap and Karlgren, Jussi},
PAGES = {29--34},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Gad-Elrab, Mohamed H.
%A Yosef, Mohamed Amir
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Named Entity Disambiguation for Resource-poor Languages : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002A-077F-B
%R 10.1145/2810133.2810138
%D 2015
%B Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval
%Z date of event: 2015-10-23 - 2015-10-23
%C Melbourne, Australia
%B ESAIR'15
%E Alonso, Omar; Kamps, Jaap; Karlgren, Jussi
%P 29 - 34
%I ACM
%@ 978-1-4503-3790-8

Article

L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek

“Fast Rule Mining in Ontological Knowledge Bases with AMIE+,” The VLDB Journal, vol. 24, no. 6, 2015.

BibTeX

@article{Galarrag2015,
TITLE = {Fast Rule Mining in Ontological Knowledge Bases with {AMIE}+},
AUTHOR = {Gal{\'a}rraga, Luis and Teflioudi, Christina and Hose, Katja and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISSN = {1066-8888},
DOI = {10.1007/s00778-015-0394-1},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2015},
DATE = {2015},
JOURNAL = {The VLDB Journal},
VOLUME = {24},
NUMBER = {6},
PAGES = {707--730},
}

Endnote

%0 Journal Article
%A Galárraga, Luis
%A Teflioudi, Christina
%A Hose, Katja
%A Suchanek, Fabian M.
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Télécom ParisTech
%T Fast Rule Mining in Ontological Knowledge Bases with AMIE+ : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-3510-3
%R 10.1007/s00778-015-0394-1
%7 2015
%D 2015
%J The VLDB Journal
%V 24
%N 6
%& 707
%P 707 - 730
%I Springer
%C Berlin
%@ false

Conference paper

J. Geiß, A. Spitz, J. Strötgen, and M. Gertz

“The Wikipedia Location Network - Overcoming Borders and Oceans,” in Proceedings of the 9th Workshop on Geographic Information Retrieval (GIR 2015), Paris, France, 2015.

BibTeX

@inproceedings{GIR2015,
TITLE = {The {Wikipedia} Location Network -- Overcoming Borders and Oceans},
AUTHOR = {Gei{\ss}, Johanna and Spitz, Andreas and Str{\"o}tgen, Jannik and Gertz, Michael},
LANGUAGE = {eng},
ISBN = {978-1-4503-3937-7},
DOI = {10.1145/2837689.2837694},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 9th Workshop on Geographic Information Retrieval (GIR 2015)},
EDITOR = {Purves, Ross S. and Jones, Christopher B.},
PAGES = {1--3},
EID = {2},
ADDRESS = {Paris, France},
}

Endnote

%0 Conference Proceedings
%A Geiß, Johanna
%A Spitz, Andreas
%A Strötgen, Jannik
%A Gertz, Michael
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T The Wikipedia Location Network - Overcoming Borders and Oceans  : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-216D-0
%R 10.1145/2837689.2837694
%D 2015
%B 9th Workshop on Geographic Information Retrieval
%Z date of event: 2015-11-26 - 2015-11-27
%C Paris, France
%B Proceedings of the 9th Workshop on Geographic Information Retrieval
%E Purves, Ross S.; Jones, Christopher B.
%P 1 - 3
%Z sequence number: 2
%I ACM
%@ 978-1-4503-3937-7

Conference paper

A. Grycner, G. Weikum, J. Pujara, J. Foulds, and L. Getoor

“RELLY: Inferring Hypernym Relationships Between Relational Phrases,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.

BibTeX

@inproceedings{grycner-EtAl:2015:EMNLP,
TITLE = {{RELLY}: {I}nferring Hypernym Relationships Between Relational Phrases},
AUTHOR = {Grycner, Adam and Weikum, Gerhard and Pujara, Jay and Foulds, James and Getoor, Lise},
LANGUAGE = {eng},
ISBN = {978-1-941643-32-7},
URL = {http://aclweb.org/anthology/D15-1113},
PUBLISHER = {ACL},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)},
PAGES = {971--981},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Grycner, Adam
%A Weikum, Gerhard
%A Pujara, Jay
%A Foulds, James
%A Getoor, Lise
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T RELLY: Inferring Hypernym Relationships Between Relational Phrases : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49B0-5
%U http://aclweb.org/anthology/D15-1113
%D 2015
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2015-09-17 - 2015-09-21
%C Lisbon, Portugal
%B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
%P 971 - 981
%I ACL
%@ 978-1-941643-32-7
%U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP113.pdf

Conference paper

D. Gupta and K. Berberich

“Temporal Query Classification at Different Granularities,” in String Processing and Information Retrieval (SPIRE 2015), London, UK, 2015.

BibTeX

@inproceedings{spire15-gupta,
TITLE = {Temporal Query Classification at Different Granularities},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-23825-8},
DOI = {10.1007/978-3-319-23826-5_16},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {String Processing and Information Retrieval (SPIRE 2015)},
EDITOR = {Iliopoulos, Costas S. and Puglisi, Simon J. and Yilmaz, Emine},
PAGES = {137--148},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9309},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Temporal Query Classification at Different Granularities : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-4249-D
%R 10.1007/978-3-319-23826-5_16
%D 2015
%B 22nd International Symposium on String Processing and Information Retrieval
%Z date of event: 2015-08-31 - 2015-09-02
%C London, UK
%B String Processing and Information Retrieval
%E Iliopoulos, Costas S.; Puglisi, Simon J.; Yilmaz, Emine
%P 137 - 148
%I Springer
%@ 978-3-319-23825-8
%B Lecture Notes in Computer Science
%N 9309

Thesis

D5, IMPR-CS

C. D. Hariman

“Part-Whole Commonsense Knowledge Harvesting from the Web,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{HarimanMaster2015,
TITLE = {Part-Whole Commonsense Knowledge Harvesting from the Web},
AUTHOR = {Hariman, Charles Darwis},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Hariman, Charles Darwis
%Y Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Part-Whole Commonsense Knowledge Harvesting from the Web : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0026-C0E6-C
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 53 p.
%V master
%9 master

Thesis

D5, IMPR-CS

J. Hoffart

“Discovering and Disambiguating Named Entities in Text,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@phdthesis{Hoffartthesis,
TITLE = {Discovering and Disambiguating Named Entities in Text},
AUTHOR = {Hoffart, Johannes},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-60226},
DOI = {10.22028/D291-25418},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Hoffart, Johannes
%Y Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering and Disambiguating Named Entities in Text : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-6C44-0
%R 10.22028/D291-25418
%U urn:nbn:de:bsz:291-scidok-60226
%F OTHER: hdl:20.500.11880/25474
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P X, 103 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2015/6022/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

J. Hoffart, N. Preda, F. M. Suchanek, and G. Weikum

“Knowledge Bases for Web Content Analytics,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{hoffart2015knowledgebases,
TITLE = {Knowledge Bases for {Web} Content Analytics},
AUTHOR = {Hoffart, Johannes and Preda, Nicoleta and Suchanek, Fabian M. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2741984},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {1535--1535},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Preda, Nicoleta
%A Suchanek, Fabian M.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Bases for Web Content Analytics : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-8E68-7
%R 10.1145/2740908.2741984
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 1535 - 1535
%I ACM
%@ 978-1-4503-3473-0

Conference paper

K. Hui and K. Berberich

“Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation,” in String Processing and Information Retrieval (SPIRE 2015), London, UK, 2015.

BibTeX

@inproceedings{spire15-kaihui,
TITLE = {Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation},
AUTHOR = {Hui, Kai and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-23825-8},
DOI = {10.1007/978-3-319-23826-5_14},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {String Processing and Information Retrieval (SPIRE 2015)},
EDITOR = {Iliopoulos, Costas S. and Puglisi, Simon J. and Yilmaz, Emine},
PAGES = {137--148},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9309},
ADDRESS = {London, UK},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-5DAA-5
%R 10.1007/978-3-319-23826-5_14
%D 2015
%B 22nd International Symposium on String Processing and Information Retrieval
%Z date of event: 2015-08-31 - 2015-09-02
%C London, UK
%B String Processing and Information Retrieval
%E Iliopoulos, Costas S.; Puglisi, Simon J.; Yilmaz, Emine
%P 137 - 148
%I Springer
%@ 978-3-319-23825-8
%B Lecture Notes in Computer Science
%N 9309

Conference paper

S. Karaev, P. Miettinen, and J. Vreeken

“Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization,” in Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015), Vancouver, Canada, 2015.

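Abstract

Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.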

BibTeX

@inproceedings{karaev15getting,
TITLE = {Getting to Know the Unknown Unknowns: {D}estructive-noise Resistant {Boolean} Matrix Factorization},
AUTHOR = {Karaev, Sanjar and Miettinen, Pauli and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-401-0},
DOI = {10.1137/1.9781611974010.37},
PUBLISHER = {SIAM},
YEAR = {2015},
DATE = {2015},
ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.},
BOOKTITLE = {Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015)},
EDITOR = {Venkatasubramanian, Suresh and Ye, Jieping},
PAGES = {325--333},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%A Miettinen, Pauli
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6C59-C
%R 10.1137/1.9781611974010.37
%D 2015
%B 15th SIAM International Conference on Data Mining
%Z date of event: 2015-04-30 - 2015-05-02
%C Vancouver, Canada
%X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its  performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
%B Proceedings of the 2015 SIAM International Conference on Data Mining
%E Venkatasubramanian, Suresh; Ye, Jieping
%P 325 - 333
%I SIAM
%@ 978-1-61197-401-0

Thesis

A. Kopali

“Mitigation of Privacy Risk for Search Queries,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{KopaliMSc2015,
TITLE = {Mitigation of Privacy Risk for Search Queries},
AUTHOR = {Kopali, Agim},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Kopali, Agim
%Y Weikum, Gerhard
%A referee: Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mitigation of Privacy Risk for Search Queries  : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-48CC-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P X, 48 p.
%V master
%9 master

Article

D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos

“Summarizing and Understanding Large Graphs,” Statistical Analysis and Data Mining, vol. 8, no. 3, pp. 183–202, 2015.

BibTeX

@article{koutra:15:vog,
TITLE = {Summarizing and Understanding Large Graphs},
AUTHOR = {Koutra, Danai and Kang, U and Vreeken, Jilles and Faloutsos, Christos},
LANGUAGE = {eng},
ISSN = {1932-1872},
DOI = {10.1002/sam.11267},
PUBLISHER = {Wiley-Blackwell},
ADDRESS = {Chichester},
YEAR = {2015},
DATE = {2015},
JOURNAL = {Statistical Analysis and Data Mining},
VOLUME = {8},
NUMBER = {3},
PAGES = {183--202},
}

Endnote

%0 Journal Article
%A Koutra, Danai
%A Kang, U
%A Vreeken, Jilles
%A Faloutsos, Christos
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Summarizing and Understanding Large Graphs : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0026-D185-2
%R 10.1002/sam.11267
%7 2015-05-18
%D 2015
%J Statistical Analysis and Data Mining
%O The ASA Data Science Journal
%V 8
%N 3
%& 183
%P 183 - 202
%I Wiley-Blackwell
%C Chichester
%@ 1932-1872

Thesis

IMPR-CS, D5

P. Mandros

“Information Theoretic Supervised Feature Selection for Continuous Data,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{MandrosMaster2015,
TITLE = {Information Theoretic Supervised Feature Selection for Continuous Data},
AUTHOR = {Mandros, Panagiotis},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Mandros, Panagiotis
%Y Weikum, Gerhard
%A referee: Vreeken, Jilles
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Information Theoretic Supervised Feature Selection for Continuous Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-BAF3-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 67 p.
%V master
%9 master

Conference paper

S. Metzger, R. Schenkel, and M. Sydow

“Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation,” in The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops (WI-IAT 2014), Warsaw, Poland, 2015.

BibTeX

@inproceedings{MetzgerIAT2014,
TITLE = {Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation},
AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin},
LANGUAGE = {eng},
ISBN = {978-1-4799-4143-8},
DOI = {10.1109/WI-IAT.2014.17},
PUBLISHER = {IEEE},
YEAR = {2014},
DATE = {2015},
BOOKTITLE = {The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology -- Workshops (WI-IAT 2014)},
EDITOR = {{\'S}l{\k e}zak, Dominik and Nguyen, Hung Son and Reformat, Marek and Santos, Eugene},
PAGES = {60--69},
ADDRESS = {Warsaw, Poland},
}

Endnote

%0 Conference Proceedings
%A Metzger, Steffen
%A Schenkel, Ralf
%A Sydow, Marcin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-424D-5
%R 10.1109/WI-IAT.2014.17
%D 2015
%B IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology
%Z date of event: 2014-08-11 - 2014-08-14
%C Warsaw, Poland
%B The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops
%E Ślęzak, Dominik; Nguyen, Hung Son; Reformat, Marek; Santos, Eugene
%P 60 - 69
%I IEEE
%@ 978-1-4799-4143-8

Article

S. Metzler and P. Miettinen

“Clustering Boolean Tensors,” Data Mining and Knowledge Discovery, vol. 29, no. 5, pp. 1343–1373, 2015.

BibTeX

@article{MetzlerMiettinen2015,
TITLE = {Clustering {Boolean} Tensors},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
DOI = {10.1007/s10618-015-0420-3},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2015},
DATE = {2015},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {29},
NUMBER = {5},
PAGES = {1343--1373},
}

Endnote

%0 Journal Article
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Clustering Boolean Tensors : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-536A-B
%R 10.1007/s10618-015-0420-3
%7 2015
%D 2015
%J Data Mining and Knowledge Discovery
%V 29
%N 5
%& 1343
%P 1343 - 1373
%I Springer
%C New York, NY

Conference paper

S. Metzler and P. Miettinen

“Join Size Estimation on Boolean Tensors of RDF Data,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{metzler15join,
TITLE = {Join Size Estimation on {Boolean} Tensors of {RDF} Data},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2742738},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {77--78},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Join Size Estimation on Boolean Tensors of RDF Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-CCED-A
%R 10.1145/2740908.2742738
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 77 - 78
%I ACM
%@ 978-1-4503-3473-0

Paper

S. Metzler and P. Miettinen

“Clustering Boolean Tensors,” 2015. [Online]. Available: http://arxiv.org/abs/1501.00696.

Abstract

Tensor factorizations are computationally hard problems, and in particular,
are often significantly harder than their matrix counterparts. In case of
Boolean tensor factorizations -- where the input tensor and all the factors are
required to be binary and we use Boolean algebra -- much of that hardness comes
from the possibility of overlapping components. Yet, in many applications we
are perfectly happy to partition at least one of the modes. In this paper we
investigate what consequences does this partitioning have on the computational
complexity of the Boolean tensor factorizations and present a new algorithm for
the resulting clustering problem. This algorithm can alternatively be seen as a
particularly regularized clustering algorithm that can handle extremely
high-dimensional observations. We analyse our algorithms with the goal of
maximizing the similarity and argue that this is more meaningful than
minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient
0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm
for Boolean tensor clustering achieves high scalability, high similarity, and
good generalization to unseen data with both synthetic and real-world data
sets.

BibTeX

@online{metzler15clustering:arxiv,
TITLE = {Clustering {Boolean} Tensors},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1501.00696},
EPRINT = {1501.00696},
EPRINTTYPE = {arXiv},
YEAR = {2015},
ABSTRACT = {Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences does this partitioning have on the computational complexity of the Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets.},
}

Endnote

%0 Report
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Clustering Boolean Tensors : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6C5B-8
%U http://arxiv.org/abs/1501.00696
%D 2015
%X Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences does this partitioning have on the computational complexity of the Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets.
%K Computer Science, Numerical Analysis, cs.NA,Computer Science, Data Structures and Algorithms, cs.DS

Paper

S. Metzler and P. Miettinen

“On Defining SPARQL with Boolean Tensor Algebra,” 2015. [Online]. Available: http://arxiv.org/abs/1503.00301.

Abstract

The Resource Description Framework (RDF) represents information as
subject-predicate-object triples. These triples are commonly interpreted as a
directed labelled graph. We propose an alternative approach, interpreting the
data as a 3-way Boolean tensor. We show how SPARQL queries - the standard
queries for RDF - can be expressed as elementary operations in Boolean algebra,
giving us a complete re-interpretation of RDF and SPARQL. We show how the
Boolean tensor interpretation allows for new optimizations and analyses of the
complexity of SPARQL queries. For example, estimating the size of the results
for different join queries becomes much simpler.

BibTeX

@online{metzler15defining:arxiv,
TITLE = {On Defining {SPARQL} with {B}oolean Tensor Algebra},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1503.00301},
EPRINT = {1503.00301},
EPRINTTYPE = {arXiv},
YEAR = {2015},
ABSTRACT = {The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries -- the standard queries for RDF -- can be expressed as elementary operations in Boolean algebra, giving us a complete re-interpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler.},
}

Endnote

%0 Report
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On Defining SPARQL with Boolean Tensor Algebra : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-054A-9
%U http://arxiv.org/abs/1503.00301
%D 2015
%8 03.03.2015
%X The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries - the standard queries for RDF - can be expressed as elementary operations in Boolean algebra, giving us a complete re-interpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler.
%K Computer Science, Databases, cs.DB

Conference paper

P. Miettinen

“Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Porto, Portugal, 2015.

BibTeX

@inproceedings{MiettinenECML2015,
TITLE = {Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: {C}omplexity Beyond Blocks},
AUTHOR = {Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-3-319-23524-0},
DOI = {10.1007/978-3-319-23525-7_3},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015)},
EDITOR = {Appice, Annalisa and Pereira Rodrigues, Pedro and Gama, Jo{\~a}o and Jorge, Al{\'i}pio and Soares, Carlos},
PAGES = {36--52},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9285},
ADDRESS = {Porto, Portugal},
}

Endnote

%0 Conference Proceedings
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-2278-1
%R 10.1007/978-3-319-23525-7_3
%D 2015
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2015-09-07 - 2015-09-11
%C Porto, Portugal
%B Machine Learning and Knowledge Discovery in Databases
%E Appice, Annalisa; Pereira Rodrigues, Pedro; Gama, João; Jorge, Alípio; Soares, Carlos
%P 36 - 52
%I Springer
%@ 978-3-319-23524-0
%B Lecture Notes in Artificial Intelligence
%N 9285

Conference paper

A. Mishra and K. Berberich

“EXPOSÉ: EXploring Past news fOr Seminal Events,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{MishraWWW2015,
TITLE = {{EXPOS{\'E}}: {EXploring Past news fOr Seminal Events}},
AUTHOR = {Mishra, Arunav and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2742844},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {223--226},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EXPOSÉ: EXploring Past news fOr Seminal Events : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-E33E-F
%R 10.1145/2740908.2742844
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 223 - 226
%I ACM
%@ 978-1-4503-3473-0

Conference paper

S. Mukherjee, H. Lamba, and G. Weikum

“Experience-aware Item Recommendation in Evolving Review Communities,” in 15th IEEE International Conference on Data Mining (ICDM 2015), Atlantic City, NJ, USA, 2015.

BibTeX

@inproceedings{mukherjee-experience-model,
TITLE = {Experience-aware Item Recommendation in Evolving Review Communities},
AUTHOR = {Mukherjee, Subhabrata and Lamba, Hemank and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4673-9503-8},
DOI = {10.1109/ICDM.2015.111},
PUBLISHER = {IEEE},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {15th IEEE International Conference on Data Mining (ICDM 2015)},
EDITOR = {Aggarwal, Charu and Zhou, Zhi-Hua and Tuzhilin, Alexander and Xiong, Hui and Wu, Xindong},
PAGES = {925--930},
ADDRESS = {Atlantic City, NJ, USA},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Lamba, Hemank
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Experience-aware Item Recommendation in Evolving Review Communities  : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49F3-F
%R 10.1109/ICDM.2015.111
%D 2015
%B 15th International Conference on Data Mining
%Z date of event: 2015-11-14 - 2015-11-17
%C Atlantic City, NJ, USA
%B 15th IEEE International Conference on Data Mining 
%E Aggarwal, Charu; Zhou, Zhi-Hua; Tuzhilin, Alexander; Xiong, Hui; Wu, Xindong
%P 925 - 930
%I IEEE
%@ 978-1-4673-9503-8

Conference paper

S. Mukherjee and G. Weikum

“Leveraging Joint Interactions for Credibility Analysis in News Communities,” in CIKM ’15, 24th ACM International Conference on Information and Knowledge Management, Melbourne, Australia, 2015.

BibTeX

@inproceedings{mukherjee-credibility-analysis,
TITLE = {Leveraging Joint Interactions for Credibility Analysis in News Communities},
AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3794-6},
DOI = {10.1145/2806416.2806537},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {CIKM '15, 24th ACM International Conference on Information and Knowledge Management},
PAGES = {353--362},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Leveraging Joint Interactions for Credibility Analysis in News Communities : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49DE-1
%R 10.1145/2806416.2806537
%D 2015
%B 24th ACM International Conference on Information and Knowledge Management
%Z date of event: 2015-10-19 - 2015-10-23
%C Melbourne, Australia
%B CIKM '15
%P 353 - 362
%I ACM
%@ 978-1-4503-3794-6

Thesis

S. Neumann

“On Some Problems of Rounding Rank,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{NeumannMaster2015,
TITLE = {On Some Problems of Rounding Rank},
AUTHOR = {Neumann, Stefan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Neumann, Stefan
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On Some Problems of Rounding Rank : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-57D6-2
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P X, 77 p.
%V master
%9 master

Conference paper

H.-V. Nguyen and J. Vreeken

“Non-parametric Jensen-Shannon Divergence,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Porto, Portugal, 2015.

BibTeX

@inproceedings{NguyenECML2015,
TITLE = {Non-parametric {Jensen}-{Shannon} Divergence},
AUTHOR = {Nguyen, Hoang-Vu and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-319-23524-0},
DOI = {10.1007/978-3-319-23525-7_11},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015)},
EDITOR = {Appice, Annalisa and Pereira Rodrigues, Pedro and Gama, Jo{\~a}o and Jorge, Al{\'i}pio and Soares, Carlos},
PAGES = {173--189},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {9285},
ADDRESS = {Porto, Portugal},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Hoang-Vu
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Non-parametric Jensen-Shannon Divergence
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-2286-3
%R 10.1007/978-3-319-23525-7_11
%D 2015
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2015-09-07 - 2015-09-11
%C Porto, Portugal
%B Machine Learning and Knowledge Discovery in Databases
%E Appice, Annalisa; Pereira Rodrigues, Pedro; Gama, João; Alípio, Jorge; Soares, Carlos
%P 173 - 189
%I Springer
%@ 978-3-319-23524-0
%B Lecture Notes in Artificial Intelligence
%N 9285

Conference paper

F. Petroni, L. Del Corro, and R. Gemulla

“CORE: Context-Aware Open Relation Extraction with Factorization Machines,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.

BibTeX

@inproceedings{conf/emnlp/PetroniCG15,
TITLE = {{CORE}: {C}ontext-Aware {O}pen {R}elation {E}xtraction with Factorization Machines},
AUTHOR = {Petroni, Fabio and Del Corro, Luciano and Gemulla, Rainer},
LANGUAGE = {eng},
ISBN = {978-1-941643-32-7},
URL = {http://aclweb.org/anthology/D15-1204},
PUBLISHER = {ACL},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)},
PAGES = {1763--1773},
ADDRESS = {Lisbon, Portugal},
}

Endnote

%0 Conference Proceedings
%A Petroni, Fabio
%A Del Corro, Luciano
%A Gemulla, Rainer
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T CORE: Context-Aware Open Relation Extraction with Factorization Machines
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-4112-5
%U http://aclweb.org/anthology/D15-1204
%D 2015
%B Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2015-09-17 - 2015-09-21
%C Lisbon, Portugal
%B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
%P 1763 - 1773
%I ACL
%@ 978-1-941643-32-7
%U http://aclweb.org/anthology/D/D15/D15-1204.pdf

Conference paper

R. Pienta, Z. Lin, M. Kahng, J. Vreeken, P. P. Talukdar, J. Abello, G. Parameswaran, and D. H. Chau

“AdaptiveNav: Adaptive Discovery of Interesting and Surprising Nodes in Large Graphs,” in IEEE VIS 2015, Chicago, IL, USA, 2015.

BibTeX

@inproceedings{pienta:15:adaptivenav,
TITLE = {{AdaptiveNav}: {A}daptive Discovery of Interesting and Surprising Nodes in Large Graphs},
AUTHOR = {Pienta, Robert and Lin, Zhiyuan and Kahng, Minsuk and Vreeken, Jilles and Talukdar, Partha P. and Abello, James and Parameswaran, Ganesh and Chau, Duen Horng},
LANGUAGE = {eng},
YEAR = {2015},
BOOKTITLE = {IEEE VIS 2015},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%A Pienta, Robert
%A Lin, Zhiyuan
%A Kahng, Minsuk
%A Vreeken, Jilles
%A Talukdar, Partha P.
%A Abello, James
%A Parameswaran, Ganesh
%A Chau, Duen Horng
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T AdaptiveNav: Adaptive Discovery of Interesting and Surprising Nodes in Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-57B4-E
%D 2015
%B IEEE VIS 2015
%Z date of event: 2015-10-25 - 2015-10-30
%C Chicago, IL, USA
%B IEEE VIS 2015

Conference paper

N. Prytkova, M. Spaniol, and G. Weikum

“Aligning Multi-cultural Knowledge Taxonomies by Combinatorial Optimization,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{PSWe15,
TITLE = {Aligning Multi-cultural Knowledge Taxonomies by Combinatorial Optimization},
AUTHOR = {Prytkova, Natalia and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2742721},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {93--94},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Prytkova, Natalia
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Aligning Multi-cultural Knowledge Taxonomies by Combinatorial Optimization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-06E5-3
%R 10.1145/2740908.2742721
%D 2015
%B 24th International Conference on World Wide Web
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 93 - 94
%I ACM
%@ 978-1-4503-3473-0

Conference paper

D2D5

A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele

“A Dataset for Movie Description,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 2015.

BibTeX

@inproceedings{Rohrbach15cvpr,
TITLE = {A Dataset for Movie Description},
AUTHOR = {Rohrbach, Anna and Rohrbach, Marcus and Tandon, Niket and Schiele, Bernt},
LANGUAGE = {eng},
DOI = {10.1109/CVPR.2015.7298940},
PUBLISHER = {IEEE},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)},
PAGES = {3202--3212},
ADDRESS = {Boston, MA, USA},
}

Endnote

%0 Conference Proceedings
%A Rohrbach, Anna
%A Rohrbach, Marcus
%A Tandon, Niket
%A Schiele, Bernt
%+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T A Dataset for Movie Description
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-01B9-B
%R 10.1109/CVPR.2015.7298940
%D 2015
%B IEEE Conference on Computer Vision and Pattern Recognition
%Z date of event: 2015-06-08 - 2015-06-10
%C Boston, MA, USA
%B IEEE Conference on Computer Vision and Pattern Recognition
%P 3202 - 3212
%I IEEE

Conference paper

C. Schulte, B. Taneva, and G. Weikum

“On-topic Cover Stories from News Archives,” in Advances in Information Retrieval (ECIR 2015), Vienna, Austria, 2015.

BibTeX

@inproceedings{Schulte:ECIR2015,
TITLE = {On-topic Cover Stories from News Archives},
AUTHOR = {Schulte, Christian and Taneva, Bilyana and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-16353-6},
DOI = {10.1007/978-3-319-16354-3_4},
PUBLISHER = {Springer},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Advances in Information Retrieval (ECIR 2015)},
EDITOR = {Hanbury, Allan and Kazai, Gabriella and Rauber, Andreas and Fuhr, Norbert},
PAGES = {37--42},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {9022},
ADDRESS = {Vienna, Austria},
}

Endnote

%0 Conference Proceedings
%A Schulte, Christian
%A Taneva, Bilyana
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On-topic Cover Stories from News Archives
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-A6DE-B
%R 10.1007/978-3-319-16354-3_4
%D 2015
%B 37th European Conference on Information Retrieval
%Z date of event: 2015-03-29 - 2015-04-02
%C Vienna, Austria
%B Advances in Information Retrieval
%E Hanbury, Allan; Kazai, Gabriella; Rauber, Andreas; Fuhr, Norbert
%P 37 - 42
%I Springer
%@ 978-3-319-16353-6
%B Lecture Notes in Computer Science
%N 9022

Thesis

D5IMPR-CSRG1

S. Seufert

“Algorithmic Building Blocks for Relationship Analysis over Large Graphs,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@phdthesis{Seufertphd15,
TITLE = {Algorithmic Building Blocks for Relationship Analysis over Large Graphs},
AUTHOR = {Seufert, Stephan},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-61833},
DOI = {10.22028/D291-25420},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Seufert, Stephan
%Y Bedathur, Srikanta
%A referee: Barbosa, Denilson
%A referee: Weidenbach, Christoph
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Automation of Logic, MPI for Informatics, Max Planck Society
%T Algorithmic Building Blocks for Relationship Analysis over Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-6E65-D
%R 10.22028/D291-25420
%U urn:nbn:de:bsz:291-scidok-61833
%F OTHER: hdl:20.500.11880/25476
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 198 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2015/6183/
%U http://scidok.sulb.uni-saarland.de/doku/urheberrecht.php?la=de

Conference paper

D. Seyler, M. Yahya, and K. Berberich

“Generating Quiz Questions from Knowledge Graphs,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{SeylerWWW2015,
TITLE = {Generating Quiz Questions from Knowledge Graphs},
AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2742722},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {113--114},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Seyler, Dominic
%A Yahya, Mohamed
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Generating Quiz Questions from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-E33C-4
%R 10.1145/2740908.2742722
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 113 - 114
%I ACM
%@ 978-1-4503-3473-0

Thesis

D. Seyler

“Question Generation from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{SeylerMaster2015,
TITLE = {Question Generation from Knowledge Graphs},
AUTHOR = {Seyler, Dominic},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Seyler, Dominic
%Y Berberich, Klaus
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Question Generation from Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-08B0-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P XII, 104 p.
%V master
%9 master

Conference paper

E. Shutova, N. Tandon, and G. de Melo

“Perceptually Grounded Selectional Preferences,” in The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL 2015), Beijing, China, 2015.

BibTeX

@inproceedings{ShutovaTandonDemelo:ACL2015,
TITLE = {Perceptually Grounded Selectional Preferences},
AUTHOR = {Shutova, Ekaterina and Tandon, Niket and de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-1-941643-72-3},
URL = {http://www.aclweb.org/anthology/P15-1092},
PUBLISHER = {ACL},
YEAR = {2015},
BOOKTITLE = {The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL 2015)},
PAGES = {950--960},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Shutova, Ekaterina
%A Tandon, Niket
%A de Melo, Gerard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Perceptually Grounded Selectional Preferences
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-54B8-9
%U http://www.aclweb.org/anthology/P15-1092
%D 2015
%Z Review method: peer-reviewed
%B 53rd Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2015-07-26 - 2015-07-31
%C Beijing, China
%B The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
%P 950 - 960
%I ACL
%@ 978-1-941643-72-3
%U http://www.aclweb.org/anthology/P/P15/P15-1092.pdf

Thesis

A. Sierra

“Ad-hoc Information Retrieval using Annotated Queries and Documents,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{SierraMaster2015,
TITLE = {Ad-hoc Information Retrieval using Annotated Queries and Documents},
AUTHOR = {Sierra, Alejandro},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Sierra, Alejandro
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Ad-hoc Information Retrieval using Annotated Queries and Documents
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-A968-D
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 68 p.
%V master
%9 master

Conference paper

J. Singh, A. Anand, V. Setty, and A. Anand

“Exploring Long Running News Stories using Wikipedia,” in Proceedings of the 2015 ACM Web Science Conference, Oxford, UK, 2015.

BibTeX

@inproceedings{DBLP:conf/websci/SinghASA15,
TITLE = {Exploring Long Running News Stories using {Wikipedia}},
AUTHOR = {Singh, Jaspreet and Anand, Abhijit and Setty, Vinay and Anand, Avishek},
LANGUAGE = {eng},
ISBN = {978-1-4503-3672-7},
DOI = {10.1145/2786451.2786489},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the 2015 ACM Web Science Conference},
EID = {57},
ADDRESS = {Oxford, UK},
}

Endnote

%0 Conference Proceedings
%A Singh, Jaspreet
%A Anand, Abhijit
%A Setty, Vinay
%A Anand, Avishek
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Exploring Long Running News Stories using Wikipedia
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CB3-4
%R 10.1145/2786451.2786489
%D 2015
%B ACM Web Science Conference
%Z date of event: 2015-06-28 - 2015-07-01
%C Oxford, UK
%B Proceedings of the 2015 ACM Web Science Conference
%Z sequence number: 57
%I ACM
%@ 978-1-4503-3672-7

Conference paper

A. Siu and G. Weikum

“Semantic Type Classification of Common Words in Biomedical Noun Phrases,” in Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 2015.

BibTeX

@inproceedings{Siu15,
TITLE = {Semantic Type Classification of Common Words in Biomedical Noun Phrases},
AUTHOR = {Siu, Amy and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-5108-0943-7},
PUBLISHER = {ACL},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Workshop on Biomedical Natural Language Processing (BioNLP 2015)},
PAGES = {98--103},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Siu, Amy
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Semantic Type Classification of Common Words in Biomedical Noun Phrases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-2042-0
%D 2015
%B Workshop on Biomedical Natural Language Processing 
%Z date of event: 2015-07-30 - 2015-07-30
%C Beijing, China
%B Workshop on Biomedical Natural Language Processing 
%P 98 - 103
%I ACL
%@ 978-1-5108-0943-7

Thesis

M. Srinivasamurthy

“Mining European Statistics for Social Events,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{SrinivasamurthyMSc2015,
TITLE = {Mining European Statistics for Social Events},
AUTHOR = {Srinivasamurthy, Mena},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Srinivasamurthy, Mena
%Y Weikum, Gerhard
%A referee: Spaniol, Marc
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining European Statistics for Social Events
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-43AB-1
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 52 p.
%V master
%9 master

Conference paper

S. Sundareisan, J. Vreeken, and B. A. Prakash

“Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2015), Vancouver, Canada, 2015.

BibTeX

@inproceedings{sundareisan:15:netfill,
TITLE = {Hidden Hazards: {Finding} Missing Nodes in Large Graph Epidemics},
AUTHOR = {Sundareisan, Shashi and Vreeken, Jilles and Prakash, B. Aditya},
LANGUAGE = {eng},
ISBN = {978-1-61197-401-0},
DOI = {10.1137/1.9781611974010.47},
PUBLISHER = {SIAM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2015)},
EDITOR = {Venkatasubramanian, Suresh and Ye, Jieping},
PAGES = {415--423},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Sundareisan, Shashi
%A Vreeken, Jilles
%A Prakash, B. Aditya
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-A82A-2
%R 10.1137/1.9781611974010.47
%D 2015
%B 15th SIAM International Conference on Data Mining
%Z date of event: 2015-04-30 - 2015-05-02
%C Vancouver, Canada
%B Proceedings of the SIAM International Conference on Data Mining
%E Venkatasubramanian, Suresh; Ye, Jieping
%P 415 - 423
%I SIAM
%@ 978-1-61197-401-0

Conference paper

N. Tandon, G. de Melo, A. De, and G. Weikum

“Knowlywood: Mining Activity Knowledge From Hollywood Narratives,” in CIKM’15, 24th ACM International Conference on Information and Knowledge Management, Melbourne, Australia, 2015, pp. 223–232.

BibTeX

@inproceedings{Tandon:2015:KMA:2806416.2806583,
TITLE = {Knowlywood: {M}ining Activity Knowledge From {H}ollywood Narratives},
AUTHOR = {Tandon, Niket and de Melo, Gerard and De, Abir and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3794-6},
DOI = {10.1145/2806416.2806583},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {CIKM'15, 24th ACM International Conference on Information and Knowledge Management},
PAGES = {223--232},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A de Melo, Gerard
%A De, Abir
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowlywood: Mining Activity Knowledge From Hollywood Narratives
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-49E0-A
%R 10.1145/2806416.2806583
%D 2015
%B 24th ACM International Conference on Information and Knowledge Management
%Z date of event: 2015-10-19 - 2015-10-23
%C Melbourne, Australia
%B CIKM'15
%P 223 - 232
%I ACM
%@ 978-1-4503-3794-6

Conference paper

N. Tandon, G. de Melo, A. De, and G. Weikum

“Lights, Camera, Action: Knowledge Extraction from Movie Scripts,” in WWW’15 Companion, Florence, Italy, 2015.

BibTeX

@inproceedings{tandon2015moviescripts,
TITLE = {Lights, Camera, Action: Knowledge Extraction from Movie Scripts},
AUTHOR = {Tandon, Niket and de Melo, Gerard and De, Abir and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-3473-0},
DOI = {10.1145/2740908.2742756},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {WWW'15 Companion},
PAGES = {127--128},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A de Melo, Gerard
%A De, Abir
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Lights, Camera, Action: Knowledge Extraction from Movie Scripts
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-E32D-6
%R 10.1145/2740908.2742756
%D 2015
%B 24th International Conference on World Wide Web 
%Z date of event: 2015-05-18 - 2015-05-22
%C Florence, Italy
%B WWW'15 Companion
%P 127 - 128
%I ACM
%@ 978-1-4503-3473-0

Conference paper

C. Teflioudi, R. Gemulla, and O. Mykytiuk

“LEMP: Fast Retrieval of Large Entries in a Matrix Product,” in SIGMOD’15, ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 2015.

BibTeX

@inproceedings{Teflioudi15,
TITLE = {{LEMP}: {F}ast Retrieval of Large Entries in a Matrix Product},
AUTHOR = {Teflioudi, Christina and Gemulla, Rainer and Mykytiuk, Olga},
LANGUAGE = {eng},
ISBN = {978-1-4503-2758-9},
DOI = {10.1145/2723372.2747647},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {SIGMOD'15, ACM SIGMOD International Conference on Management of Data},
PAGES = {107--122},
ADDRESS = {Melbourne, Australia},
}

Endnote

%0 Conference Proceedings
%A Teflioudi, Christina
%A Gemulla, Rainer
%A Mykytiuk, Olga
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T LEMP: Fast Retrieval of Large Entries in a Matrix Product
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-4A1C-F
%R 10.1145/2723372.2747647
%D 2015
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2015-05-31 - 2015-06-04
%C Melbourne, Australia
%B SIGMOD'15
%P 107 - 122
%I ACM
%@ 978-1-4503-2758-9

Conference paper

C. Tryfonopoulos, P. Raftopoulou, V. Setty, and A. Xiros

“Towards Content-Based Publish/Subscribe for Distributed Social Networks,” in DEBS’15, 9th ACM International Conference on Distributed Event-Based Systems, Oslo, Norway, 2015.

BibTeX

@inproceedings{DBLP:conf/debs/TryfonopoulosRS15,
TITLE = {Towards Content-Based Publish/Subscribe for Distributed Social Networks},
AUTHOR = {Tryfonopoulos, Christos and Raftopoulou, Paraskevi and Setty, Vinay and Xiros, Argiris},
LANGUAGE = {eng},
ISBN = {978-1-4503-3286-6},
DOI = {10.1145/2675743.2776770},
PUBLISHER = {ACM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {DEBS'15, 9th ACM International Conference on Distributed Event-Based Systems},
PAGES = {340--343},
ADDRESS = {Oslo, Norway},
}

Endnote

%0 Conference Proceedings
%A Tryfonopoulos, Christos
%A Raftopoulou, Paraskevi
%A Setty, Vinay
%A Xiros, Argiris
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Towards Content-Based Publish/Subscribe for Distributed Social Networks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-002C-1CB1-8
%R 10.1145/2675743.2776770
%D 2015
%B 9th ACM International Conference on Distributed Event-Based Systems  
%Z date of event: 2015-06-29 - 2015-07-03
%C Oslo, Norway
%B DEBS'15
%P 340 - 343
%I ACM
%@ 978-1-4503-3286-6

Thesis

D5IMPR-CS

T. Tylenda

“Methods and Tools for Summarization of Entities and Facts in Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@phdthesis{TylendaPhd15,
TITLE = {Methods and Tools for Summarization of Entities and Facts in Knowledge Bases},
AUTHOR = {Tylenda, Tomasz},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-62630},
DOI = {10.22028/D291-26620},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Tylenda, Tomasz
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Methods and Tools for Summarization of Entities and Facts in Knowledge Bases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0028-FC65-5
%R 10.22028/D291-26620
%U urn:nbn:de:bsz:291-scidok-62630
%F OTHER: hdl:20.500.11880/26676
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P 113 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2015/6263/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

J. Vreeken

“Causal Inference by Direction of Information,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2015), Vancouver, Canada, 2015.

BibTeX

@inproceedings{vreeken:15:ergo,
TITLE = {Causal Inference by Direction of Information},
AUTHOR = {Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-1-61197-401-0},
DOI = {10.1137/1.9781611974010.102},
PUBLISHER = {SIAM},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2015)},
EDITOR = {Venkatasubramanian, Suresh and Ye, Jieping},
PAGES = {909--917},
ADDRESS = {Vancouver, Canada},
}

Endnote

%0 Conference Proceedings
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Causal Inference by Direction of Information
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-A82C-D
%R 10.1137/1.9781611974010.102
%D 2015
%B 15th SIAM International Conference on Data Mining
%Z date of event: 2015-04-30 - 2015-05-02
%C Vancouver, Canada
%B Proceedings of the SIAM International Conference on Data Mining
%E Venkatasubramanian, Suresh; Ye, Jieping
%P 909 - 917
%I SIAM
%@ 978-1-61197-401-0

Thesis

H. Wang

“Retrospective Summarization: What Did I Miss?,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{WangMaster2015,
TITLE = {Retrospective Summarization: What Did I Miss?},
AUTHOR = {Wang, He},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Wang, He
%Y Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Retrospective Summarization: What Did I Miss?
%U http://hdl.handle.net/11858/00-001M-0000-0026-A0B4-B
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P XVI, 73 p.
%V master
%9 master

Thesis

D5IMPR-CS

M. A. Yosef

“U-AIDA: A Customizable System for Named Entity Recognition, Classification, and Disambiguation,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@phdthesis{Yosefphd15,
TITLE = {U-{AIDA}: A Customizable System for Named Entity Recognition, Classification, and Disambiguation},
AUTHOR = {Yosef, Mohamed Amir},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-63703},
DOI = {10.22028/D291-25426},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Yosef, Mohamed Amir
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T U-AIDA: A Customizable System for Named Entity Recognition, Classification, and Disambiguation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-B9B9-C
%R 10.22028/D291-25426
%U urn:nbn:de:bsz:291-scidok-63703
%F OTHER: hdl:20.500.11880/25482
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P XV, 101 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2016/6370/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Article

A. Zimek and J. Vreeken

“The Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives,” Machine Learning, vol. 98, no. 1, 2015.

BibTeX

@article{zimek:15:blind,
TITLE = {The Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives},
AUTHOR = {Zimek, Arthur and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {0885-6125},
DOI = {10.1007/s10994-013-5334-y},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2015},
DATE = {2015},
JOURNAL = {Machine Learning},
VOLUME = {98},
NUMBER = {1},
PAGES = {121--155},
}

Endnote

%0 Journal Article
%A Zimek, Arthur
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T The Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-57AE-D
%R 10.1007/s10994-013-5334-y
%7 2013-03-07
%D 2015
%J Machine Learning
%V 98
%N 1
%& 121
%P 121 - 155
%I Springer
%C New York, NY
%@ 0885-6125

Thesis

D5IMPR-CS

T. Zinchenko

“Redescription Mining Over non-Binary Data Sets Using Decision Trees,” Universität des Saarlandes, Saarbrücken, 2015.

BibTeX

@mastersthesis{ZinchenkoMaster2014,
TITLE = {Redescription Mining Over non-Binary Data Sets Using Decision Trees},
AUTHOR = {Zinchenko, Tetiana},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2015},
DATE = {2015},
}

Endnote

%0 Thesis
%A Zinchenko, Tetiana
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Redescription Mining Over non-Binary Data Sets Using Decision Trees
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-B73A-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2015
%P X, 118 p.
%V master
%9 master

Conference paper

T. Zinchenko, E. Galbrun, and P. Miettinen

“Mining Predictive Redescriptions with Trees,” in 15th IEEE International Conference on Data Mining Workshop (ICDMW 2015), Atlantic City, NJ, USA, 2015.

BibTeX

@inproceedings{zinchenko15mining,
TITLE = {Mining Predictive Redescriptions with Trees},
AUTHOR = {Zinchenko, Tetiana and Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4673-8492-6},
DOI = {10.1109/ICDMW.2015.123},
PUBLISHER = {IEEE},
YEAR = {2015},
DATE = {2015},
BOOKTITLE = {15th IEEE International Conference on Data Mining Workshop (ICDMW 2015)},
EDITOR = {Cui, Peng and Dy, Jennifer and Aggarwal, Charu and Zhou, Zhi-Hua and Tuzhilin, Alexander and Xiong, Hui and Wu, Xindong},
PAGES = {1672--1675},
ADDRESS = {Atlantic City, NJ, USA},
}

Endnote

%0 Conference Proceedings
%A Zinchenko, Tetiana
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining Predictive Redescriptions with Trees
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0029-5424-A
%R 10.1109/ICDMW.2015.123
%D 2015
%B 15th International Conference on Data Mining
%Z date of event: 2015-11-14 - 2015-11-17
%C Atlantic City, NJ, USA
%B 15th IEEE International Conference on Data Mining Workshop 
%E Cui, Peng; Dy, Jennifer; Aggarwal, Charu; Zhou, Zhi-Hua; Tuzhilin, Alexander; Xiong, Hui; Wu, Xindong
%P 1672 - 1675
%I IEEE
%@ 978-1-4673-8492-6

2014

Conference paper

F. Alvanaki and S. Michel

“Tracking Set Correlations at Large Scale,” in SIGMOD’14, ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 2014.

BibTeX

@inproceedings{Alvanaki2014,
TITLE = {Tracking Set Correlations at Large Scale},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian},
LANGUAGE = {eng},
ISBN = {978-1-4503-2376-5},
DOI = {10.1145/2588555.2610510},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {SIGMOD'14, ACM SIGMOD International Conference on Management of Data},
EDITOR = {Dyreson, Curtis and Li, Feifei and {\"O}zsu, M. Tamer},
PAGES = {1507--1518},
ADDRESS = {Snowbird, UT, USA},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Tracking Set Correlations at Large Scale
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-8423-2
%R 10.1145/2588555.2610510
%D 2014
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2014-06-22 - 2014-06-27
%C Snowbird, UT, USA
%B SIGMOD'14
%E Dyreson, Curtis; Li, Feifei; Özsu, M. Tamer
%P 1507 - 1518
%I ACM
%@ 978-1-4503-2376-5

Thesis

D5IMPR-CS

F. Alvanaki

“Mining Interesting Events on Large and Dynamic Data,” Universität des Saarlandes, Saarbrücken, 2014.

BibTeX

@phdthesis{Alvanakithesis,
TITLE = {Mining Interesting Events on Large and Dynamic Data},
AUTHOR = {Alvanaki, Foteini},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-59857},
DOI = {10.22028/D291-26593},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
DATE = {2014},
}

Endnote

%0 Thesis
%A Alvanaki, Foteini
%Y Michel, Sebastian
%A referee: Weikum, Gerhard
%A referee: Delis, Alexis
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Mining Interesting Events on Large and Dynamic Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-6C4E-B
%R 10.22028/D291-26593
%U urn:nbn:de:bsz:291-scidok-59857
%F OTHER: hdl:20.500.11880/26649
%I Universität des Saarlandes
%C Saarbrücken
%D 2014
%P 128 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2015/5985/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

A. Anand, I. Mele, S. Bedathur, and K. Berberich

“Phrase Query Optimization on Inverted Indexes,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{Anand:CIKM2014,
TITLE = {Phrase Query Optimization on Inverted Indexes},
AUTHOR = {Anand, Avishek and Mele, Ida and Bedathur, Srikanta and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-2598-1},
DOI = {10.1145/2661829.2661928},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Wang, X. Sean and Garofalakis, Minos and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {1807--1810},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Anand, Avishek
%A Mele, Ida
%A Bedathur, Srikanta
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Phrase Query Optimization on Inverted Indexes
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-549A-0
%R 10.1145/2661829.2661928
%D 2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%K multi-word indexing, phrase queries, query optimization
%B CIKM'14
%E Li, Jianzhong; Wang, X. Sean; Garofalakis, Minos; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 1807 - 1810
%I ACM
%@ 978-1-4503-2598-1

Report

A. Anand, I. Mele, S. Bedathur, and K. Berberich

“Phrase Query Optimization on Inverted Indexes,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2014-5-002, 2014.

Abstract

Phrase queries are a key functionality of modern search engines. Beyond that, they increasingly serve as an important building block for applications such as entity-oriented search, text analytics, and plagiarism detection. Processing phrase queries is costly, though, since positional information has to be kept in the index and all words, including stopwords, need to be considered.

We consider an augmented inverted index that indexes selected variable-length multi-word sequences in addition to single words. We study how arbitrary phrase queries can be processed efficiently on such an augmented inverted index. We show that the underlying optimization problem is NP-hard in the general case and describe an exact exponential algorithm and an approximation algorithm to its solution. Experiments on ClueWeb09 and The New York Times with different real-world query workloads examine the practical performance of our methods.

BibTeX

@techreport{AnandMeleBedathurBerberich2014,
TITLE = {Phrase Query Optimization on Inverted Indexes},
AUTHOR = {Anand, Avishek and Mele, Ida and Bedathur, Srikanta and Berberich, Klaus},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2014-5-002},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
ABSTRACT = {Phrase queries are a key functionality of modern search engines. Beyond that, they increasingly serve as an important building block for applications such as entity-oriented search, text analytics, and plagiarism detection. Processing phrase queries is costly, though, since positional information has to be kept in the index and all words, including stopwords, need to be considered. We consider an augmented inverted index that indexes selected variable-length multi-word sequences in addition to single words. We study how arbitrary phrase queries can be processed efficiently on such an augmented inverted index. We show that the underlying optimization problem is NP-hard in the general case and describe an exact exponential algorithm and an approximation algorithm to its solution. Experiments on ClueWeb09 and The New York Times with different real-world query workloads examine the practical performance of our methods.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Anand, Avishek
%A Mele, Ida
%A Bedathur, Srikanta
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Phrase Query Optimization on Inverted Indexes
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-022A-3
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2014
%P 20 p.
%X Phrase queries are a key functionality of modern search engines. Beyond that, they increasingly serve as an important building block for applications such as entity-oriented search, text analytics, and plagiarism detection. Processing phrase queries is costly, though, since positional information has to be kept in the index and all words, including stopwords, need to be considered.
We consider an augmented inverted index that indexes selected variable-length multi-word sequences in addition to single words. We study how arbitrary phrase queries can be processed efficiently on such an augmented inverted index. We show that the underlying optimization problem is NP-hard in the general case and describe an exact exponential algorithm and an approximation algorithm to its solution. Experiments on ClueWeb09 and The New York Times with different real-world query workloads examine the practical performance of our methods.
%B Research Report
%@ 0946-011X

Article

N. An, L. Jiang, J. Wang, P. Luo, M. Wang, and B. N. Li

“Toward Detection of Aliases without String Similarity,” Information Sciences, vol. 261, 2014.

BibTeX

@article{AnJiangWang2014,
TITLE = {Toward Detection of Aliases without String Similarity},
AUTHOR = {An, Ning and Jiang, Lili and Wang, Jianyong and Luo, Ping and Wang, Min and Li, Bing Nan},
LANGUAGE = {eng},
ISSN = {0020-0255},
DOI = {10.1016/j.ins.2013.11.010},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Information Sciences},
VOLUME = {261},
PAGES = {89--100},
}

Endnote

%0 Journal Article
%A An, Ning
%A Jiang, Lili
%A Wang, Jianyong
%A Luo, Ping
%A Wang, Min
%A Li, Bing Nan
%+ external
Databases and Information Systems, MPI for Informatics, Max Planck Society
external
external
external
external
%T Toward Detection of Aliases without String Similarity
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3DFB-8
%F ISI: 000331689700005
%R 10.1016/j.ins.2013.11.010
%7 2013-11-18
%D 2014
%J Information Sciences
%O Inf. Sci.
%V 261
%& 89
%P 89 - 100
%I Elsevier
%C Amsterdam
%@ 0020-0255

Conference paper

D4D5

K. Athukorala, A. Oulasvirta, D. Glowacka, J. Vreeken, and G. Jacucci

“Supporting Exploratory Search Through User Modelling,” in UMAP 2014 Extended Proceedings (PIA 2014 in conjunction with UMAP 2014), Aalborg, Denmark, 2014.

BibTeX

@inproceedings{atukorala:14:supporting,
TITLE = {Supporting Exploratory Search Through User Modelling},
AUTHOR = {Athukorala, Kumaripaba and Oulasvirta, Antti and Glowacka, Dorota and Vreeken, Jilles and Jacucci, Giulio},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-1181/pia2014_paper_04.pdf; urn:nbn:de:0074-1181-4; http://ceur-ws.org/Vol-1181/pia2014_proceedings.pdf},
PUBLISHER = {CEUR-WS.org},
YEAR = {2014},
BOOKTITLE = {UMAP 2014 Extended Proceedings (PIA 2014 in conjunction with UMAP 2014)},
EDITOR = {Cantador, Iv{\'a}n and Chi, Min and Farzan, Rosta and J{\"a}schke, Robert},
PAGES = {1--47},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1181},
ADDRESS = {Aalborg, Denmark},
}

Endnote

%0 Conference Proceedings
%A Athukorala, Kumaripaba
%A Oulasvirta, Antti
%A Glowacka, Dorota
%A Vreeken, Jilles
%A Jacucci, Giulio
%+ External Organizations
Computer Graphics, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Supporting Exploratory Search Through User Modelling
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-538C-7
%U http://ceur-ws.org/Vol-1181/pia2014_paper_04.pdf
%D 2014
%B Joint Workshop on Personalised Information Access
%Z date of event: 2014-07-07 - 2014-07-07
%C Aalborg, Denmark
%B UMAP 2014 Extended Proceedings
%E Cantador, Iván; Chi, Min; Farzan, Rosta; Jäschke, Robert
%P 1 - 47
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1181
%@ 1613-0073

Conference paper

D4D5

K. Athukorala, A. Oulasvirta, D. Glowacka, J. Vreeken, and G. Jacucci

“Interaction Model to Predict Subjective-specificity of Search Results,” in UMAP 2014 Extended Proceedings, Aalborg, Denmark, 2014.

BibTeX

@inproceedings{atukorala:14:interaction,
TITLE = {Interaction Model to Predict Subjective-specificity of Search Results},
AUTHOR = {Athukorala, Kumaripaba and Oulasvirta, Antti and Glowacka, Dorota and Vreeken, Jilles and Jacucci, Giulio},
LANGUAGE = {eng},
URL = {http://ceur-ws.org/Vol-1181/umap2014_lateresults_01.pdf; urn:nbn:de:0074-1181-4},
PUBLISHER = {CEUR-WS.org},
YEAR = {2014},
BOOKTITLE = {UMAP 2014 Extended Proceedings},
EDITOR = {Cantador, Iv{\'a}n and Chi, Min and Farzan, Rosta and J{\"a}schke, Robert},
PAGES = {69--74},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1181},
ADDRESS = {Aalborg, Denmark},
}

Endnote

%0 Conference Proceedings
%A Athukorala, Kumaripaba
%A Oulasvirta, Antti
%A Glowacka, Dorota
%A Vreeken, Jilles
%A Jacucci, Giulio
%+ External Organizations
Computer Graphics, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Interaction Model to Predict Subjective-specificity of Search Results
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5397-D
%U http://ceur-ws.org/Vol-1181/umap2014_lateresults_01.pdf
%D 2014
%B 22nd Conference on User Modeling, Adaptation, and Personalization
%Z date of event: 2014-07-07 - 2014-07-11
%C Aalborg, Denmark
%B UMAP 2014 Extended Proceedings
%E Cantador, Iván; Chi, Min; Farzan, Rosta; Jäschke, Robert
%P 69 - 74
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1181

Conference paper

D4D5

K. Athukorala, A. Oulasvirta, D. Glowacka, J. Vreeken, and G. Jacucci

“Narrow or Broad? Estimating Subjective Specificity in Exploratory Search,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{atukorala:14:foraging,
TITLE = {Narrow or Broad? {Estimating} Subjective Specificity in Exploratory Search},
AUTHOR = {Athukorala, Kumaripaba and Oulasvirta, Antti and Glowacka, Dorota and Vreeken, Jilles and Jacucci, Giulio},
LANGUAGE = {eng},
ISBN = {978-1-4503-2598-1},
DOI = {10.1145/2661829.2661904},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Wang, X. Sean and Garofalakis, Minos and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {819--828},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Athukorala, Kumaripaba
%A Oulasvirta, Antti
%A Glowacka, Dorota
%A Vreeken, Jilles
%A Jacucci, Giulio
%+ External Organizations
Computer Graphics, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Narrow or Broad? Estimating Subjective Specificity in Exploratory Search
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53A1-6
%R 10.1145/2661829.2661904
%D 2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%B CIKM'14
%E Li, Jianzhong; Wang, X. Sean; Garofalakis, Minos; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 819 - 828
%I ACM
%@ 978-1-4503-2598-1

Book chapter / section

K. Berberich

“Web Archives,” in Encyclopedia of Social Network Analysis and Mining, Berlin: Springer, 2014.

BibTeX

@incollection{DBLP:reference/snam/Berberich14,
TITLE = {Web Archives},
AUTHOR = {Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4614-6169-2},
DOI = {10.1007/978-1-4614-6170-8_128},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Encyclopedia of Social Network Analysis and Mining},
PAGES = {2337--2343},
}

Endnote

%0 Book Section
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Web Archives
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53C1-B
%R 10.1007/978-1-4614-6170-8_128
%D 2014
%B Encyclopedia of Social Network Analysis and Mining
%P 2337 - 2343
%I Springer
%C Berlin
%@ 978-1-4614-6169-2

Conference paper

J. Biega, I. Mele, and G. Weikum

“Probabilistic Prediction of Privacy Risks in User Search Histories,” in PSBD’14, First International Workshop on Privacy and Security of Big Data, Shanghai, China, 2014.

BibTeX

@inproceedings{Biega:PSBD2014,
TITLE = {Probabilistic Prediction of Privacy Risks in User Search Histories},
AUTHOR = {Biega, Joanna and Mele, Ida and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1583-8},
DOI = {10.1145/2663715.2669609},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {PSBD'14, First International Workshop on Privacy and Security of Big Data},
PAGES = {29--36},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Biega, Joanna
%A Mele, Ida
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Probabilistic Prediction of Privacy Risks in User Search Histories
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5486-B
%R 10.1145/2663715.2669609
%D 2014
%B First International Workshop on Privacy and Security of Big Data
%Z date of event: 2014-11-07 - 2014-11-07
%C Shanghai, China
%K privacy risk prediction, probabilistic privacy, query logs, user-centric privacy
%B PSBD'14
%P 29 - 36
%I ACM
%@ 978-1-4503-1583-8

Conference paper

R. Burghartz and K. Berberich

“MPI-INF at the NTCIR-11 Temporal Query Classification Task,” in Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan, 2014.

BibTeX

@inproceedings{burghartz2014,
TITLE = {{MPI}-{INF} at the {NTCIR}-11 Temporal Query Classification Task},
AUTHOR = {Burghartz, Robin and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-4-86049-065-2},
PUBLISHER = {National Institute of Informatics},
YEAR = {2014},
BOOKTITLE = {Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies},
EDITOR = {Kando, Noriko and Joho, Hideo and Kishida, Kazuaki},
PAGES = {443--450},
ADDRESS = {Tokyo, Japan},
}

Endnote

%0 Conference Proceedings
%A Burghartz, Robin
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T MPI-INF at the NTCIR-11 Temporal Query Classification Task
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5418-1
%D 2014
%8 09.12.2014
%B 11th NTCIR Conference on Evaluation of Information Access Technologies
%Z date of event: 2014-12-09 - 2014-12-12
%C Tokyo, Japan
%B Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies
%E Kando, Noriko; Joho, Hideo; Kishida, Kazuaki
%P 443 - 450
%I National Institute of Informatics
%@ 978-4-86049-065-2
%U http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/NTCIR/Temporalia/03-NTCIR11-TEMPORALIA-BurghartzR.pdf

Proceedings

P. Chau, J. Vreeken, M. van Leeuwen, and C. Faloutsos

Eds., Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics. Georgia Institute of Technology, 2014.

BibTeX

@proceedings{Chau2014,
TITLE = {Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2014)},
EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Faloutsos, Christos},
LANGUAGE = {eng},
PUBLISHER = {Georgia Institute of Technology},
YEAR = {2014},
PAGES = {130 p.},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%E Chau, Polo
%E Vreeken, Jilles
%E van Leeuwen, Matthijs
%E Faloutsos, Christos
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5564-F
%I Georgia Institute of Technology
%D 2014
%B ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2014-08-24 - 2014-08-24
%C New York, NY, USA
%P 130 p.
%U http://poloclub.gatech.edu/idea2014/papers/idea14-proceedings.pdf

Conference paper

L. Del Corro, R. Gemulla, and G. Weikum

“Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning,” in The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar, 2014.

BibTeX

@inproceedings{DelCorro2014,
TITLE = {Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning},
AUTHOR = {Del Corro, Luciano and Gemulla, Rainer and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-96-1},
URL = {http://aclweb.org/anthology/D14-1042},
PUBLISHER = {ACL},
YEAR = {2014},
BOOKTITLE = {The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)},
PAGES = {374--385},
ADDRESS = {Doha, Qatar},
}

Endnote

%0 Conference Proceedings
%A Del Corro, Luciano
%A Gemulla, Rainer
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-51DF-E
%U http://aclweb.org/anthology/D14-1042
%D 2014
%B 2014 Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2014-10-25 - 2014-10-29
%C Doha, Qatar
%B The 2014 Conference on Empirical Methods in Natural Language Processing
%P 374 - 385
%I ACL
%@ 978-1-937284-96-1

Article

G. de Melo and G. Weikum

“Taxonomic Data Integration from Multilingual Wikipedia Editions,” Knowledge and Information Systems, vol. 39, no. 1, 2014.

BibTeX

@article{deMeloWeikum2013KAIS,
TITLE = {Taxonomic Data Integration from Multilingual {Wikipedia} Editions},
AUTHOR = {de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-012-0597-3},
LOCALID = {Local-ID: E21183D8146A7A86C1257B1100306F46-deMeloWeikum2013KAIS},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {39},
NUMBER = {1},
PAGES = {1--39},
}

Endnote

%0 Journal Article
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Taxonomic Data Integration from Multilingual Wikipedia Editions
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A38-F
%F OTHER: Local-ID: E21183D8146A7A86C1257B1100306F46-deMeloWeikum2013KAIS
%R 10.1007/s10115-012-0597-3
%7 2013-01-08
%D 2014
%J Knowledge and Information Systems
%V 39
%N 1
%& 1
%P 1 - 39
%I Springer
%C Berlin
%@ 0219-1377

Conference paper

M. Dylla, M. Theobald, and I. Miliaraki

“Querying and Learning in Probabilistic Databases,” in Reasoning Web (RW 2014), Athens, Greece, 2014.

Abstract

Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (aka. "data provenance") to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back to more than 25 years of research, only fairly recently the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review on the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.

BibTeX

@inproceedings{DyllaRW2014,
TITLE = {Querying and Learning in Probabilistic Databases},
AUTHOR = {Dylla, Maximilian and Theobald, Martin and Miliaraki, Iris},
LANGUAGE = {eng},
ISBN = {978-3-319-10587-1; 978-3-319-10586-4},
DOI = {10.1007/978-3-319-10587-1_8},
PUBLISHER = {Springer},
YEAR = {2014},
DATE = {2014},
ABSTRACT = {Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (aka. "data provenance") to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back to more than 25 years of research, only fairly recently the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review on the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.},
BOOKTITLE = {Reasoning Web (RW 2014)},
EDITOR = {Koubarakis, Manolis and Stamou, Giorgos and Stoilos, Giorgos and Horrocks, Ian and Kolaitis, Phokion and Lausen, Georg and Weikum, Gerhard},
PAGES = {313--368},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8714},
ADDRESS = {Athens, Greece},
}

Endnote

%0 Conference Proceedings
%A Dylla, Maximilian
%A Theobald, Martin
%A Miliaraki, Iris
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Querying and Learning in Probabilistic Databases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-E51D-9
%F OTHER: WOS:000348929200008
%R 10.1007/978-3-319-10587-1_8
%D 2014
%B 10th Reasoning Web Summer School 
%Z date of event: 2014-09-08 - 2014-09-13
%C Athens, Greece
%X Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (aka. "data provenance") to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back to more than 25 years of research, only fairly recently the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review on the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.
%K Probabilistic and Temporal Databases
Deduction Rules
Consistency
Constraints
Information Extraction
LINEAGE
SYSTEMS
WEB
Computer Science, Information Systems
Computer Science, Theory & Methods
%B Reasoning Web
%E Koubarakis, Manolis; Stamou, Giorgos; Stoilos, Giorgos; Horrocks, Ian; Kolaitis, Phokion; Lausen, Georg; Weikum, Gerhard
%P 313 - 368
%I Springer
%@ 978-3-319-10587-1 978-3-319-10586-4
%B Lecture Notes in Computer Science
%N 8714

Report

M. Dylla and M. Theobald

“Learning Tuple Probabilities in Probabilistic Databases,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2014-5-001, 2014.

Abstract

Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but---so far---this is still an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal probabilities of the query answers again yield in the assigned probability labels. We analyze the learning problem from a theoretical perspective, devise two optimization-based objectives, and provide an efficient algorithm (based on Stochastic Gradient Descent) for solving these objectives. Finally, we conclude this work by an experimental evaluation on three real-world and one synthetic dataset, while competing with various techniques from SRL, reasoning in information extraction, and optimization.

BibTeX

@techreport{Dylla-Learning2014,
TITLE = {Learning Tuple Probabilities in Probabilistic Databases},
AUTHOR = {Dylla, Maximilian and Theobald, Martin},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2014-5-001},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
ABSTRACT = {Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but---so far---this is still an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal probabilities of the query answers again yield in the assigned probability labels. We analyze the learning problem from a theoretical perspective, devise two optimization-based objectives, and provide an efficient algorithm (based on Stochastic Gradient Descent) for solving these objectives. Finally, we conclude this work by an experimental evaluation on three real-world and one synthetic dataset, while competing with various techniques from SRL, reasoning in information extraction, and optimization.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Dylla, Maximilian
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Learning Tuple Probabilities in Probabilistic Databases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-8492-6
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2014
%P 51 p.
%X Learning the parameters of complex probabilistic-relational models from labeled 
training data is a standard technique in machine learning, which has been 
intensively studied in the subfield of Statistical Relational Learning (SRL), 
but---so far---this is still an under-investigated topic in the context of 
Probabilistic Databases (PDBs). In this paper, we focus on learning the 
probability values of base tuples in a PDB from query answers, the latter of 
which are represented as labeled lineage formulas. Specifically, we consider 
labels in the form of pairs, each consisting of a Boolean lineage formula and a 
marginal probability that comes attached to the corresponding query answer. The 
resulting learning problem can be viewed as the inverse problem to confidence 
computations in PDBs: given a set of labeled query answers, learn the 
probability values of the base tuples, such that the marginal probabilities of 
the query answers again yield in the assigned probability labels. We analyze 
the learning problem from a theoretical perspective, devise two 
optimization-based objectives, and provide an efficient algorithm (based on 
Stochastic Gradient Descent) for solving these objectives. Finally, we conclude 
this work by an experimental evaluation on three real-world and one synthetic 
dataset, while competing with various techniques from SRL, reasoning in 
information extraction, and optimization.
%B Research Report
%@ 0946-011X

Thesis

D5IMPR-CS

M. Dylla

“Efficient Querying and Learning in Probabilistic and Temporal Databases,” Universität des Saarlandes, Saarbrücken, 2014.

Abstract

Probabilistic databases store, query, and manage large amounts of uncertain information. This thesis advances the state-of-the-art in probabilistic databases in three different ways:
1. We present a closed and complete data model for temporal probabilistic databases and analyze its complexity. Queries are posed via temporal deduction rules which induce lineage formulas capturing both time and uncertainty.
2. We devise a methodology for computing the top-k most probable query answers. It is based on first-order lineage formulas representing sets of answer candidates. Theoretically derived probability bounds on these formulas enable pruning low-probability answers.
3. We introduce the problem of learning tuple probabilities which allows updating and cleaning of probabilistic databases. We study its complexity, characterize its solutions, cast it into an optimization problem, and devise an approximation algorithm based on stochastic gradient descent.
All of the above contributions support consistency constraints and are evaluated experimentally.

BibTeX

@phdthesis{DyllaPhDThesis2014,
TITLE = {Efficient Querying and Learning in Probabilistic and Temporal Databases},
AUTHOR = {Dylla, Maximilian},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-58146},
DOI = {10.22028/D291-26567},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
DATE = {2014},
ABSTRACT = {Probabilistic databases store, query, and manage large amounts of uncertain information. This thesis advances the state-of-the-art in probabilistic databases in three different ways:<br>1. We present a closed and complete data model for temporal probabilistic databases and analyze its complexity. Queries are posed via temporal deduction rules which induce lineage formulas capturing both time and uncertainty.<br>2. We devise a methodology for computing the top-k most probable query answers. It is based on first-order lineage formulas representing sets of answer candidates. Theoretically derived probability bounds on these formulas enable pruning low-probability answers.<br>3. We introduce the problem of learning tuple probabilities which allows updating and cleaning of probabilistic databases. We study its complexity, characterize its solutions, cast it into an optimization problem, and devise an approximation algorithm based on stochastic gradient descent.<br>All of the above contributions support consistency constraints and are evaluated experimentally.},
}

Endnote

%0 Thesis
%A Dylla, Maximilian
%Y Weikum, Gerhard
%A referee: Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Querying and Learning in Probabilistic and Temporal Databases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3C44-E
%U urn:nbn:de:bsz:291-scidok-58146
%R 10.22028/D291-26567
%F OTHER: hdl:20.500.11880/26623
%I Universität des Saarlandes
%C Saarbrücken
%D 2014
%P VIII, 169 p.
%V phd
%9 phd
%X Probabilistic databases store, query, and manage large amounts of uncertain information. This thesis advances the state-of-the-art in probabilistic databases in three different ways: 1. We present a closed and complete data model for temporal probabilistic databases and analyze its complexity. Queries are posed via temporal deduction rules which induce lineage formulas capturing both time and uncertainty. 2. We devise a methodology for computing the top-k most probable query answers. It is based on first-order lineage formulas representing sets of answer candidates. Theoretically derived probability bounds on these formulas enable pruning low-probability answers. 3. We introduce the problem of learning tuple probabilities which allows updating and cleaning of probabilistic databases. We study its complexity, characterize its solutions, cast it into an optimization problem, and devise an approximation algorithm based on stochastic gradient descent. All of the above contributions support consistency constraints and are evaluated experimentally.
%K Deduction Rules, Probabilistic Database, Temporal Database, Learning, Constraints, Top-k
%U http://scidok.sulb.uni-saarland.de/volltexte/2014/5814/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Article

D. Erdős, R. Gemulla, and E. Terzi

“Reconstructing Graphs from Neighborhood Data,” ACM Transactions on Knowledge Discovery from Data, vol. 8, no. 4, 2014.

BibTeX

@article{Erdos:2014:RGN:2663597.2641761,
TITLE = {Reconstructing Graphs from Neighborhood Data},
AUTHOR = {Erd{\H o}s, D{\'o}ra and Gemulla, Rainer and Terzi, Evimaria},
LANGUAGE = {eng},
ISSN = {1556-4681},
DOI = {10.1145/2641761},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {8},
NUMBER = {4},
PAGES = {1--22},
EID = {23},
}

Endnote

%0 Journal Article
%A Erdős, Dóra
%A Gemulla, Rainer
%A Terzi, Evimaria
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Reconstructing Graphs from Neighborhood Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-692A-E
%R 10.1145/2641761
%7 2014
%D 2014
%K Bipartite graph reconstruction, adjacency matrix, singular value decomposition
%J ACM Transactions on Knowledge Discovery from Data
%O TKDD
%V 8
%N 4
%& 1
%P 1 - 22
%Z sequence number: 23
%I ACM
%C New York, NY
%@ 1556-4681

Conference paper

P. Ernst, C. Meng, A. Siu, and G. Weikum

“KnowLife: A Knowledge Graph for Health and Life Sciences,” in 30th International Conference on Data Engineering (ICDE 2014), Chicago, IL, USA, 2014.

BibTeX

@inproceedings{DBLP:conf/icde/ErnstMSW14,
TITLE = {{KnowLife}: A Knowledge Graph for Health and Life Sciences},
AUTHOR = {Ernst, Patrick and Meng, Cynthia and Siu, Amy and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1109/ICDE.2014.6816754},
PUBLISHER = {IEEE},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {30th International Conference on Data Engineering (ICDE 2014)},
PAGES = {1254--1257},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%A Ernst, Patrick
%A Meng, Cynthia
%A Siu, Amy
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KnowLife: A Knowledge Graph for Health and Life Sciences
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6BA0-1
%R 10.1109/ICDE.2014.6816754
%D 2014
%B 30th International Conference on Data Engineering
%Z date of event: 2014-03-31 - 2014-04-04
%C Chicago, IL, USA
%B 30th International Conference on Data Engineering (ICDE 2014)
%P 1254 - 1257
%I IEEE
%U http://dx.doi.org/10.1109/ICDE.2014.6816754

Conference paper

E. Galbrun and P. Miettinen

“Interactive Redescription Mining,” in SIGMOD’14, ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 2014.

Abstract

Exploratory data analysis consists of multiple iterated steps: a data mining method is run on the data, the results are interpreted, new insights are formed, and the resulting knowledge is utilized when executing the method in a next round, and so on until satisfactory results are obtained.

We focus on redescription mining, a powerful data analysis method that aims at finding alternative descriptions of the same entities, for example, ways to characterize geographical regions in terms of both the fauna that inhabits them and their bioclimatic conditions, so-called bioclimatic niches.

We present Siren, a tool for interactive redescription mining. It is designed to facilitate the exploratory analysis of data by providing a seamless environment for mining, visualizing and editing redescriptions in an interactive fashion, supporting the analysis process in all its stages. We demonstrate its use for exploratory data mining.

Simultaneously, Siren exemplifies the power of the various visualizations and means of interaction integrated into it; techniques that reach beyond the task of redescription mining considered here, to other analysis methods.

BibTeX

@inproceedings{galbrun14interactive,
TITLE = {Interactive Redescription Mining},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-2376-5},
DOI = {10.1145/2588555.2594520},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014-03},
ABSTRACT = {Exploratory data analysis consists of multiple iterated steps: a data mining method is run on the data, the results are interpreted, new insights are formed, and the resulting knowledge is utilized when executing the method in a next round, and so on until satisfactory results are obtained. We focus on redescription mining, a powerful data analysis method that aims at finding alternative descriptions of the same entities, for example, ways to characterize geographical regions in terms of both the fauna that inhabits them and their bioclimatic conditions, so-called bioclimatic niches. We present Siren, a tool for interactive redescription mining. It is designed to facilitate the exploratory analysis of data by providing a seamless environment for mining, visualizing and editing redescriptions in an interactive fashion, supporting the analysis process in all its stages. We demonstrate its use for exploratory data mining. Simultaneously, Siren exemplifies the power of the various visualizations and means of interaction integrated into it; techniques that reach beyond the task of redescription mining considered here, to other analysis methods.},
BOOKTITLE = {SIGMOD'14, ACM SIGMOD International Conference on Management of Data},
EDITOR = {Dyreson, Curtis and Li, Feifei and {\"O}zsu, M. Tamer},
PAGES = {1079--1082},
ADDRESS = {Snowbird, UT, USA},
}

Endnote

%0 Conference Proceedings
%A Galbrun, Esther
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Interactive Redescription Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4987-F
%R 10.1145/2588555.2594520
%D 2014
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2014-06-22 - 2014-06-27
%C Snowbird, UT, USA
%X Exploratory data analysis consists of multiple iterated steps: a data mining method is run on the data, the results are interpreted, new insights are formed, and the resulting knowledge is utilized when executing the method in a next round, and so on until satisfactory results are obtained.
We focus on redescription mining, a powerful data analysis method that aims at finding alternative descriptions of the same entities, for example, ways to characterize geographical regions in terms of both the fauna that inhabits them and their bioclimatic conditions, so-called bioclimatic niches.
We present Siren, a tool for interactive redescription mining. It is designed to facilitate the exploratory analysis of data by providing a seamless environment for mining, visualizing and editing redescriptions in an interactive fashion, supporting the analysis process in all its stages. We demonstrate its use for exploratory data mining.
Simultaneously, Siren exemplifies the power of the various visualizations and means of interaction integrated into it; techniques that reach beyond the task of redescription mining considered here, to other analysis methods.
%B SIGMOD'14
%E Dyreson, Curtis; Li, Feifei; Özsu, M. Tamer
%P 1079 - 1082
%I ACM
%@ 978-1-4503-2376-5

Conference paper

A. Grycner and G. Weikum

“HARPY: Hypernyms and Alignment of Relational Paraphrases,” in Proceedings of COLING 2014: Technical Papers, Dublin, Ireland, 2014.

Abstract

Collections of relational paraphrases have been automatically constructed from large text corpora, as a WordNet counterpart for the realm of binary predicates and their surface forms. However, these resources fall short in their coverage of hypernymy links (subsumptions) among the synsets of phrases. This paper closes this gap by computing a high-quality alignment between the relational phrases of the Patty taxonomy, one of the largest collections of this kind, and the verb senses of WordNet. To this end, we devise judicious features and develop a graph-based alignment algorithm by adapting and extending the SimRank random-walk method. The resulting taxonomy of relational phrases and verb senses, coined HARPY, contains 20,812 synsets organized into a Directed Acyclic Graph (DAG) with 616,792 hypernymy links. Our empirical assessment indicates that the alignment links between Patty and WordNet have high accuracy, with Mean Reciprocal Rank (MRR) score 0.7 and Normalized Discounted Cumulative Gain (NDCG) score 0.73. As an additional extrinsic value, HARPY provides fine-grained lexical types for the arguments of verb senses in WordNet.

BibTeX

@inproceedings{grycner-weikum:2014:Coling,
TITLE = {{HARPY}: {Hypernyms} and Alignment of Relational Paraphrases},
AUTHOR = {Grycner, Adam and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-941643-26-6},
URL = {http://www.aclweb.org/anthology/C14-1207},
PUBLISHER = {ACL},
YEAR = {2014},
ABSTRACT = {Collections of relational paraphrases have been automatically constructed from large text corpora, as a WordNet counterpart for the realm of binary predicates and their surface forms. However, these resources fall short in their coverage of hypernymy links (subsumptions) among the synsets of phrases. This paper closes this gap by computing a high-quality alignment between the relational phrases of the Patty taxonomy, one of the largest collections of this kind, and the verb senses of WordNet. To this end, we devise judicious features and develop a graph-based alignment algorithm by adapting and extending the SimRank random-walk method. The resulting taxonomy of relational phrases and verb senses, coined HARPY, contains 20,812 synsets organized into a {\em Directed Acyclic Graph (DAG)} with 616,792 hypernymy links. Our empirical assessment indicates that the alignment links between Patty and WordNet have high accuracy, with {\em Mean Reciprocal Rank (MRR)} score 0.7 and {\em Normalized Discounted Cumulative Gain (NDCG)} score 0.73. As an additional extrinsic value, HARPY provides fine-grained lexical types for the arguments of verb senses in WordNet.},
BOOKTITLE = {Proceedings of COLING 2014: Technical Papers},
EDITOR = {Hajic, Jan and Tsujii, Junichi},
PAGES = {2195--2204},
EID = {C14},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Grycner, Adam
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T HARPY: Hypernyms and Alignment of Relational Paraphrases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3329-1
%U http://www.aclweb.org/anthology/C14-1207
%D 2014
%B 25th International Conference on Computational Linguistics
%Z date of event: 2014-08-23 - 2014-08-29
%C Dublin, Ireland
%X Collections of relational paraphrases have been automatically constructed from large text corpora, as a WordNet counterpart for the realm of binary predicates and their surface forms. However, these resources fall short in their coverage of hypernymy links (subsumptions) among the synsets of phrases. This paper closes this gap by computing a high-quality alignment between the relational phrases of the Patty taxonomy, one of the largest collections of this kind, and the verb senses of WordNet. To this end, we devise judicious features and develop a graph-based alignment algorithm by adapting and extending the SimRank random-walk method. The resulting taxonomy of relational phrases and verb senses, coined HARPY, contains 20,812 synsets organized into a Directed Acyclic Graph (DAG) with 616,792 hypernymy links. Our empirical assessment indicates that the alignment links between Patty and WordNet have high accuracy, with Mean Reciprocal Rank (MRR) score 0.7 and Normalized Discounted Cumulative Gain (NDCG) score 0.73. As an additional extrinsic value, HARPY provides fine-grained lexical types for the arguments of verb senses in WordNet.
%B Proceedings of COLING 2014: Technical Papers
%E Hajic, Jan; Tsujii, Junichi
%P 2195 - 2204
%Z sequence number: C14
%I ACL
%@ 978-1-941643-26-6

Conference paper

A. Grycner, G. Weikum, J. Pujara, J. Foulds, and L. Getoor

“A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases,” in AKBC 2014, 4th Workshop on Automated Knowledge Base Construction, Montreal, Canada, 2014.

BibTeX

@inproceedings{grycner2014:AKBC,
TITLE = {A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases},
AUTHOR = {Grycner, Adam and Weikum, Gerhard and Pujara, Jay and Foulds, James and Getoor, Lise},
LANGUAGE = {eng},
URL = {http://www.akbc.ws/2014/submissions/akbc2014_submission_13.pdf},
PUBLISHER = {AKBC Board},
YEAR = {2014},
BOOKTITLE = {AKBC 2014, 4th Workshop on Automated Knowledge Base Construction},
ADDRESS = {Montreal, Canada},
}

Endnote

%0 Conference Proceedings
%A Grycner, Adam
%A Weikum, Gerhard
%A Pujara, Jay
%A Foulds, James
%A Getoor, Lise
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5B22-D
%U http://www.akbc.ws/2014/submissions/akbc2014_submission_13.pdf
%D 2014
%B 4th Workshop on Automated Knowledge Base Construction
%Z date of event: 2014-12-13 - 2014-12-13
%C Montreal, Canada
%B AKBC 2014
%I AKBC Board

Conference paper

D. Gupta and K. Berberich

“Identifying Time Intervals of Interest to Queries,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{DBLP:conf/cikm/GuptaB14,
TITLE = {Identifying Time Intervals of Interest to Queries},
AUTHOR = {Gupta, Dhruv and Berberich, Klaus},
LANGUAGE = {eng},
DOI = {10.1145/2661829.2661927},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Wang, Xiaoyang Sean and Garofalakis, Minos N. and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {1835--1838},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Gupta, Dhruv
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Identifying Time Intervals of Interest to Queries
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5435-1
%R 10.1145/2661829.2661927
%D 2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%B CIKM'14
%E Li, Jianzhong; Wang, Xiaoyang Sean; Garofalakis, Minos N.; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 1835 - 1838
%I ACM

Conference paper

S. Gurajada, S. Seufert, I. Miliaraki, and M. Theobald

“TriAD: A Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing,” in SIGMOD’14, ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 2014.

BibTeX

@inproceedings{Gurajada:2014:TDS:2588555.2610511,
TITLE = {{TriAD}: A Distributed Shared-nothing {RDF} Engine Based on Asynchronous Message Passing},
AUTHOR = {Gurajada, Sairam and Seufert, Stephan and Miliaraki, Iris and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-1-4503-2376-5},
DOI = {10.1145/2588555.2610511},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {SIGMOD'14, ACM SIGMOD International Conference on Management of Data},
EDITOR = {Dyreson, Curtis and Li, Feifei and {\"O}zsu, M. Tamer},
PAGES = {289--300},
ADDRESS = {Snowbird, UT, USA},
}

Endnote

%0 Conference Proceedings
%A Gurajada, Sairam
%A Seufert, Stephan
%A Miliaraki, Iris
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TriAD: A Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C81-2
%R 10.1145/2588555.2610511
%D 2014
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2014-06-22 - 2014-06-27
%C Snowbird, UT, USA
%K asynchronous message passing, distributed RDF indexing & SparQL processing, join-ahead pruning, parallel join evaluation
%B SIGMOD'14
%E Dyreson, Curtis; Li, Feifei; Özsu, M. Tamer
%P 289 - 300
%I ACM
%@ 978-1-4503-2376-5

Conference paper

S. Gurajada, S. Seufert, I. Miliaraki, and M. Theobald

“Using Graph Summarization for Join-ahead Pruning in a Distributed RDF Engine,” in SWIM’14, 6th International Workshop on Semantic Web Information Management, Snowbird, UT, USA, 2014.

BibTeX

@inproceedings{Gurajada:2014:UGS:2630602.2630610,
TITLE = {Using Graph Summarization for Join-ahead Pruning in a Distributed {RDF} Engine},
AUTHOR = {Gurajada, Sairam and Seufert, Stephan and Miliaraki, Iris and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-1-4503-2994-1},
DOI = {10.1145/2630602.2630610},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {SWIM'14, 6th International Workshop on Semantic Web Information Management},
PAGES = {1--4},
EID = {41},
ADDRESS = {Snowbird, UT, USA},
}

Endnote

%0 Conference Proceedings
%A Gurajada, Sairam
%A Seufert, Stephan
%A Miliaraki, Iris
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Using Graph Summarization for Join-ahead Pruning in a Distributed RDF Engine
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C65-2
%R 10.1145/2630602.2630610
%D 2014
%B 6th International Workshop on Semantic Web Information Management
%Z date of event: 2014-06-22 - 2014-06-27
%C Snowbird, UT, USA
%B SWIM'14
%P 1 - 4
%Z sequence number: 41
%I ACM
%@ 978-1-4503-2994-1

Book

A. Harth, K. Hose, and R. Schenkel

Eds., Linked Data Management. Boca Raton, FL: CRC Press, 2014.

BibTeX

@book{LinkedDataBook2014,
TITLE = {Linked Data Management},
EDITOR = {Harth, Andreas and Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1466582408; 1466582405},
PUBLISHER = {CRC Press},
ADDRESS = {Boca Raton, FL},
YEAR = {2014},
DATE = {2014},
PAGES = {576 p.},
SERIES = {Emerging Directions in Database Systems and Applications},
}

Endnote

%0 Edited Book
%A Harth, Andreas
%A Hose, Katja
%A Schenkel, Ralf
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Linked Data Management
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-8478-2
%@ 978-1466582408
%@ 1466582405
%I CRC Press
%C Boca Raton, FL
%D 2014
%P 576 p.
%B Emerging Directions in Database Systems and Applications

Conference paper

J. Hoffart, D. Milchevski, and G. Weikum

“STICS: Searching with Strings, Things, and Cats,” in SIGIR’14, 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, Gold Coast, Australia, 2014.

BibTeX

@inproceedings{Hoffart:2014dt,
TITLE = {{STICS}: Searching with Strings, Things, and Cats},
AUTHOR = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2257-7},
DOI = {10.1145/2600428.2611177},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {SIGIR'14, 37th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {1247--1248},
ADDRESS = {Gold Coast, Australia},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Milchevski, Dragan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T STICS: Searching with Strings, Things, and Cats
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5344-7
%R 10.1145/2600428.2611177
%D 2014
%B 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
%Z date of event: 2014-07-06 - 2014-07-11
%C Gold Coast, Australia
%B SIGIR'14
%P 1247 - 1248
%I ACM
%@ 978-1-4503-2257-7

Conference paper

J. Hoffart, Y. Altun, and G. Weikum

“Discovering Emerging Entities with Ambiguous Names,” in WWW’14, 23rd International World Wide Web Conference, Seoul, Korea, 2014.

BibTeX

@inproceedings{Hoffart:2014hp,
TITLE = {Discovering Emerging Entities with Ambiguous Names},
AUTHOR = {Hoffart, Johannes and Altun, Yasemin and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2744-2},
DOI = {10.1145/2566486.2568003},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {WWW'14, 23rd International World Wide Web Conference},
EDITOR = {Chung, Chin-Wan and Broder, Andrei and Shin, Kyuseok and Suel, Torsten},
PAGES = {385--395},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Altun, Yasemin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Emerging Entities with Ambiguous Names
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5364-0
%R 10.1145/2566486.2568003
%D 2014
%B 23rd International World Wide Web Conference
%Z date of event: 2014-04-07 - 2014-04-11
%C Seoul, Korea
%B WWW'14
%E Chung, Chin-Wan; Broder, Andrei; Shin, Kyuseok; Suel, Torsten
%P 385 - 395
%I ACM
%@ 978-1-4503-2744-2

Conference paper

J. Hoffart, D. Milchevski, and G. Weikum

“AESTHETICS: Analytics with Strings, Things, and Cats,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{Hoffart:2014cy,
TITLE = {{AESTHETICS}: Analytics with Strings, Things, and Cats},
AUTHOR = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2598-1},
DOI = {10.1145/2661829.2661835},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Wang, X. Sean and Garofalakis, Minos and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {2018--2020},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Milchevski, Dragan
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AESTHETICS: Analytics with Strings, Things, and Cats
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-536B-2
%R 10.1145/2661829.2661835
%D 2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%B CIKM'14
%E Li, Jianzhong; Wang, X. Sean; Garofalakis, Minos; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 2018 - 2020
%I ACM
%@ 978-1-4503-2598-1

Conference paper

K. Hui

“Towards Robust & Reusable Evaluation for Novelty & Diversity,” in PIKM’14, 7th PhD Workshop in Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{Hui-pikm2014,
TITLE = {Towards Robust \& Reusable Evaluation for Novelty \& Diversity},
AUTHOR = {Hui, Kai},
LANGUAGE = {eng},
ISBN = {978-1-4503-1481-7},
DOI = {10.1145/2663714.2668045},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {PIKM'14, 7th PhD Workshop in Information and Knowledge Management},
EDITOR = {de Melo, Gerard and Kacimi, Mouna and Varde, Aparna S.},
PAGES = {9--17},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Hui, Kai
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Robust & Reusable Evaluation for Novelty & Diversity
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4F55-D
%R 10.1145/2663714.2668045
%D 2014
%B 7th PhD Workshop in Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-03
%C Shanghai, China
%B PIKM'14
%E de Melo, Gerard; Kacimi, Mouna; Varde, Aparna S.
%P 9 - 17
%I ACM
%@ 978-1-4503-1481-7

Conference paper

Y. Ibrahim, M. A. Yosef, and G. Weikum

“AIDA-Social: Entity Linking on the Social Stream,” in ESAIR’14, 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval, Shanghai, China, 2014.

BibTeX

@inproceedings{mamir:2014:aida-social,
TITLE = {{AIDA}-{Social}: {Entity} Linking on the Social Stream},
AUTHOR = {Ibrahim, Yusra and Yosef, Mohamed Amir and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1365-0},
DOI = {10.1145/2663712.2666185},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {ESAIR'14, 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval},
EDITOR = {Alonso, Omar and Kamps, Jaap and Karlgren, Jussi},
PAGES = {17--19},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Ibrahim, Yusra
%A Yosef, Mohamed Amir
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AIDA-Social: Entity Linking on the Social Stream
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54A3-7
%R 10.1145/2663712.2666185
%D 2014
%B 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval
%Z date of event: 2014-11-07 - 2014-11-07
%C Shanghai, China
%K information extraction, named entity linking, semantic annotation, social media
%B ESAIR'14
%E Alonso, Omar; Kamps, Jaap; Karlgren, Jussi
%P 17 - 19
%I ACM
%@ 978-1-4503-1365-0
%U http://doi.acm.org/10.1145/2663712.2666185

Conference paper

S. Karaev

“NASSAU: Description Length Minimization for Boolean Matrix Factorization,” in ECML/PKDD 2014 PhD Session Proceedings, Nancy, France, 2014.

BibTeX

@inproceedings{karaev2014nassau,
TITLE = {NASSAU: {D}escription Length Minimization for {Boolean} Matrix Factorization},
AUTHOR = {Karaev, Sanjar},
LANGUAGE = {eng},
URL = {https://phdsession-ecmlpkdd2014.greyc.fr/sites/phdsession-ecmlpkdd2014.greyc.fr/files/papers/Paper_20702.pdf},
PUBLISHER = {University of Caen},
YEAR = {2014},
BOOKTITLE = {ECML/PKDD 2014 PhD Session Proceedings},
EDITOR = {Belohlavek, Radim and Cr{\'e}milleux, Bruno},
PAGES = {177--186},
ADDRESS = {Nancy, France},
}

Endnote

%0 Conference Proceedings
%A Karaev, Sanjar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T NASSAU: Description Length Minimization for Boolean Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-51A9-6
%U https://phdsession-ecmlpkdd2014.greyc.fr/sites/phdsession-ecmlpkdd2014.greyc.fr/files/papers/Paper_20702.pdf
%D 2014
%B The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2014-09-15 - 2014-09-19
%C Nancy, France
%B ECML/PKDD 2014 PhD Session Proceedings
%E Belohlavek, Radim; Crémilleux, Bruno
%P 177 - 186
%I University of Caen

Conference paper

S. K. Kondreddi, P. Triantafillou, and G. Weikum

“Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition,” in 30th IEEE International Conference on Data Engineering (ICDE 2014), Chicago, IL, USA, 2014.

Abstract

Automatic information extraction (IE) enables the construction of very large
knowledge bases (KBs), with relational facts on millions of entities from text
corpora and Web sources. However, such KBs contain errors and they are far from
being complete. This motivates the need for exploiting human intelligence and
knowledge using crowd-based human computing (HC) for assessing the validity of
facts and for gathering additional knowledge. This paper presents a novel
system architecture, called Higgins, which shows how to effectively integrate
an IE engine and a HC engine. Higgins generates game questions
where players choose or fill in missing relations for subject-relation-object
triples. For generating multiple-choice answer candidates, we have constructed
a large dictionary of entity names and relational phrases, and have developed
specifically designed statistical language models for phrase relatedness. To
this end, we combine semantic resources like WordNet, ConceptNet, and others
with statistics derived from a large Web corpus. We demonstrate the
effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of
relationships between characters in narrative descriptions of movies and books.

BibTeX

@inproceedings{Kondreddi2014a,
TITLE = {Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition},
AUTHOR = {Kondreddi, Sarath Kumar and Triantafillou, Peter and Weikum, Gerhard},
LANGUAGE = {eng},
DOI = {10.1109/ICDE.2014.6816717},
PUBLISHER = {IEEE},
YEAR = {2014},
DATE = {2014},
ABSTRACT = {Automatic information extraction (IE) enables the construction of very large knowledge bases (KBs), with relational facts on millions of entities from text corpora and Web sources. However, such KBs contain errors and they are far from being complete. This motivates the need for exploiting human intelligence and knowledge using crowd-based human computing (HC) for assessing the validity of facts and for gathering additional knowledge. This paper presents a novel system architecture, called Higgins, which shows how to effectively integrate an IE engine and a HC engine. Higgins generates game questions where players choose or fill in missing relations for subject-relation-object triples. For generating multiple-choice answer candidates, we have constructed a large dictionary of entity names and relational phrases, and have developed specifically designed statistical language models for phrase relatedness. To this end, we combine semantic resources like WordNet, ConceptNet, and others with statistics derived from a large Web corpus. We demonstrate the effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of relationships between characters in narrative descriptions of movies and books.},
BOOKTITLE = {30th IEEE International Conference on Data Engineering (ICDE 2014)},
PAGES = {988--999},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%A Kondreddi, Sarath Kumar
%A Triantafillou, Peter
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0023-C15D-6
%R 10.1109/ICDE.2014.6816717
%D 2014
%B 30th IEEE International Conference on Data Engineering
%Z date of event: 2014-03-31 - 2014-04-04
%C Chicago, IL, USA
%X Automatic information extraction (IE) enables the construction of very large 
knowledge bases (KBs), with relational facts on millions of entities from text 
corpora and Web sources. However, such KBs contain errors and they are far from 
being complete. This motivates the need for exploiting human intelligence and 
knowledge using crowd-based human computing (HC) for assessing the validity of 
facts and for gathering additional knowledge. This paper presents a novel 
system architecture, called Higgins, which shows how to effectively integrate 
an IE engine and a HC engine. Higgins generates game questions
where players choose or fill in missing relations for subject-relation-object 
triples. For generating multiple-choice answer candidates, we have constructed 
a large dictionary of entity names and relational phrases, and have developed 
specifically designed statistical language models for phrase relatedness. To 
this end, we combine semantic resources like WordNet, ConceptNet, and others 
with statistics derived from a large Web corpus. We demonstrate the 
effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of 
relationships between characters in narrative descriptions of movies and books.
%P 988 - 999
%I IEEE

Thesis

D5IMPR-CS

S. K. Kondreddi

“Human Computing and Crowdsourcing Methods for Knowledge Acquisition,” Universität des Saarlandes, Saarbrücken, 2014.

Abstract

Ambiguity, complexity, and diversity in natural language textual expressions
are major hindrances to automated knowledge extraction. As a result
state-of-the-art methods for extracting entities and relationships from
unstructured data make incorrect extractions or produce noise. With the advent
of human computing, computationally hard tasks have been addressed through
human inputs. While text-based knowledge acquisition can benefit from this
approach, humans alone cannot bear the burden of extracting knowledge from the
vast textual resources that exist today. Even making payments for crowdsourced
acquisition can quickly become prohibitively expensive.
In this thesis we present principled methods that effectively garner human
computing inputs for improving the extraction of knowledge-base facts from
natural language texts. Our methods complement automatic extraction techniques
with human computing to reap the benefits of both while overcoming each other's
limitations. We present the architecture and implementation of HIGGINS, a
system that combines an information extraction (IE) engine with a human
computing (HC) engine to produce high quality facts. The IE engine combines
statistics derived from large Web corpora with semantic resources like WordNet
and ConceptNet to construct a large dictionary of entity and relational
phrases. It employs specifically designed statistical language models for
phrase relatedness to come up with questions and relevant candidate answers
that are presented to human workers. Through extensive experiments we establish
the superiority of this approach in extracting relation-centric facts from
text. In our experiments we extract facts about fictitious characters in
narrative text, where the issues of diversity and complexity in expressing
relations are far more pronounced. Finally, we also demonstrate how interesting
human computing games can be designed for knowledge acquisition tasks.

BibTeX

@phdthesis{Kondreddi2014b,
TITLE = {Human Computing and Crowdsourcing Methods for Knowledge Acquisition},
AUTHOR = {Kondreddi, Sarath Kumar},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-57948},
DOI = {10.22028/D291-26564},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
DATE = {2014},
ABSTRACT = {Ambiguity, complexity, and diversity in natural language textual expressions are major hindrances to automated knowledge extraction. As a result state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today. Even making payments for crowdsourced acquisition can quickly become prohibitively expensive. In this thesis we present principled methods that effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. Our methods complement automatic extraction techniques with human computing to reap the benefits of both while overcoming each other's limitations. We present the architecture and implementation of HIGGINS, a system that combines an information extraction (IE) engine with a human computing (HC) engine to produce high quality facts. The IE engine combines statistics derived from large Web corpora with semantic resources like WordNet and ConceptNet to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to come up with questions and relevant candidate answers that are presented to human workers. Through extensive experiments we establish the superiority of this approach in extracting relation-centric facts from text. In our experiments we extract facts about fictitious characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced. Finally, we also demonstrate how interesting human computing games can be designed for knowledge acquisition tasks.},
}

Endnote

%0 Thesis
%A Kondreddi, Sarath Kumar
%Y Triantafillou, Peter
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Human Computing and Crowdsourcing Methods for Knowledge Acquisition
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3C3D-F
%U urn:nbn:de:bsz:291-scidok-57948
%R 10.22028/D291-26564
%F OTHER: hdl:20.500.11880/26620
%I Universität des Saarlandes
%C Saarbrücken
%D 2014
%P 116 p.
%V phd
%9 phd
%X Ambiguity, complexity, and diversity in natural language textual expressions are major hindrances to automated knowledge extraction. As a result state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today. Even making payments for crowdsourced acquisition can quickly become prohibitively expensive. In this thesis we present principled methods that effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. Our methods complement automatic extraction techniques with human computing to reap the benefits of both while overcoming each other's limitations. We present the architecture and implementation of HIGGINS, a system that combines an information extraction (IE) engine with a human computing (HC) engine to produce high quality facts. The IE engine combines statistics derived from large Web corpora with semantic resources like WordNet and ConceptNet to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to come up with questions and relevant candidate answers that are presented to human workers. Through extensive experiments we establish the superiority of this approach in extracting relation-centric facts from text. In our experiments we extract facts about fictitious characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced. Finally, we also demonstrate how interesting human computing games can be designed for knowledge acquisition tasks.
%U http://scidok.sulb.uni-saarland.de/volltexte/2014/5794/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Proceedings

M. Koubarakis, G. B. Stamou, G. Stoilos, I. Horrocks, P. G. Kolaitis, G. Lausen, and G. Weikum

Eds., Reasoning Web. Springer, 2014.

BibTeX

@proceedings{DBLP:conf/rweb/2014,
TITLE = {Reasoning Web},
EDITOR = {Koubarakis, Manolis and Stamou, Giorgos B. and Stoilos, Giorgos and Horrocks, Ian and Kolaitis, Phokion G. and Lausen, Georg and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-319-10586-4},
DOI = {10.1007/978-3-319-10587-1},
PUBLISHER = {Springer},
YEAR = {2014},
DATE = {2014},
PAGES = {X, 390 p.},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8714},
ADDRESS = {Athens, Greece},
}

Endnote

%0 Conference Proceedings
%E Koubarakis, Manolis
%E Stamou, Giorgos B.
%E Stoilos, Giorgos
%E Horrocks, Ian
%E Kolaitis, Phokion G.
%E Lausen, Georg
%E Weikum, Gerhard
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Reasoning Web : Reasoning on the Web in the Big Data Era ; 10th International Summer School 2014, Athens, Greece, September 8-13, 2014. Proceedings
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6BD5-B
%@ 978-3-319-10586-4
%R 10.1007/978-3-319-10587-1
%I Springer
%D 2014
%B 10th Reasoning Web Summer School
%Z date of event: 2014-09-08 - 2014-09-13
%C Athens, Greece
%P X, 390 p.
%S Lecture Notes in Computer Science
%V 8714

Conference paper

D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos

“VoG: Summarizing and Understanding Large Graphs,” in 2014 SIAM International Conference on Data Mining (SDM 2014), Philadelphia, PA, USA, 2014.

BibTeX

@inproceedings{koutra:14:vog,
TITLE = {{VoG}: {Summarizing} and Understanding Large Graphs},
AUTHOR = {Koutra, Danai and Kang, U and Vreeken, Jilles and Faloutsos, Christos},
LANGUAGE = {eng},
ISBN = {978-1-61197-344-0},
DOI = {10.1137/1.9781611973440.11},
PUBLISHER = {SIAM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {2014 SIAM International Conference on Data Mining (SDM 2014)},
PAGES = {91--99},
ADDRESS = {Philadelphia, PA, USA},
}

Endnote

%0 Conference Proceedings
%A Koutra, Danai
%A Kang, U
%A Vreeken, Jilles
%A Faloutsos, Christos
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T VoG: Summarizing and Understanding Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53AF-A
%R 10.1137/1.9781611973440.11
%D 2014
%B SIAM International Conference on Data Mining
%Z date of event: 2014-04-24 - 2014-04-26
%C Philadelphia, PA, USA
%B 2014 SIAM International Conference on Data Mining
%P 91 - 99
%I SIAM
%@ 978-1-61197-344-0

Paper

D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos

“VoG: Summarizing and Understanding Large Graphs,” 2014. [Online]. Available: http://arxiv.org/abs/1406.3411.

Abstract

How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.

Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
VoG, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.

BibTeX

@online{KoutraKangVreekenFaloutsosarXiv2014,
TITLE = {{VoG}: {Summarizing} and Understanding Large Graphs},
AUTHOR = {Koutra, Danai and Kang, U and Vreeken, Jilles and Faloutsos, Christos},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1406.3411},
EPRINT = {1406.3411},
EPRINTTYPE = {arXiv},
YEAR = {2014},
ABSTRACT = {How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop {VoG}, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.},
}

Endnote

%0 Report
%A Koutra, Danai
%A Kang, U
%A Vreeken, Jilles
%A Faloutsos, Christos
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T VoG: Summarizing and Understanding Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-49A3-F
%U http://arxiv.org/abs/1406.3411
%D 2014
%X   How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
  Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
VoG, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.

%K cs.SI, Physics, Physics and Society, physics.soc-ph

Conference paper

E. Kuzey and G. Weikum

“EVIN: Building a Knowledge Base of Events,” in WWW’14 Companion, Seoul, Korea, 2014.

BibTeX

@inproceedings{ekuzeyWWW14,
TITLE = {{EVIN}: Building a Knowledge Base of Events},
AUTHOR = {Kuzey, Erdal and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2745-9},
URL = {http://dl.acm.org/citation.cfm?id=2577009},
DOI = {10.1145/2567948.2577009},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {WWW'14 Companion},
PAGES = {103--106},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Kuzey, Erdal
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EVIN: Building a Knowledge Base of Events
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-525B-C
%R 10.1145/2567948.2577009
%U http://dl.acm.org/citation.cfm?id=2577009
%D 2014
%B 23rd International Conference on World Wide Web
%Z date of event: 2014-04-07 - 2014-04-11
%C Seoul, Korea
%B WWW'14 Companion
%P 103 - 106
%I ACM
%@ 978-1-4503-2745-9

Conference paper

E. Kuzey, J. Vreeken, and G. Weikum

“A Fresh Look on Knowledge Bases: Distilling Named Events from News,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.

BibTeX

@inproceedings{ekuzeyCIKM14,
TITLE = {A Fresh Look on Knowledge Bases: Distilling Named Events from News},
AUTHOR = {Kuzey, Erdal and Vreeken, Jilles and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2598-1},
DOI = {10.1145/2661829.2661984},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Garofalakis, Minos and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {1689--1698},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Kuzey, Erdal
%A Vreeken, Jilles
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Fresh Look on Knowledge Bases: Distilling Named Events from News
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5263-9
%R 10.1145/2661829.2661984
%D 2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%B CIKM'14
%E Li, Jianzhong; Wang, X. Sean; Garofalakis, Minos; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 1689 - 1698
%I ACM
%@ 978-1-4503-2598-1

Conference paper

F. Mahdisoltani, J. Biega, and F. Suchanek

“YAGO3: A Knowledge Base from Multilingual Wikipedias,” in 7th Biennial Conference on Innovative Data Systems Research (CIDR 2015), Asilomar, CA, USA, 2014.

BibTeX

@inproceedings{Mahdisoltani:2015,
TITLE = {{YAGO}3: A Knowledge Base from Multilingual Wikipedias},
AUTHOR = {Mahdisoltani, Farzaneh and Biega, Joanna and Suchanek, Fabian},
LANGUAGE = {eng},
URL = {http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper1.pdf},
PUBLISHER = {CIDR Conference},
YEAR = {2015},
BOOKTITLE = {7th Biennial Conference on Innovative Data Systems Research (CIDR 2015)},
ADDRESS = {Asilomar, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Mahdisoltani, Farzaneh
%A Biega, Joanna
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T YAGO3: A Knowledge Base from Multilingual Wikipedias
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-501C-6
%U http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper1.pdf
%D 2014
%B 7th Biennial Conference on Innovative Data Systems Research
%Z date of event: 2015-01-04 - 2015-01-07
%C Asilomar, CA, USA
%I CIDR Conference

Article

F. Makari, C. Teflioudi, R. Gemulla, P. Haas, and Y. Sismanis

“Shared-memory and Shared-nothing Stochastic Gradient Descent Algorithms for Matrix Completion,” Knowledge and Information Systems, vol. 42, no. 3, 2014.

BibTeX

@article{MakariTeflioudiGemulla2014,
TITLE = {Shared-memory and Shared-nothing Stochastic Gradient Descent Algorithms for Matrix Completion},
AUTHOR = {Makari, Faraz and Teflioudi, Christina and Gemulla, Rainer and Haas, Peter and Sismanis, Yannis},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-013-0718-7},
PUBLISHER = {Springer},
ADDRESS = {London},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {42},
NUMBER = {3},
PAGES = {493--523},
}

Endnote

%0 Journal Article
%A Makari, Faraz
%A Teflioudi, Christina
%A Gemulla, Rainer
%A Haas, Peter
%A Sismanis, Yannis
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Shared-memory and Shared-nothing Stochastic Gradient Descent Algorithms for Matrix Completion
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4F57-9
%R 10.1007/s10115-013-0718-7
%7 2014-02-15
%D 2014
%J Knowledge and Information Systems
%V 42
%N 3
%& 493
%P 493 - 523
%I Springer
%C London
%@ 0219-1377

Thesis

D5IMPR-CS

F. Makari Manshadi

“Scalable Optimization Algorithms for Recommender Systems,” Universität des Saarlandes, Saarbrücken, 2014.

BibTeX

@phdthesis{MakariManshadi2014,
TITLE = {Scalable Optimization Algorithms for Recommender Systems},
AUTHOR = {Makari Manshadi, Faraz},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-59221},
DOI = {10.22028/D291-26583},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
DATE = {2014},
}

Endnote

%0 Thesis
%A Makari Manshadi, Faraz
%Y Gemulla, Rainer
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Scalable Optimization Algorithms for Recommender Systems
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-96AA-5
%R 10.22028/D291-26583
%U urn:nbn:de:bsz:291-scidok-59221
%F OTHER: hdl:20.500.11880/26639
%I Universität des Saarlandes
%C Saarbrücken
%D 2014
%P 121 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2014/5922/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

D5IMPR-CS

S. Metzger

“User-centric Knowledge Extraction and Maintenance,” Universität des Saarlandes, Saarbrücken, 2014.

BibTeX

@phdthesis{Metzger2014,
TITLE = {User-centric Knowledge Extraction and Maintenance},
AUTHOR = {Metzger, Steffen},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-57632},
DOI = {10.22028/D291-26563},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2014},
DATE = {2014},
}

Endnote

%0 Thesis
%A Metzger, Steffen
%Y Schenkel, Ralf
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T User-centric Knowledge Extraction and Maintenance
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-96AE-E
%R 10.22028/D291-26563
%U urn:nbn:de:bsz:291-scidok-57632
%F OTHER: hdl:20.500.11880/26619
%I Universität des Saarlandes
%C Saarbrücken
%D 2014
%P 230 p.
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2014/5763/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

S. Metzler and P. Miettinen

“Clustering Boolean Tensors,” in ECML/PKDD 2014 PhD Session Proceedings, Nancy, France, 2014.

BibTeX

@inproceedings{Metzler2014Clustering,
TITLE = {Clustering {Boolean} Tensors},
AUTHOR = {Metzler, Saskia and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {https://phdsession-ecmlpkdd2014.greyc.fr/sites/phdsession-ecmlpkdd2014.greyc.fr/files/papers/Paper_20692.pdf},
PUBLISHER = {University of Caen},
YEAR = {2014},
BOOKTITLE = {ECML/PKDD 2014 PhD Session Proceedings},
EDITOR = {Belohlavek, Radim and Cr{\'e}milleux, Bruno},
PAGES = {31--40},
ADDRESS = {Nancy, France},
}

Endnote

%0 Conference Proceedings
%A Metzler, Saskia
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Clustering Boolean Tensors
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C44-C
%U https://phdsession-ecmlpkdd2014.greyc.fr/sites/phdsession-ecmlpkdd2014.greyc.fr/files/papers/Paper_20692.pdf
%D 2014
%B The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2014-09-15 - 2014-09-19
%C Nancy, France
%B ECML/PKDD 2014 PhD Session Proceedings
%E Belohlavek, Radim; Crémilleux, Bruno
%P 31 - 40
%I University of Caen

Conference paper

P. Miettinen

“Interactive Data Mining Considered Harmful (If Done Wrong),” in Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2014), New York, NY, USA, 2014.

Abstract

Interactive data mining can be a powerful tool for data analysis. But in this short opinion piece I argue that this power comes with new pitfalls that can undermine the value of interactive mining, if not properly addressed. Most notably, there is a serious risk that the user of powerful interactive data mining tools will only find the results she was expecting. The purpose of this piece is to raise awareness of this potential issue, stimulate discussion on it, and hopefully give rise to new research directions in addressing it.

BibTeX

@inproceedings{miettinen14interactive,
TITLE = {Interactive Data Mining Considered Harmful (If Done Wrong)},
AUTHOR = {Miettinen, Pauli},
LANGUAGE = {eng},
YEAR = {2014},
DATE = {2014-07},
ABSTRACT = {Interactive data mining can be a powerful tool for data analysis. But in this short opinion piece I argue that this power comes with new pitfalls that can undermine the value of interactive mining, if not properly addressed. Most notably, there is a serious risk that the user of powerful interactive data mining tools will only find the results she was expecting. The purpose of this piece is to raise awareness of this potential issue, stimulate discussion on it, and hopefully give rise to new research directions in addressing it.},
BOOKTITLE = {Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2014)},
EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Faloutsos, Christos},
PAGES = {85--87},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Interactive Data Mining Considered Harmful (If Done Wrong)
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5567-9
%D 2014
%B ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2014-08-24 - 2014-08-24
%C New York, NY, USA
%X Interactive data mining can be a powerful tool for data analysis. But in this short opinion piece I argue that this power comes with new pitfalls that can undermine the value of interactive mining, if not properly addressed. Most notably, there is a serious risk that the user of powerful interactive data mining tools will only find the results she was expecting. The purpose of this piece is to raise awareness of this potential issue, stimulate discussion on it, and hopefully give rise to new research directions in addressing it.
%B Proceedings of the ACM SIGKDD 2014 Full-day Workshop on Interactive Data Exploration and Analytics
%E Chau, Polo; Vreeken, Jilles; van Leeuwen, Matthijs; Faloutsos, Christos
%P 85 - 87
%U http://poloclub.gatech.edu/idea2014/papers/p85-miettinen.pdf

Article

P. Miettinen and J. Vreeken

“MDL4BMF: Minimum Description Length for Boolean Matrix Factorization,” ACM Transactions on Knowledge Discovery from Data, vol. 8, no. 4, Oct. 2014.


BibTeX

@article{miettinen14mdl4bmf,
TITLE = {{MDL4BMF}: {Minimum} {D}escription {L}ength for {Boolean} {M}atrix {F}actorization},
AUTHOR = {Miettinen, Pauli and Vreeken, Jilles},
LANGUAGE = {eng},
DOI = {10.1145/2601437},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014-10},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {8},
NUMBER = {4},
PAGES = {1--31},
EID = {18},
}

Endnote

%0 Journal Article
%A Miettinen, Pauli
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T MDL4BMF: Minimum Description Length for Boolean Matrix Factorization
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4980-E
%R 10.1145/2601437
%7 2014
%D 2014
%J ACM Transactions on Knowledge Discovery from Data
%V 8
%N 4
%& 1
%P 1 - 31
%Z sequence number: 18
%I ACM
%C New York, NY
%U http://dl.acm.org/citation.cfm?id=2663597.2601437

Conference paper

D. Milchevski and K. Berberich

“X-REC: Cross-category Entity Recommendation,” in Proceedings of the 5th Information Interaction in Context Conference (IIiX 2014), Regensburg, Germany, 2014.


BibTeX

@inproceedings{DBLP:conf/iiix/MilchevskiB14,
TITLE = {{X-REC}: Cross-category Entity Recommendation},
AUTHOR = {Milchevski, Dragan and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-1-4503-2976-7},
DOI = {10.1145/2637002.2637049},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Proceedings of the 5th Information Interaction in Context Conference (IIiX 2014)},
EDITOR = {Elsweiler, David and Ludwig, Bernd and Azzopardi, Leif and Wilson, Max L.},
PAGES = {308--311},
ADDRESS = {Regensburg, Germany},
}

Endnote

%0 Conference Proceedings
%A Milchevski, Dragan
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T X-REC: Cross-category Entity Recommendation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5430-B
%R 10.1145/2637002.2637049
%D 2014
%B 5th Information Interaction in Context Conference
%Z date of event: 2014-08-26 - 2014-08-29
%C Regensburg, Germany
%B Proceedings of the 5th Information Interaction in Context Conference
%E Elsweiler, David; Ludwig, Bernd; Azzopardi, Leif; Wilson, Max L.
%P 308 - 311
%I ACM
%@ 978-1-4503-2976-7

Conference paper

A. Mishra

“Linking Today’s Wikipedia and News from the Past,” in PIKM’14, 7th PhD Workshop in Information and Knowledge Management, Shanghai, China, 2014.


BibTeX

@inproceedings{Mishra:2014:LTW:2663714.2668048,
TITLE = {Linking Today's {Wikipedia} and News from the Past},
AUTHOR = {Mishra, Arunav},
LANGUAGE = {eng},
ISBN = {978-1-4503-1481-7},
DOI = {10.1145/2663714.2668048},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {PIKM'14, 7th PhD Workshop in Information and Knowledge Management},
EDITOR = {de Melo, Gerard and Kacimi, Mouna and Varde, Aparna S.},
PAGES = {1--8},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Linking Today's Wikipedia and News from the Past
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6C6E-D
%R 10.1145/2663714.2668048
%D 2014
%B 7th PhD Workshop in Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%K events, linking, time-aware language model, wikipedia
%B PIKM'14
%E de Melo, Gerard; Kacimi, Mouna; Varde, Aparna S.
%P 1 - 8
%I ACM
%@ 978-1-4503-1481-7

Conference paper

A. Mishra, D. Milchevski, and K. Berberich

“Linking Wikipedia Events to Past News,” in SIGIR 2014 Workshop on Temporal, Social and Spatially-aware Information Access (TAIA 2014), Gold Coast, Australia, 2014.


BibTeX

@inproceedings{Mishra2014a,
TITLE = {Linking {Wikipedia} Events to Past News},
AUTHOR = {Mishra, Arunav and Milchevski, Dragan and Berberich, Klaus},
LANGUAGE = {eng},
URL = {http://research.microsoft.com/en-US/people/milads/taia2014-mishra.pdf},
PUBLISHER = {Microsoft Research},
YEAR = {2014},
BOOKTITLE = {SIGIR 2014 Workshop on Temporal, Social and Spatially-aware Information Access (TAIA 2014)},
PAGES = {1--4},
ADDRESS = {Gold Coast, Australia},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Milchevski, Dragan
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Linking Wikipedia Events to Past News
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-3C35-0
%U http://research.microsoft.com/en-US/people/milads/taia2014-mishra.pdf
%D 2014
%B SIGIR 2014 Workshop on Temporal, Social and Spatially-aware Information Access
%Z date of event: 2014-07-11 - 2014-07-11
%C Gold Coast, Australia
%B SIGIR 2014 Workshop on Temporal, Social and Spatially-aware Information Access
%P 1 - 4
%I Microsoft Research

Conference paper

S. Mukherjee, J. Ajmera, and S. Joshi

“Unsupervised Approach for Shallow Domain Ontology Construction from Corpus,” in WWW’14 Companion, Seoul, Korea, 2014.


BibTeX

@inproceedings{Mukherjee:2014:DCU,
TITLE = {Unsupervised Approach for Shallow Domain Ontology Construction from Corpus},
AUTHOR = {Mukherjee, Subhabrata and Ajmera, Jitendra and Joshi, Sachindra},
LANGUAGE = {eng},
ISBN = {978-1-4503-2745-9},
URL = {http://dl.acm.org/citation.cfm?id=2577350},
DOI = {10.1145/2567948.2577350},
PUBLISHER = {ACM},
YEAR = {2014},
BOOKTITLE = {WWW'14 Companion},
PAGES = {349--350},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Ajmera, Jitendra
%A Joshi, Sachindra
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Unsupervised Approach for Shallow Domain Ontology Construction from Corpus
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4FFD-6
%R 10.1145/2567948.2577350
%U http://dl.acm.org/citation.cfm?id=2577350
%D 2014
%B 23rd International Conference on World Wide Web
%Z date of event: 2014-04-07 - 2014-04-11
%C Seoul, Korea
%B WWW'14 Companion
%P 349 - 350
%I ACM
%@ 978-1-4503-2745-9

Conference paper

S. Mukherjee and S. Joshi

“Author-Specific Sentiment Aggregation for Polarity Prediction of Reviews,” in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, 2014.


BibTeX

@inproceedings{Mukherjee:2014:PASOT,
TITLE = {Author-Specific Sentiment Aggregation for Polarity Prediction of Reviews},
AUTHOR = {Mukherjee, Subhabrata and Joshi, Sachindra},
LANGUAGE = {eng},
ISBN = {978-2-9517408-8-4},
PUBLISHER = {ELRA},
YEAR = {2014},
BOOKTITLE = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)},
PAGES = {3092--3099},
ADDRESS = {Reykjavik, Iceland},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Joshi, Sachindra
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Author-Specific Sentiment Aggregation for Polarity Prediction of Reviews
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4FF7-1
%D 2014
%B Ninth International Conference on Language Resources and Evaluation
%Z date of event: 2014-05-26 - 2014-05-31
%C Reykjavik, Iceland
%B Proceedings of the Ninth International Conference on Language Resources and Evaluation
%P 3092 - 3099
%I ELRA
%@ 978-2-9517408-8-4
%U http://www.lrec-conf.org/proceedings/lrec2014/pdf/467_Paper.pdf

Conference paper

S. Mukherjee and S. Joshi

“Help Yourself: A Virtual Self-assist System,” in WWW’14 Companion, Seoul, Korea, 2014.


BibTeX

@inproceedings{Mukherjee:2014:SelfAssist,
TITLE = {Help Yourself: A Virtual Self-assist System},
AUTHOR = {Mukherjee, Subhabrata and Joshi, Sachindra},
LANGUAGE = {eng},
ISBN = {978-1-4503-2745-9},
URL = {http://dl.acm.org/citation.cfm?id=2577021},
DOI = {10.1145/2567948.2577021},
PUBLISHER = {ACM},
YEAR = {2014},
BOOKTITLE = {WWW'14 Companion},
PAGES = {171--174},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Joshi, Sachindra
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Help Yourself: A Virtual Self-assist System
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5007-5
%R 10.1145/2567948.2577021
%U http://dl.acm.org/citation.cfm?id=2577021
%D 2014
%B 23rd International Conference on World Wide Web
%Z date of event: 2014-04-07 - 2014-04-11
%C Seoul, Korea
%B WWW'14 Companion
%P 171 - 174
%I ACM
%@ 978-1-4503-2745-9

Conference paper

S. Mukherjee, G. Basu, and S. Joshi

“Joint Author Sentiment Topic Model,” in 2014 SIAM International Conference on Data Mining (SDM 2014), Philadelphia, PA, USA, 2014.


BibTeX

@inproceedings{Mukherjee:2014:JAST,
TITLE = {Joint Author Sentiment Topic Model},
AUTHOR = {Mukherjee, Subhabrata and Basu, Gaurab and Joshi, Sachindra},
LANGUAGE = {eng},
ISBN = {978-1-61197-344-0},
DOI = {10.1137/1.9781611973440.43},
PUBLISHER = {SIAM},
YEAR = {2014},
BOOKTITLE = {2014 SIAM International Conference on Data Mining (SDM 2014)},
PAGES = {370--378},
ADDRESS = {Philadelphia, PA, USA},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Basu, Gaurab
%A Joshi, Sachindra
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Joint Author Sentiment Topic Model
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4F9D-B
%R 10.1137/1.9781611973440.43
%D 2014
%B SIAM International Conference on Data Mining
%Z date of event: 2014-04-24 - 2014-04-26
%C Philadelphia, PA, USA
%B 2014 SIAM International Conference on Data Mining
%P 370 - 378
%I SIAM
%@ 978-1-61197-344-0

Conference paper

S. Mukherjee, J. Ajmera, and S. Joshi

“Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus,” in CIKM’14, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014.


BibTeX

@inproceedings{Mukherjee:2014:DomainCartridge,
TITLE = {Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus},
AUTHOR = {Mukherjee, Subhabrata and Ajmera, Jitendra and Joshi, Sachindra},
LANGUAGE = {eng},
ISBN = {978-1-4503-2598-1},
DOI = {10.1145/2661829.2662087},
PUBLISHER = {ACM},
YEAR = {2014},
BOOKTITLE = {CIKM'14, 23rd ACM International Conference on Information and Knowledge Management},
EDITOR = {Li, Jianzhong and Wang, X. Sean and Garofalakis, Minos and Soboroff, Ian and Suel, Torsten and Wang, Min},
PAGES = {929--938},
ADDRESS = {Shanghai, China},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Ajmera, Jitendra
%A Joshi, Sachindra
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4FB1-C
%R 10.1145/2661829.2662087
%D 2014
%8 03.11.2014
%B 23rd ACM International Conference on Information and Knowledge Management
%Z date of event: 2014-11-03 - 2014-11-07
%C Shanghai, China
%B CIKM'14
%E Li, Jianzhong; Wang, X. Sean; Garofalakis, Minos; Soboroff, Ian; Suel, Torsten; Wang, Min
%P 929 - 938
%I ACM
%@ 978-1-4503-2598-1

Conference paper

S. Mukherjee, G. Weikum, and C. Danescu-Niculescu-Mizil

“People on Drugs: Credibility of User Statements in Health Communities,” in KDD’14, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2014.


BibTeX

@inproceedings{Mukherjee:2014:PeopleOnDrugs,
TITLE = {People on Drugs: Credibility of User Statements in Health Communities},
AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard and Danescu-Niculescu-Mizil, Cristian},
LANGUAGE = {eng},
DOI = {10.1145/2623330.2623714},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {KDD'14, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
EDITOR = {Macskassy, Sofus A. and Perlich, Claudia and Leskovec, Jure and Wang, Wei and Ghani, Rayid},
PAGES = {65--74},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Mukherjee, Subhabrata
%A Weikum, Gerhard
%A Danescu-Niculescu-Mizil, Cristian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Group C. Danescu-Niculescu-Mizil, Max Planck Institute for Software Systems, Max Planck Society
%T People on Drugs: Credibility of User Statements in Health Communities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4FF9-E
%R 10.1145/2623330.2623714
%D 2014
%B 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
%Z date of event: 2014-08-24 - 2014-08-27
%C New York, NY, USA
%B KDD'14
%E Macskassy, Sofus A.; Perlich, Claudia; Leskovec, Jure; Wang, Wei; Ghani, Rayid
%P 65 - 74
%I ACM

Conference paper

D. B. Nguyen, J. Hoffart, M. Theobald, and G. Weikum

“AIDA-light: High-throughput Named-entity Disambiguation,” in Linked Data on the Web (LDOW 2014), Seoul, Korea, 2014.


BibTeX

@inproceedings{Nguyen:2014wl,
TITLE = {{AIDA-light}: High-throughput Named-entity Disambiguation},
AUTHOR = {Nguyen, Dat Ba and Hoffart, Johannes and Theobald, Martin and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-1184/ldow2014_paper_03.pdf},
PUBLISHER = {CEUR-WS.org},
YEAR = {2014},
BOOKTITLE = {Linked Data on the Web (LDOW 2014)},
EDITOR = {Bizer, Christian and Heath, Tom and Auer, S{\"o}ren and Berners-Lee, Tim},
PAGES = {1--10},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1184},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Dat Ba
%A Hoffart, Johannes
%A Theobald, Martin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AIDA-light: High-throughput Named-entity Disambiguation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5162-2
%U http://ceur-ws.org/Vol-1184/ldow2014_paper_03.pdf
%D 2014
%B Workshop on Linked Data on the Web 2014
%Z date of event: 2014-04-08 - 2014-04-08
%C Seoul, Korea
%B Linked Data on the Web
%E Bizer, Christian; Heath, Tom; Auer, Sören; Berners-Lee, Tim
%P 1 - 10
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1184
%@ 1613-0073

Conference paper

H.-V. Nguyen, E. Müller, J. Vreeken, and K. Böhm

“Multivariate Maximal Correlation Analysis,” in Proceedings of The 31st International Conference on Machine Learning (ICML 2014), Beijing, China, 2014.


BibTeX

@inproceedings{nguyen:14:mac,
TITLE = {Multivariate Maximal Correlation Analysis},
AUTHOR = {Nguyen, Hoang-Vu and M{\"u}ller, Emmanuel and Vreeken, Jilles and B{\"o}hm, Klemens},
LANGUAGE = {eng},
ISSN = {1938-7228},
URL = {http://jmlr.csail.mit.edu/proceedings/papers/v32/nguyenc14.pdf},
PUBLISHER = {JMLR},
YEAR = {2014},
BOOKTITLE = {Proceedings of The 31st International Conference on Machine Learning (ICML 2014)},
EDITOR = {Xing, Eric P. and Jebara, Tony},
PAGES = {775--783},
SERIES = {JMLR Workshop and Conference Proceedings},
VOLUME = {32},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Nguyen, Hoang-Vu
%A Müller, Emmanuel
%A Vreeken, Jilles
%A Böhm, Klemens
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Multivariate Maximal Correlation Analysis
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53A7-9
%U http://jmlr.csail.mit.edu/proceedings/papers/v32/nguyenc14.pdf
%D 2014
%B 31st International Conference on Machine Learning
%Z date of event: 2014-06-21 - 2014-06-26
%C Beijing, China
%B Proceedings of The 31st International Conference on Machine Learning
%E Xing, Eric P.; Jebara, Tony
%P 775 - 783
%I JMLR
%B JMLR Workshop and Conference Proceedings
%N 32
%@ 1938-7228

Article

H.-V. Nguyen, E. Müller, J. Vreeken, and K. Böhm

“Unsupervised Interaction-preserving Discretization of Multivariate Data,” Data Mining and Knowledge Discovery, vol. 28, no. 5–6, 2014.


BibTeX

@article{nguyen:14:unsupervised,
TITLE = {Unsupervised Interaction-preserving Discretization of Multivariate Data},
AUTHOR = {Nguyen, Hoang-Vu and M{\"u}ller, Emmanuel and Vreeken, Jilles and B{\"o}hm, Klemens},
LANGUAGE = {eng},
DOI = {10.1007/s10618-014-0350-5},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {28},
NUMBER = {5-6},
PAGES = {1366--1397},
}

Endnote

%0 Journal Article
%A Nguyen, Hoang-Vu
%A Müller, Emmanuel
%A Vreeken, Jilles
%A Böhm, Klemens
%+ Karlsruhe Institute of Technology
Karlsruhe Institute of Technology
Databases and Information Systems, MPI for Informatics, Max Planck Society
Karlsruhe Institute of Technology
%T Unsupervised Interaction-preserving Discretization of Multivariate Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-49A7-7
%R 10.1007/s10618-014-0350-5
%7 2014-04-04
%D 2014
%J Data Mining and Knowledge Discovery
%V 28
%N 5-6
%& 1366
%P 1366 - 1397
%I Springer
%C New York, NY

Conference paper

K. Panev and K. Berberich

“Phrase Queries with Inverted + Direct Indexes,” in Web Information Systems Engineering - WISE 2014, Thessaloniki, Greece, 2014, vol. 8786.


BibTeX

@inproceedings{DBLP:conf/wise/PanevB14,
TITLE = {Phrase Queries with Inverted + Direct Indexes},
AUTHOR = {Panev, Kiril and Berberich, Klaus},
LANGUAGE = {eng},
ISBN = {978-3-319-11748-5},
DOI = {10.1007/978-3-319-11749-2_13},
PUBLISHER = {Springer},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Web Information Systems Engineering -- WISE 2014},
EDITOR = {Benatallah, Boualem and Bestavros, Azer and Manolopoulos, Yannis and Vakali, Athena and Zhang, Yanchun},
PAGES = {156--169},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8786},
ADDRESS = {Thessaloniki, Greece},
}

Endnote

%0 Conference Proceedings
%A Panev, Kiril
%A Berberich, Klaus
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Phrase Queries with Inverted + Direct Indexes
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53C6-1
%R 10.1007/978-3-319-11749-2_13
%D 2014
%B 15th International Conference on Web Information Systems Engineering
%Z date of event: 2014-10-12 - 2014-10-14
%C Thessaloniki, Greece
%B Web Information Systems Engineering - WISE 2014
%E Benatallah, Boualem; Bestavros, Azer; Manolopoulos, Yannis; Vakali, Athena; Zhang, Yanchun
%V 8786
%P 156 - 169
%I Springer
%@ 978-3-319-11748-5
%B Lecture Notes in Computer Science
%N 8786
%U http://dx.doi.org/10.1007/978-3-319-11749-2_13

Article

B. A. Prakash, J. Vreeken, and C. Faloutsos

“Efficiently Spotting the Starting Points of an Epidemic in a Large Graph,” Knowledge and Information Systems, vol. 38, no. 1, 2014.


BibTeX

@article{prakash:14:culprits,
TITLE = {Efficiently Spotting the Starting Points of an Epidemic in a Large Graph},
AUTHOR = {Prakash, B. Aditya and Vreeken, Jilles and Faloutsos, Christos},
LANGUAGE = {eng},
ISSN = {0219-1377},
DOI = {10.1007/s10115-013-0671-5},
PUBLISHER = {Springer},
ADDRESS = {London},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Knowledge and Information Systems},
VOLUME = {38},
NUMBER = {1},
PAGES = {35--59},
}

Endnote

%0 Journal Article
%A Prakash, B. Aditya
%A Vreeken, Jilles
%A Faloutsos, Christos
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Efficiently Spotting the Starting Points of an Epidemic in a Large Graph
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53B3-D
%R 10.1007/s10115-013-0671-5
%7 2013-07-17
%D 2014
%J Knowledge and Information Systems
%V 38
%N 1
%& 35
%P 35 - 59
%I Springer
%C London
%@ 0219-1377

Article

L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, and G. Weikum

“Senti-LSSVM: Sentiment-oriented Multi-relation Extraction with Latent structural SVM,” Transactions of the Association for Computational Linguistics (Proc. ACL 2014), vol. 2, 2014.


BibTeX

@article{Gemullaacl2014,
TITLE = {Senti-{LSSVM}: {S}entiment-oriented Multi-relation Extraction with Latent structural {SVM}},
AUTHOR = {Qu, Lizhen and Zhang, Yi and Wang, Rui and Jiang, Lili and Gemulla, Rainer and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {2307-387X},
PUBLISHER = {ACL},
ADDRESS = {Stroudsburg, PA},
YEAR = {2014},
JOURNAL = {Transactions of the Association for Computational Linguistics (Proc. ACL)},
VOLUME = {2},
PAGES = {155--164},
BOOKTITLE = {The 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)},
}

Endnote

%0 Journal Article
%A Qu, Lizhen
%A Zhang, Yi
%A Wang, Rui
%A Jiang, Lili
%A Gemulla, Rainer
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Senti-LSSVM: Sentiment-oriented Multi-relation Extraction with Latent structural SVM
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6AF0-6
%7 2014
%D 2014
%J Transactions of the Association for Computational Linguistics
%O TACL
%V 2
%& 155
%P 155 - 164
%I ACL
%C Stroudsburg, PA
%@ 2307-387X
%B The 52nd Annual Meeting of the Association for Computational Linguistics
%O ACL 2014

Paper


L. Qu and B. Andres

“Estimating Maximally Probable Constrained Relations by Mathematical Programming,” 2014. [Online]. Available: http://arxiv.org/abs/1408.0838.


Abstract

Estimating a constrained relation is a fundamental problem in machine
learning. Special cases are classification (the problem of estimating a map
from a set of to-be-classified elements to a set of labels), clustering (the
problem of estimating an equivalence relation on a set) and ranking (the
problem of estimating a linear order on a set). We contribute a family of
probability measures on the set of all relations between two finite, non-empty
sets, which offers a joint abstraction of multi-label classification,
correlation clustering and ranking by linear ordering. Estimating (learning) a
maximally probable measure, given (a training set of) related and unrelated
pairs, is a convex optimization problem. Estimating (inferring) a maximally
probable relation, given a measure, is a 01-linear program. It is solved in
linear time for maps. It is NP-hard for equivalence relations and linear
orders. Practical solutions for all three cases are shown in experiments with
real data. Finally, estimating a maximally probable measure and relation
jointly is posed as a mixed-integer nonlinear program. This formulation
suggests a mathematical programming approach to semi-supervised learning.

BibTeX

@online{qu-2014,
TITLE = {Estimating Maximally Probable Constrained Relations by Mathematical Programming},
AUTHOR = {Qu, Lizhen and Andres, Bjoern},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1408.0838},
EPRINT = {1408.0838},
EPRINTTYPE = {arXiv},
YEAR = {2014},
ABSTRACT = {Estimating a constrained relation is a fundamental problem in machine learning. Special cases are classification (the problem of estimating a map from a set of to-be-classified elements to a set of labels), clustering (the problem of estimating an equivalence relation on a set) and ranking (the problem of estimating a linear order on a set). We contribute a family of probability measures on the set of all relations between two finite, non-empty sets, which offers a joint abstraction of multi-label classification, correlation clustering and ranking by linear ordering. Estimating (learning) a maximally probable measure, given (a training set of) related and unrelated pairs, is a convex optimization problem. Estimating (inferring) a maximally probable relation, given a measure, is a 01-linear program. It is solved in linear time for maps. It is NP-hard for equivalence relations and linear orders. Practical solutions for all three cases are shown in experiments with real data. Finally, estimating a maximally probable measure and relation jointly is posed as a mixed-integer nonlinear program. This formulation suggests a mathematical programming approach to semi-supervised learning.},
}

Endnote

%0 Report
%A Qu, Lizhen
%A Andres, Bjoern
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society
%T Estimating Maximally Probable Constrained Relations by Mathematical Programming
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-D324-6
%U http://arxiv.org/abs/1408.0838
%D 2014
%8 04.08.2014
%X Estimating a constrained relation is a fundamental problem in machine
learning. Special cases are classification (the problem of estimating a map
from a set of to-be-classified elements to a set of labels), clustering (the
problem of estimating an equivalence relation on a set) and ranking (the
problem of estimating a linear order on a set). We contribute a family of
probability measures on the set of all relations between two finite, non-empty
sets, which offers a joint abstraction of multi-label classification,
correlation clustering and ranking by linear ordering. Estimating (learning) a
maximally probable measure, given (a training set of) related and unrelated
pairs, is a convex optimization problem. Estimating (inferring) a maximally
probable relation, given a measure, is a 01-linear program. It is solved in
linear time for maps. It is NP-hard for equivalence relations and linear
orders. Practical solutions for all three cases are shown in experiments with
real data. Finally, estimating a maximally probable measure and relation
jointly is posed as a mixed-integer nonlinear program. This formulation
suggests a mathematical programming approach to semi-supervised learning.

%K Computer Science, Learning, cs.LG, Computer Science, Numerical Analysis, cs.NA, Mathematics, Optimization and Control, math.OC, Statistics, Machine Learning, stat.ML

Article

P. Roy, J. Teubner, and R. Gemulla

“Low-latency Handshake Join,” Proceedings of the VLDB Endowment (Proc. VLDB 2014), vol. 7, no. 9, 2014.


BibTeX

@article{GemullaVLDB2014,
TITLE = {Low-latency Handshake Join},
AUTHOR = {Roy, Pratanu and Teubner, Jens and Gemulla, Rainer},
LANGUAGE = {eng},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2014},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {7},
NUMBER = {9},
PAGES = {709--720},
BOOKTITLE = {Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)},
EDITOR = {Jagadish, H. V. and Zhou, Aoying},
}

Endnote

%0 Journal Article
%A Roy, Pratanu
%A Teubner, Jens
%A Gemulla, Rainer
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Low-latency Handshake Join
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6AFF-8
%7 2014
%D 2014
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 7
%N 9
%& 709
%P 709 - 720
%I ACM
%C New York, NY
%B Proceedings of the 40th International Conference on Very Large Data Bases
%O VLDB 2014 Hangzhou, China, September 1st - 5th

Article

F. M. Suchanek and G. Weikum

“Knowledge Bases in the Age of Big Data Analytics,” Proceedings of the VLDB Endowment (Proc. VLDB 2014), vol. 7, no. 13, 2014.


BibTeX

@article{DBLP:journals/pvldb/SuchanekW14,
TITLE = {Knowledge Bases in the Age of Big Data Analytics},
AUTHOR = {Suchanek, Fabian M. and Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {7},
NUMBER = {13},
PAGES = {1713--1714},
BOOKTITLE = {Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)},
EDITOR = {Jagadish, H. V. and Zhou, Aoying},
}

Endnote

%0 Journal Article
%A Suchanek, Fabian M.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Bases in the Age of Big Data Analytics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-6B2A-F
%7 2014
%D 2014
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 7
%N 13
%& 1713
%P 1713 - 1714
%I ACM
%C New York, NY
%B Proceedings of the 40th International Conference on Very Large Data Bases
%O VLDB 2014 Hangzhou, China, September 1st - 5th
%U http://www.vldb.org/pvldb/vol7/p1713-suchanek.pdf

Conference paper

N. Tandon, G. de Melo, F. M. Suchanek, and G. Weikum

“WebChild: Harvesting and Organizing Commonsense Knowledge from the Web,” in WSDM’14, 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 2014.

BibTeX

@inproceedings{Tandon2013,
TITLE = {{WebChild}: Harvesting and Organizing Commonsense Knowledge from the Web},
AUTHOR = {Tandon, Niket and de Melo, Gerard and Suchanek, Fabian M. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2351-2},
DOI = {10.1145/2556195.2556245},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {WSDM'14, 7th ACM International Conference on Web Search and Data Mining},
PAGES = {523--532},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A de Melo, Gerard
%A Suchanek, Fabian M.
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T WebChild: Harvesting and Organizing Commonsense Knowledge from the Web
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-84C4-7
%R 10.1145/2556195.2556245
%D 2014
%B 7th ACM International Conference on Web Search and Data Mining
%Z date of event: 2014-04-24 - 2014-04-28
%C New York, NY, USA
%B WSDM'14
%P 523 - 532
%I ACM
%@ 978-1-4503-2351-2

Conference paper

N. Tandon, G. de Melo, and G. Weikum

“Acquiring Comparative Commonsense Knowledge from the Web,” in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, Québec City, Québec, Canada, 2014.

BibTeX

@inproceedings{DBLP:conf/aaai/TandonMW14,
TITLE = {Acquiring Comparative Commonsense Knowledge from the Web},
AUTHOR = {Tandon, Niket and de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-57735-661-5},
PUBLISHER = {AAAI Press},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference},
EDITOR = {Brodley, Carla E. and Stone, Peter},
PAGES = {166--172},
ADDRESS = {Qu{\'e}bec City, Qu{\'e}bec, Canada},
}

Endnote

%0 Conference Proceedings
%A Tandon, Niket
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Acquiring Comparative Commonsense Knowledge from the Web
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-49A0-6
%D 2014
%B Twenty-Eighth AAAI Conference on Artificial Intelligence
%Z date of event: 2014-07-27 - 2014-07-31
%C Québec City, Québec, Canada
%B Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference
%E Brodley, Carla E.; Stone, Peter
%P 166 - 172
%I AAAI Press
%@ 978-1-57735-661-5
%U http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8649

Conference paper

T. Tylenda, S. K. Kondreddi, and G. Weikum

“Spotting Knowledge Base Facts in Web Texts,” in AKBC 2014, 4th Workshop on Automated Knowledge Base Construction, Montreal, Canada, 2014.

BibTeX

@inproceedings{TylendaKW2014,
TITLE = {Spotting Knowledge Base Facts in Web Texts},
AUTHOR = {Tylenda, Tomasz and Kondreddi, Sarath Kumar and Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {AKBC Board},
YEAR = {2014},
BOOKTITLE = {AKBC 2014, 4th Workshop on Automated Knowledge Base Construction},
ADDRESS = {Montreal, Canada},
}

Endnote

%0 Conference Proceedings
%A Tylenda, Tomasz
%A Kondreddi, Sarath Kumar
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Spotting Knowledge Base Facts in Web Texts
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-689C-7
%D 2014
%B 4th Workshop on Automated Knowledge Base Construction
%Z date of event: 2014-12-13 - 2014-12-13
%C Montreal, Canada
%B AKBC 2014
%I AKBC Board
%U http://www.akbc.ws/2014/submissions/akbc2014_submission_8.pdf

Conference paper

T. Tylenda, Y. Wang, and G. Weikum

“Spotting Facts in the Wild,” in Workshop on Automatic Creation and Curation of Knowledge Bases at SIGMOD (WACCK 2014), Snowbird, UT, USA, 2014.

BibTeX

@inproceedings{TylendaWW2014,
TITLE = {Spotting Facts in the Wild},
AUTHOR = {Tylenda, Tomasz and Wang, Yafang and Weikum, Gerhard},
LANGUAGE = {eng},
YEAR = {2014},
BOOKTITLE = {Workshop on Automatic Creation and Curation of Knowledge Bases at SIGMOD (WACCK 2014)},
ADDRESS = {Snowbird, UT, USA},
}

Endnote

%0 Conference Proceedings
%A Tylenda, Tomasz
%A Wang, Yafang
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Spotting Facts in the Wild
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-68A8-B
%D 2014
%B Workshop on Automatic Creation and Curation of Knowledge Bases 
%Z date of event: 2014-06-27 - 2014-06-27
%C Snowbird, UT, USA
%B Workshop on Automatic Creation and Curation of Knowledge Bases at SIGMOD

Book chapter / section

M. van Leeuwen and J. Vreeken

“Mining and Using Sets of Patterns through Compression,” in Frequent Pattern Mining, New York, NY: Springer, 2014.

BibTeX

@incollection{leeuwen:14:compression,
TITLE = {Mining and Using Sets of Patterns through Compression},
AUTHOR = {van Leeuwen, Matthijs and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-319-07820-5},
DOI = {10.1007/978-3-319-07821-2_8},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Frequent Pattern Mining},
EDITOR = {Aggarwal, Charu C. and Han, Jiawei},
PAGES = {165--198},
}

Endnote

%0 Book Section
%A van Leeuwen, Matthijs
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining and Using Sets of Patterns through Compression
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53BB-E
%R 10.1007/978-3-319-07821-2_8
%D 2014
%B Frequent Pattern Mining
%E Aggarwal, Charu C.; Han, Jiawei
%P 165 - 198
%I Springer
%C New York, NY
%@ 978-3-319-07820-5

Book chapter / section

J. Vreeken and N. Tatti

“Interesting Patterns,” in Frequent Pattern Mining, New York, NY: Springer, 2014.

BibTeX

@incollection{vreeken:14:interesting,
TITLE = {Interesting Patterns},
AUTHOR = {Vreeken, Jilles and Tatti, Nikolaj},
LANGUAGE = {eng},
ISBN = {978-3-319-07820-5},
DOI = {10.1007/978-3-319-07821-2_5},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Frequent Pattern Mining},
EDITOR = {Aggarwal, Charu C. and Han, Jiawei},
PAGES = {105--134},
}

Endnote

%0 Book Section
%A Vreeken, Jilles
%A Tatti, Nikolaj
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Interesting Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53B9-1
%R 10.1007/978-3-319-07821-2_5
%D 2014
%K Pattern mining; Interestingness measures; Statistics; Ranking; Pattern set mining
%B Frequent Pattern Mining
%E Aggarwal, Charu C.; Han, Jiawei
%P 105 - 134
%I Springer
%C New York, NY
%@ 978-3-319-07820-5

Article

G. I. Webb and J. Vreeken

“Efficient Discovery of the Most Interesting Associations,” ACM Transactions on Knowledge Discovery from Data, vol. 8, no. 3, 2014.

BibTeX

@article{webb:14:selfsufs,
TITLE = {Efficient Discovery of the Most Interesting Associations},
AUTHOR = {Webb, Geoffrey I. and Vreeken, Jilles},
LANGUAGE = {eng},
DOI = {10.1145/2601433},
PUBLISHER = {ACM},
YEAR = {2014},
DATE = {2014},
JOURNAL = {ACM Transactions on Knowledge Discovery from Data},
VOLUME = {8},
NUMBER = {3},
PAGES = {1--31},
EID = {15},
}

Endnote

%0 Journal Article
%A Webb, Geoffrey I.
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Discovery of the Most Interesting Associations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53B1-2
%R 10.1145/2601433
%7 2014
%D 2014
%J ACM Transactions on Knowledge Discovery from Data
%O TKDD
%V 8
%N 3
%& 1
%P 1 - 31
%Z sequence number: 15
%I ACM

Conference paper

G. Weikum

“Big Text: von Sprache zu Wissen,” in Informatik 2014: Big Data - Komplexität meistern, Stuttgart, Deutschland, 2014.

BibTeX

@inproceedings{DBLP:conf/gi/Weikum14,
TITLE = {{Big {Text}: von {Sprache} zu {Wissen}}},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {deu},
ISBN = {978-388579626-8},
PUBLISHER = {GI},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Informatik 2014: Big Data -- Komplexit{\"a}t meistern},
EDITOR = {Pl{\"o}dereder, Erhard and Grunske, Lars and Schneider, Eric and Ull, Dominik},
PAGES = {55},
SERIES = {Lecture Notes in Informatics},
VOLUME = {P-232},
ADDRESS = {Stuttgart, Deutschland},
}

Endnote

%0 Conference Proceedings
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Big Text: von Sprache zu Wissen
%G deu
%U http://hdl.handle.net/11858/00-001M-0000-0024-54D4-A
%D 2014
%B 44. Jahrestagung der Gesellschaft für Informatik
%Z date of event: 2014-09-22 - 2014-09-26
%C Stuttgart, Deutschland
%B Informatik 2014: Big Data - Komplexität meistern
%E Plödereder, Erhard; Grunske, Lars; Schneider, Eric; Ull, Dominik
%P 55
%I GI
%@ 978-388579626-8
%B Lecture Notes in Informatics
%N P-232

Article

H. Wu, J. Vreeken, N. Tatti, and N. Ramakrishnan

“Uncovering the Plot: Detecting Surprising Coalitions of Entities in Multi-relational Schemas,” Data Mining and Knowledge Discovery, vol. 28, no. 5–6, 2014.

BibTeX

@article{wu:14:plots,
TITLE = {Uncovering the Plot: {Detecting} Surprising Coalitions of Entities in Multi-relational Schemas},
AUTHOR = {Wu, Hao and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren},
LANGUAGE = {eng},
DOI = {10.1007/s10618-014-0370-1},
PUBLISHER = {Springer},
ADDRESS = {London},
YEAR = {2014},
DATE = {2014},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {28},
NUMBER = {5-6},
PAGES = {1398--1428},
}

Endnote

%0 Journal Article
%A Wu, Hao
%A Vreeken, Jilles
%A Tatti, Nikolaj
%A Ramakrishnan, Naren
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Uncovering the Plot: Detecting Surprising Coalitions of Entities in Multi-relational Schemas
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53B7-5
%R 10.1007/s10618-014-0370-1
%7 2014-07-22
%D 2014
%J Data Mining and Knowledge Discovery
%V 28
%N 5-6
%& 1398
%P 1398 - 1428
%I Springer
%C London

Conference paper

M. Yahya, S. E. Whang, R. Gupta, and A. Halevy

“ReNoun: Fact Extraction for Nominal Attributes,” in The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar, 2014.

Abstract

Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users' queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun‐based relations.

We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun's approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb‐based techniques.

BibTeX

@inproceedings{YahyaSRA14,
TITLE = {{ReNoun}: Fact Extraction for Nominal Attributes},
AUTHOR = {Yahya, Mohamed and Whang, Steven Euijong and Gupta, Rahul and Halevy, Alon},
ISBN = {978-1-937284-96-1},
PUBLISHER = {ACL},
YEAR = {2014},
DATE = {2014-10},
ABSTRACT = {Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users' queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun-based relations. We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun's approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb-based techniques.},
BOOKTITLE = {The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)},
PAGES = {325--335},
ADDRESS = {Doha, Qatar},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Whang, Steven Euijong
%A Gupta, Rahul
%A Halevy, Alon
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T ReNoun: Fact Extraction for Nominal Attributes
%U http://hdl.handle.net/11858/00-001M-0000-0024-2589-7
%D 2014
%B 2014 Conference on Empirical Methods in Natural Language Processing
%Z date of event: 2014-10-25 - 2014-10-29
%C Doha, Qatar
%X Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users' queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun-based relations.

We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun's approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb-based techniques.
%B The 2014 Conference on Empirical Methods in Natural Language Processing
%P 325 - 335
%I ACL
%@ 978-1-937284-96-1
%U http://emnlp2014.org/papers/pdf/EMNLP2014038.pdf

Conference paper

M. A. Yosef, J. Hoffart, Y. Ibrahim, A. Boldyrev, and G. Weikum

“Adapting AIDA for Tweets,” in Proceedings of the 4th Workshop on Making Sense of Microposts, Seoul, Korea, 2014.

BibTeX

@inproceedings{YosefMicroposts2014,
TITLE = {Adapting {AIDA} for Tweets},
AUTHOR = {Yosef, Mohamed Amir and Hoffart, Johannes and Ibrahim, Yusra and Boldyrev, Artem and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-1141-0},
PUBLISHER = {CEUR-WS.org},
YEAR = {2014},
BOOKTITLE = {Proceedings of the 4th Workshop on Making Sense of Microposts},
EDITOR = {Rowe, Matthew and Stankovic, Milan and Dadzie, Aba-Sah},
PAGES = {68--69},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1141},
ADDRESS = {Seoul, Korea},
}

Endnote

%0 Conference Proceedings
%A Yosef, Mohamed Amir
%A Hoffart, Johannes
%A Ibrahim, Yusra
%A Boldyrev, Artem
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Adapting AIDA for Tweets
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54AB-8
%D 2014
%B 4th Workshop on Making Sense of Microposts
%Z date of event: 2014-04-07 - 2014-04-07
%C Seoul, Korea
%B Proceedings of the 4th Workshop on Making Sense of Microposts
%E Rowe, Matthew; Stankovic, Milan; Dadzie, Aba-Sah
%P 68 - 69
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 1141
%@ 1613-0073
%U http://ceur-ws.org/Vol-1141/paper_15.pdf

Conference paper

M. A. Yosef, M. Spaniol, and G. Weikum

“AIDArabic: A Named-entity Disambiguation Framework for Arabic Text,” in The EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP 2014), Doha, Qatar, 2014.

BibTeX

@inproceedings{mamir:2014:aidarabic,
TITLE = {{AIDArabic}: A Named-entity Disambiguation Framework for {Arabic} Text},
AUTHOR = {Yosef, Mohamed Amir and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-96-1},
PUBLISHER = {ACL},
YEAR = {2014},
BOOKTITLE = {The EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP 2014)},
PAGES = {187--195},
EID = {W14-3626},
ADDRESS = {Doha, Qatar},
}

Endnote

%0 Conference Proceedings
%A Yosef, Mohamed Amir
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T AIDArabic: A Named-entity Disambiguation Framework for Arabic Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-548F-A
%D 2014
%B The EMNLP 2014 Workshop on Arabic Natural Language Processing
%Z date of event: 2014-10-25 - 2014-10-25
%C Doha, Qatar
%B The EMNLP 2014 Workshop on Arabic Natural Language Processing
%P 187 - 195
%Z sequence number: W14-3626
%I ACL
%@ 978-1-937284-96-1
%U http://www.aclweb.org/anthology/W14-3626

Book chapter / section

A. Zimek, I. Assent, and J. Vreeken

“Frequent Pattern Mining Algorithms for Data Clustering,” in Frequent Pattern Mining, New York, NY: Springer, 2014.

BibTeX

@incollection{zimek:14:clustering,
TITLE = {Frequent Pattern Mining Algorithms for Data Clustering},
AUTHOR = {Zimek, Arthur and Assent, Ira and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-319-07820-5},
DOI = {10.1007/978-3-319-07821-2_16},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2014},
DATE = {2014},
BOOKTITLE = {Frequent Pattern Mining},
EDITOR = {Aggarwal, Charu C. and Han, Jiawei},
PAGES = {403--423},
}

Endnote

%0 Book Section
%A Zimek, Arthur
%A Assent, Ira
%A Vreeken, Jilles
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Frequent Pattern Mining Algorithms for Data Clustering
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-53BD-A
%R 10.1007/978-3-319-07821-2_16
%D 2014
%B Frequent Pattern Mining
%E Aggarwal, Charu C.; Han, Jiawei
%P 403 - 423
%I Springer
%C New York, NY
%@ 978-3-319-07820-5

2013

Conference paper

E. Aksehirli, B. Goethals, E. Müller, and J. Vreeken

“Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data,” in IEEE 13th International Conference on Data Mining (ICDM 2013), Dallas, TX, USA, 2013.

BibTeX

@inproceedings{Aksehirli2013a,
TITLE = {Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data},
AUTHOR = {Aksehirli, Emin and Goethals, Bart and M{\"u}ller, Emmanuel and Vreeken, Jilles},
LANGUAGE = {eng},
DOI = {10.1109/ICDM.2013.146},
LOCALID = {Local-ID: 9972B38173345D64C1257C600054DB8E-Aksehirli2013a},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {IEEE 13th International Conference on Data Mining (ICDM 2013)},
EDITOR = {Karypis, George and Xiong, Hui},
PAGES = {937--942},
ADDRESS = {Dallas, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Aksehirli, Emin
%A Goethals, Bart
%A Müller, Emmanuel
%A Vreeken, Jilles
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-19EA-5
%R 10.1109/ICDM.2013.146
%F OTHER: Local-ID: 9972B38173345D64C1257C600054DB8E-Aksehirli2013a
%D 2013
%B 13th International Conference on Data Mining
%Z date of event: 2013-12-07 - 2013-12-10
%C Dallas, TX, USA
%B IEEE 13th International Conference on Data Mining 
%E Karypis, George; Xiong, Hui
%P 937 - 942
%I IEEE

Conference paper

F. Alvanaki and S. Michel

“A Thin Monitoring Layer for Top-k Aggregation Queries over a Database,” in 7th International Workshop on Ranking in Databases (DBRank 2013), Riva del Garda, Italy, 2013.

BibTeX

@inproceedings{AlvanakiMichel2013c,
TITLE = {A Thin Monitoring Layer for Top-k Aggregation Queries over a Database},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian},
LANGUAGE = {eng},
ISBN = {978-1-4503-2497-7},
DOI = {10.1145/2524828.2524831},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {7th International Workshop on Ranking in Databases (DBRank 2013)},
PAGES = {1--6},
EID = {3},
ADDRESS = {Riva del Garda, Italy},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Thin Monitoring Layer for Top-k Aggregation Queries over a Database
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-CCDD-E
%R 10.1145/2524828.2524831
%D 2013
%B 7th International Workshop on Ranking in Databases
%Z date of event: 2013-08-30 - 2013-08-30
%C Riva del Garda, Italy
%B 7th International Workshop on Ranking in Databases
%P 1 - 6
%Z sequence number: 3
%I ACM
%@ 978-1-4503-2497-7

Conference paper

F. Alvanaki and S. Michel

“Scalable, Continuous Tracking of Tag Co-occurrences Between Short Sets Using (Almost) Disjoint Tag Partitions,” in Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (DBSocial 2013), New York, NY, USA, 2013.

Abstract

In this work we consider the continuous computation of set correlations over a
stream of set-valued attributes, such as Tweets and their hashtags, social
annotations of blog posts obtained through RSS, or updates to set-valued
attributes of databases. In order to compute tag correlations in a distributed
fashion, all necessary information has to be present at the computing node(s).
Our approach makes use of a partitioning scheme based on set covers for
efficient and replication-lean information flow. We report on the results of a
preliminary performance evaluation using Tweets obtained through Twitter's
streaming API.

BibTeX

@inproceedings{Avlanaki2013a,
TITLE = {Scalable, Continuous Tracking of Tag Co-occurrences Between Short Sets Using (Almost) Disjoint Tag Partitions},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian},
LANGUAGE = {eng},
ISBN = {978-1-4503-2191-4},
DOI = {10.1145/2484702.2484705},
LOCALID = {Local-ID: 305767E5408759CFC1257B97004FACE2-Avlanaki2013a},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {In this work we consider the continuous computation of set correlations over a stream of set-valued attributes, such as Tweets and their hashtags, social annotations of blog posts obtained through RSS, or updates to set-valued attributes of databases. In order to compute tag correlations in a distributed fashion, all necessary information has to be present at the computing node(s). Our approach makes use of a partitioning scheme based on set covers for efficient and replication-lean information flow. We report on the results of a preliminary performance evaluation using Tweets obtained through Twitter's streaming API.},
BOOKTITLE = {Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (DBSocial 2013)},
EDITOR = {LeFevre, Kristen and Machanavajjhala, Ashwin and Silberstein, Adam},
PAGES = {49--54},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Scalable, Continuous Tracking of Tag Co-occurrences Between Short Sets Using (Almost) Disjoint Tag Partitions
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A81-D
%R 10.1145/2484702.2484705
%F OTHER: Local-ID: 305767E5408759CFC1257B97004FACE2-Avlanaki2013a
%D 2013
%B ACM SIGMOD Workshop on Databases and Social Networks
%Z date of event: 2013-06-13 - 2013-06-13
%C New York, NY, USA
%X In this work we consider the continuous computation of set correlations over a 
stream of set-valued attributes, such as Tweets and their hashtags, social 
annotations of blog posts obtained through RSS, or updates to set-valued 
attributes of databases. In order to compute tag correlations in a distributed 
fashion, all necessary information has to be present at the computing node(s). 
Our approach makes use of a partitioning scheme based on set covers for 
efficient and replication-lean information flow. We report on the results of a 
preliminary performance evaluation using Tweets obtained through Twitter's 
streaming API.
%K Distributed Stream Processing, Tags, Twitter, Correlation, Continuous
%B Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks
%E LeFevre, Kristen; Machanavajjhala, Ashwin; Silberstein, Adam
%P 49 - 54
%I ACM
%@ 978-1-4503-2191-4

Conference paper

F. Alvanaki, E. Ilieva, S. Michel, and A. Stupar

“Interesting Event Detection through Hall of Fame Rankings,” in Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (DBSocial 2013), New York, NY, USA, 2013.

Abstract

Everything is relative. Cars are compared by gas per mile, websites by page
rank, students based on GPA, scientists by number of publications, and
celebrities by beauty or wealth. In this paper, we study the characteristics of
such entity rankings based on a set of rankings obtained from a popular Web
portal. The obtained insights are integrated in our approach, coined Pantheon.
Pantheon maintains sets of top-k rankings and reports identified changes in a
way that appeals to users, using a novel combination of different
characteristics like competitiveness, information entropy, and scale of change.
Entity rankings are assembled by combining entity type attributes with
data-driven categorical constraints and sorting criteria on numeric attributes.
We report on the results of an experimental evaluation using real-world data
obtained from a basketball statistics website.

BibTeX

@inproceedings{Avlanaki2013b,
TITLE = {Interesting Event Detection through Hall of Fame Rankings},
AUTHOR = {Alvanaki, Foteini and Ilieva, Evica and Michel, Sebastian and Stupar, Aleksandar},
LANGUAGE = {eng},
ISBN = {978-1-4503-2191-4},
DOI = {10.1145/2484702.2484704},
LOCALID = {Local-ID: BCF76B7E62BA3435C1257B9700501576-Avlanaki2013b},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Everything is relative. Cars are compared by gas per mile, websites by page rank, students based on GPA, scientists by number of publications, and celebrities by beauty or wealth. In this paper, we study the characteristics of such entity rankings based on a set of rankings obtained from a popular Web portal. The obtained insights are integrated in our approach, coined Pantheon. Pantheon maintains sets of top-k rankings and reports identified changes in a way that appeals to users, using a novel combination of different characteristics like competitiveness, information entropy, and scale of change. Entity rankings are assembled by combining entity type attributes with data-driven categorical constraints and sorting criteria on numeric attributes. We report on the results of an experimental evaluation using real-world data obtained from a basketball statistics website.},
BOOKTITLE = {Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks (DBSocial 2013)},
EDITOR = {LeFevre, Kristen and Machanavajjhala, Ashwin and Silberstein, Adam},
PAGES = {7--12},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Ilieva, Evica
%A Michel, Sebastian
%A Stupar, Aleksandar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Interesting Event Detection through Hall of Fame Rankings
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A8A-C
%R 10.1145/2484702.2484704
%F OTHER: Local-ID: BCF76B7E62BA3435C1257B9700501576-Avlanaki2013b
%D 2013
%B ACM SIGMOD Workshop on Databases and Social Networks
%Z date of event: 2013-06-22 - 2013-06-27
%C New York, NY, USA
%X Everything is relative. Cars are compared by gas per mile, websites by page 
rank, students based on GPA, scientists by number of publications, and 
celebrities by beauty or wealth. In this paper, we study the characteristics of 
such entity rankings based on a set of rankings obtained from a popular Web 
portal. The obtained insights are integrated in our approach, coined Pantheon. 
Pantheon maintains sets of top-k rankings and reports identified changes in a 
way that appeals to users, using a novel combination of different 
characteristics like competitiveness, information entropy, and scale of change. 
Entity rankings are assembled by combining entity type attributes with 
data-driven categorical constraints and sorting criteria on numeric attributes. 
We report on the results of an experimental evaluation using real-world data 
obtained from a basketball statistics website.
%B Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks
%E LeFevre, Kristen; Machanavajjhala, Ashwin; Silberstein, Adam
%P 7 - 12
%I ACM
%@ 978-1-4503-2191-4

Thesis

D5IMPR-CS

A. Anand

“Indexing Methods for Web Archives,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

There have been numerous efforts recently to digitize previously published
content and to preserve born-digital content, leading to the widespread growth
of large text repositories.
Web archives are such continuously growing text collections, which contain
versions of documents spanning long time periods. Web archives present many
opportunities for historical, cultural and political analyses. Consequently,
there is a growing need for tools which can efficiently access and search them.
In this work, we are interested in indexing methods for supporting text-search
workloads over web archives like time-travel queries and phrase queries. To
this end we make the following contributions:
Time-travel queries are keyword queries with a temporal predicate, e.g., mpii
saarland @ [06/2009], which return versions of documents in the past. We
introduce a novel index organization strategy, called index sharding, for
efficiently supporting time-travel queries without incurring additional
index-size blowup. We also propose index-maintenance approaches which scale to
such continuously growing collections. We develop query-optimization techniques
for time-travel queries, called partition selection, which maximize recall at
any given query-execution stage. We propose indexing methods to support phrase
queries, e.g., to be or not to be that is the question. We index multi-word
sequences and devise novel query-optimization methods over the indexed
sequences to efficiently answer phrase queries. We demonstrate the superior
performance of our approaches over existing methods by extensive
experimentation on real-world web archives.

BibTeX

@phdthesis{Anand2013,
TITLE = {Indexing Methods for Web Archives},
AUTHOR = {Anand, Avishek},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-55319},
DOI = {10.22028/D291-26542},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {There have been numerous efforts recently to digitize previously published content and preserve born-digital content, leading to the widespread growth of large text repositories. Web archives are such continuously growing text collections which contain versions of documents spanning long time periods. Web archives present many opportunities for historical, cultural and political analyses. Consequently, there is a growing need for tools which can efficiently access and search them. In this work, we are interested in indexing methods for supporting text-search workloads over web archives like time-travel queries and phrase queries. To this end we make the following contributions: Time-travel queries are keyword queries with a temporal predicate, e.g., mpii saarland @ [06/2009], which return versions of documents in the past. We introduce a novel index organization strategy, called index sharding, for efficiently supporting time-travel queries without incurring additional index-size blowup. We also propose index-maintenance approaches which scale to such continuously growing collections. We develop a query-optimization technique for time-travel queries, called partition selection, which maximizes recall at any given query-execution stage. We propose indexing methods to support phrase queries, e.g., to be or not to be that is the question. We index multi-word sequences and devise novel query-optimization methods over the indexed sequences to efficiently answer phrase queries. We demonstrate the superior performance of our approaches over existing methods by extensive experimentation on real-world web archives.},
}

Endnote

%0 Thesis
%A Anand, Avishek
%Y Berberich, Klaus
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Indexing Methods for Web Archives : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0026-CB4B-0
%R 10.22028/D291-26542
%U urn:nbn:de:bsz:291-scidok-55319
%F OTHER: hdl:20.500.11880/26598
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V phd
%9 phd
%X There have been numerous efforts recently to digitize previously published content and preserve born-digital content, leading to the widespread growth of large text repositories. Web archives are such continuously growing text collections which contain versions of documents spanning long time periods. Web archives present many opportunities for historical, cultural and political analyses. Consequently, there is a growing need for tools which can efficiently access and search them. In this work, we are interested in indexing methods for supporting text-search workloads over web archives like time-travel queries and phrase queries. To this end we make the following contributions: Time-travel queries are keyword queries with a temporal predicate, e.g., mpii saarland @ [06/2009], which return versions of documents in the past. We introduce a novel index organization strategy, called index sharding, for efficiently supporting time-travel queries without incurring additional index-size blowup. We also propose index-maintenance approaches which scale to such continuously growing collections. We develop a query-optimization technique for time-travel queries, called partition selection, which maximizes recall at any given query-execution stage. We propose indexing methods to support phrase queries, e.g., to be or not to be that is the question. We index multi-word sequences and devise novel query-optimization methods over the indexed sequences to efficiently answer phrase queries. We demonstrate the superior performance of our approaches over existing methods by extensive experimentation on real-world web archives.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5531/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

F. Ansari

“A Comparative Study of MAX-SAT Solving Techniques with Soft and Hard Rules,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{AnsariMastersThesis2013,
TITLE = {A Comparative Study of {MAX}--{SAT} Solving Techniques with Soft and Hard Rules},
AUTHOR = {Ansari, Farzaneh},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Ansari, Farzaneh
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Comparative Study of MAX-SAT Solving Techniques with Soft and Hard Rules : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C6B-5
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V master
%9 master

Conference paper

R. Awadallah, M. Ramanath, and G. Weikum

“OpinioNetIt: A Structured and Faceted Knowledge-base of Opinions,” in Proceedings of the 12th IEEE International Conference on Data Mining Workshops (ICDMW 2012), Brussels, Belgium, 2013.

BibTeX

@inproceedings{Awadallah2012i,
TITLE = {{OpinioNetIt}: A Structured and Faceted Knowledge-base of Opinions},
AUTHOR = {Awadallah, Rawia and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4673-5164-5},
DOI = {10.1109/ICDMW.2012.49},
LOCALID = {Local-ID: 04756AF15FFC805BC1257B12002D6750-Awadallah2012i},
PUBLISHER = {IEEE},
YEAR = {2012},
DATE = {2013},
BOOKTITLE = {Proceedings of the 12th IEEE International Conference on Data Mining Workshops (ICDMW 2012)},
EDITOR = {Vreeken, Jilles and Ling, Charles and Javeed Zaki, Mohammed and Siebes, Arno and Yu, Jeffrey Xu and Goethals, Bart and Webb, Geoffrey I. and Wu, Xindong},
PAGES = {878--881},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Awadallah, Rawia
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T OpinioNetIt: A Structured and Faceted Knowledge-base of Opinions : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-198C-C
%F OTHER: Local-ID: 04756AF15FFC805BC1257B12002D6750-Awadallah2012i
%R 10.1109/ICDMW.2012.49
%D 2013
%B 12th IEEE International Conference on Data Mining Workshops
%Z date of event: 2012-12-10 - 2012-12-10
%C Brussels, Belgium
%B Proceedings of the 12th IEEE International Conference on Data Mining Workshops
%E Vreeken, Jilles; Ling, Charles; Javeed Zaki, Mohammed; Siebes, Arno; Yu, Jeffrey Xu; Goethals, Bart; Webb, Geoffrey I.; Wu, Xindong
%P 878 - 881
%I IEEE
%@ 978-1-4673-5164-5

Conference paper

S. Bedathur, K. Berberich, I. Patlakas, P. Triantafillou, and G. Weikum

“D-Hive: Data Bees Pollinating RDF, Text, and Time,” in Online Proceedings of Sixth Biennial Conference on Innovative Data Systems Research (CIDR 2013), Asilomar, CA, USA, 2013.

BibTeX

@inproceedings{Bedathur2013,
TITLE = {{D-Hive}: Data Bees Pollinating {RDF}, Text, and Time},
AUTHOR = {Bedathur, Srikanta and Berberich, Klaus and Patlakas, Ioannis and Triantafillou, Peter and Weikum, Gerhard},
LANGUAGE = {eng},
LOCALID = {Local-ID: D3BAD8992F713EB5C1257B10002BB930-Bedathur2013},
PUBLISHER = {cidrdb.org},
YEAR = {2013},
BOOKTITLE = {Online Proceedings of Sixth Biennial Conference on Innovative Data Systems Research (CIDR 2013)},
EID = {73},
ADDRESS = {Asilomar, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Bedathur, Srikanta
%A Berberich, Klaus
%A Patlakas, Ioannis
%A Triantafillou, Peter
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T D-Hive: Data Bees Pollinating RDF, Text, and Time : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A7C-A
%F OTHER: Local-ID: D3BAD8992F713EB5C1257B10002BB930-Bedathur2013
%D 2013
%B Sixth Biennial Conference on Innovative Data Systems Research
%Z date of event: 2013-01-06 - 2013-01-09
%C Asilomar, CA, USA
%B Online Proceedings of Sixth Biennial Conference on Innovative Data Systems Research
%Z sequence number: 73
%I cidrdb.org
%U http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper73.pdf

Conference paper

K. Beedkar, L. Del Corro, and R. Gemulla

“Fully Parallel Inference in Markov Logic Networks,” in 15th GI-Symposium Database Systems for Business, Technology and Web (BTW 2013), Magdeburg, Germany, 2013.

BibTeX

@inproceedings{bcg-btw13,
TITLE = {Fully Parallel Inference in {M}arkov Logic Networks},
AUTHOR = {Beedkar, Kaustubh and Del Corro, Luciano and Gemulla, Rainer},
LANGUAGE = {eng},
ISBN = {978-3-88579-608-4},
LOCALID = {Local-ID: BB228B55B464BC71C1257B08003A6698-bcg-btw13},
PUBLISHER = {GI},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {15th GI-Symposium Database Systems for Business, Technology and Web (BTW 2013)},
EDITOR = {Saake, Gunther},
ADDRESS = {Magdeburg, Germany},
}

Endnote

%0 Conference Proceedings
%A Beedkar, Kaustubh
%A Del Corro, Luciano
%A Gemulla, Rainer
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fully Parallel Inference in Markov Logic Networks : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1989-1
%F OTHER: Local-ID: BB228B55B464BC71C1257B08003A6698-bcg-btw13
%D 2013
%B 15th GI-Symposium Database Systems for Business, Technology and Web
%Z date of event: 2013-03-11 - 2013-03-15
%C Magdeburg, Germany
%B 15th GI-Symposium Database Systems for Business, Technology and Web
%E Saake, Gunther
%I GI
%@ 978-3-88579-608-4
%U http://www.btw-2013.de/proceedings/Fully%20Parallel%20Inference%20in%20Markov%20Logic%20Networks.pdf

Thesis

D5, IMPR-CS

R. Belet

“Leveraging Independence and Locality for Random Forests in a Distributed Environment,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

With the emergence of big data, inducing regression trees on very large data
sets became a common data mining task. Even though centralized algorithms for
computing ensembles of classification/regression trees are a well-studied
machine learning/data mining problem, their distributed versions still raise
scalability, efficiency and accuracy issues.
Most state-of-the-art tree learning algorithms require data to reside in memory
on a single machine.
Adopting this approach for trees on big data is not feasible as the limited
resources provided by only one machine lead to scalability problems. While more
scalable implementations of tree learning algorithms have been proposed, they
typically require specialized parallel computing architectures, rendering those
algorithms complex and error-prone.
In this thesis we introduce two approaches to computing ensembles of
regression trees on very large training data sets using the MapReduce framework
as an underlying tool. The first approach employs the entire MapReduce cluster
to learn tree ensembles in parallel and in a fully distributed manner. The
second approach exploits locality and independence in the tree learning process.

BibTeX

@mastersthesis{Belet2013,
TITLE = {Leveraging Independence and Locality for Random Forests in a Distributed Environment},
AUTHOR = {Belet, Razvan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {With the emergence of big data, inducing regression trees on very large data sets became a common data mining task. Even though centralized algorithms for computing ensembles of classification/regression trees are a well-studied machine learning/data mining problem, their distributed versions still raise scalability, efficiency and accuracy issues. Most state-of-the-art tree learning algorithms require data to reside in memory on a single machine. Adopting this approach for trees on big data is not feasible as the limited resources provided by only one machine lead to scalability problems. While more scalable implementations of tree learning algorithms have been proposed, they typically require specialized parallel computing architectures, rendering those algorithms complex and error-prone. In this thesis we introduce two approaches to computing ensembles of regression trees on very large training data sets using the MapReduce framework as an underlying tool. The first approach employs the entire MapReduce cluster to learn tree ensembles in parallel and in a fully distributed manner. The second approach exploits locality and independence in the tree learning process.},
}

Endnote

%0 Thesis
%A Belet, Razvan
%Y Weikum, Gerhard
%A referee: Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Leveraging Independence and Locality for Random Forests in a Distributed Environment : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-97B8-0
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P 132 p.
%V master
%9 master
%X With the emergence of big data, inducing regression trees on very large data
sets became a common data mining task. Even though centralized algorithms for
computing ensembles of classification/regression trees are a well-studied
machine learning/data mining problem, their distributed versions still raise
scalability, efficiency and accuracy issues.
Most state-of-the-art tree learning algorithms require data to reside in memory
on a single machine.
Adopting this approach for trees on big data is not feasible as the limited
resources provided by only one machine lead to scalability problems. While more
scalable implementations of tree learning algorithms have been proposed, they
typically require specialized parallel computing architectures, rendering those
algorithms complex and error-prone.
In this thesis we introduce two approaches to computing ensembles of
regression trees on very large training data sets using the MapReduce framework
as an underlying tool. The first approach employs the entire MapReduce cluster
to learn tree ensembles in parallel and in a fully distributed manner. The
second approach exploits locality and independence in the tree learning process.

Article

P. Bellot, A. Doucet, S. Geva, S. Gurajada, J. Kamps, G. Kazai, M. Koolen, A. Mishra, V. Moriceau, J. Mothe, M. Preminger, E. SanJuan, R. Schenkel, X. Tannier, M. Theobald, M. Trappett, A. Trotman, M. Sanderson, F. Scholer, and Q. Wang

“Report on INEX 2013,” SIGIR Forum, vol. 47, no. 2, 2013.

BibTeX

@article{INEX_SIGIRF2013,
TITLE = {Report on {INEX 2013}},
AUTHOR = {Bellot, Patrice and Doucet, Antoine and Geva, Shlomo and Gurajada, Sairam and Kamps, Jaap and Kazai, Gabriella and Koolen, Marijn and Mishra, Arunav and Moriceau, V{\'e}ronique and Mothe, Josiane and Preminger, Michael and SanJuan, Eric and Schenkel, Ralf and Tannier, Xavier and Theobald, Martin and Trappett, Matthew and Trotman, Andrew and Sanderson, Mark and Scholer, Falk and Wang, Qiuyue},
LANGUAGE = {eng},
ISSN = {0163-5840},
DOI = {10.1145/2568388.2568393},
YEAR = {2013},
DATE = {2013},
JOURNAL = {SIGIR Forum},
VOLUME = {47},
NUMBER = {2},
PAGES = {21--32},
}

Endnote

%0 Journal Article
%A Bellot, Patrice
%A Doucet, Antoine
%A Geva, Shlomo
%A Gurajada, Sairam
%A Kamps, Jaap
%A Kazai, Gabriella
%A Koolen, Marijn
%A Mishra, Arunav
%A Moriceau, Véronique
%A Mothe, Josiane
%A Preminger, Michael
%A SanJuan, Eric
%A Schenkel, Ralf
%A Tannier, Xavier
%A Theobald, Martin
%A Trappett, Matthew
%A Trotman, Andrew
%A Sanderson, Mark
%A Scholer, Falk
%A Wang, Qiuyue
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Report on INEX 2013 : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-82A3-0
%R 10.1145/2568388.2568393
%7 2013-12
%D 2013
%J SIGIR Forum
%V 47
%N 2
%& 21
%P 21 - 32
%@ 0163-5840

Conference paper

P. Bellot, A. Doucet, S. Geva, S. Gurajada, J. Kamps, G. Kazai, M. Koolen, A. Mishra, V. Moriceau, J. Mothe, M. Preminger, E. SanJuan, R. Schenkel, X. Tannier, M. Theobald, M. Trappett, and Q. Wang

“Overview of INEX 2013,” in Information Access Evaluation : Multilinguality, Multimodality, and Visualization (CLEF 2013), Valencia, Spain, 2013.

Abstract

INEX investigates focused retrieval from structured documents by providing
large test collections of structured documents, uniform evaluation measures,
and a forum for organizations to compare their results. This paper reports on
the INEX 2013 evaluation campaign, which consisted of four activities
addressing three themes: searching professional and user-generated data
(Social Book Search track); searching structured or semantic data (Linked Data
track); and focused retrieval (Snippet Retrieval and Tweet Contextualization
tracks). INEX 2013 was an exciting year for INEX in which we consolidated the
collaboration with (other activities in) CLEF and for the second time ran our
workshop as part of the CLEF labs in order to facilitate knowledge transfer
between the evaluation forums. This paper gives an overview of all the INEX
2013 tracks, their aims and tasks, the built test collections, and gives an
initial analysis of the results.

BibTeX

@inproceedings{INEX-Kamps2012,
TITLE = {Overview of {INEX} 2013},
AUTHOR = {Bellot, Patrice and Doucet, Antoine and Geva, Shlomo and Gurajada, Sairam and Kamps, Jaap and Kazai, Gabriella and Koolen, Marijn and Mishra, Arunav and Moriceau, Veronique and Mothe, Josiane and Preminger, Michael and SanJuan, Eric and Schenkel, Ralf and Tannier, Xavier and Theobald, Martin and Trappett, Matthew and Wang, Qiuyue},
LANGUAGE = {eng},
ISSN = {0302-9743},
ISBN = {978-3-642-40801-4},
DOI = {10.1007/978-3-642-40802-1_27},
LOCALID = {Local-ID: E0D7037ADFDDA1C6C1257BBB003D49E2-INEX-Kamps2012},
PUBLISHER = {Springer},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2013 evaluation campaign, which consisted of four activities addressing three themes: searching professional and user-generated data (Social Book Search track); searching structured or semantic data (Linked Data track); and focused retrieval (Snippet Retrieval and Tweet Contextualization tracks). INEX 2013 was an exciting year for INEX in which we consolidated the collaboration with (other activities in) CLEF and for the second time ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums. This paper gives an overview of all the INEX 2013 tracks, their aims and tasks, the built test collections, and gives an initial analysis of the results.},
BOOKTITLE = {Information Access Evaluation : Multilinguality, Multimodality, and Visualization (CLEF 2013)},
EDITOR = {Forner, Pamela and M{\"u}ller, Henning and Paredes, Roberto and Rosso, Paolo and Stein, Benno},
PAGES = {269--281},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8138},
ADDRESS = {Valencia, Spain},
}

Endnote

%0 Conference Proceedings
%A Bellot, Patrice
%A Doucet, Antoine
%A Geva, Shlomo
%A Gurajada, Sairam
%A Kamps, Jaap
%A Kazai, Gabriella
%A Koolen, Marijn
%A Mishra, Arunav
%A Moriceau, Veronique
%A Mothe, Josiane
%A Preminger, Michael
%A SanJuan, Eric
%A Schenkel, Ralf
%A Tannier, Xavier
%A Theobald, Martin
%A Trappett, Matthew
%A Wang, Qiuyue
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T Overview of INEX 2013 : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A78-1
%R 10.1007/978-3-642-40802-1_27
%F OTHER: Local-ID: E0D7037ADFDDA1C6C1257BBB003D49E2-INEX-Kamps2012
%D 2013
%B 4th International Conference of the CLEF Initiative
%Z date of event: 2013-09-23 - 2013-09-26
%C Valencia, Spain
%X INEX investigates focused retrieval from structured documents by providing
large test collections of structured documents, uniform evaluation measures,
and a forum for organizations to compare their results. This paper reports on
the INEX 2013 evaluation campaign, which consisted of four activities
addressing three themes: searching professional and user-generated data
(Social Book Search track); searching structured or semantic data (Linked Data
track); and focused retrieval (Snippet Retrieval and Tweet Contextualization
tracks). INEX 2013 was an exciting year for INEX in which we consolidated the
collaboration with (other activities in) CLEF and for the second time ran our
workshop as part of the CLEF labs in order to facilitate knowledge transfer
between the evaluation forums. This paper gives an overview of all the INEX
2013 tracks, their aims and tasks, the built test collections, and gives an
initial analysis of the results.
%B Information Access Evaluation : Multilinguality, Multimodality, and Visualization
%E Forner, Pamela; Müller, Henning; Paredes, Roberto; Rosso, Paolo; Stein, Benno
%P 269 - 281
%I Springer
%@ 978-3-642-40801-4
%B Lecture Notes in Computer Science
%N 8138
%@ 0302-9743

Conference paper

K. Berberich and S. Bedathur

“Temporal Diversification of Search Results,” in SIGIR 2013 Workshop on Time-aware Information Access (TAIA 2013), Dublin, Ireland, 2013.

BibTeX

@inproceedings{Berberich2013g,
TITLE = {Temporal Diversification of Search Results},
AUTHOR = {Berberich, Klaus and Bedathur, Srikanta},
LANGUAGE = {eng},
URL = {http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf},
LOCALID = {Local-ID: F06E854555530CFBC1257C6E0023BA55-Berberich2013g},
PUBLISHER = {Microsoft Research},
YEAR = {2013},
BOOKTITLE = {SIGIR 2013 Workshop on Time-aware Information Access (TAIA 2013)},
EDITOR = {Diaz, Fernando and Dumais, Susan and Radinsky, Kira and de Rijke, Maarten and Shokouhi, Milad},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Berberich, Klaus
%A Bedathur, Srikanta
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Temporal Diversification of Search Results : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A73-B
%F OTHER: Local-ID: F06E854555530CFBC1257C6E0023BA55-Berberich2013g
%U http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf
%D 2013
%B SIGIR 2013 Workshop on Time-aware Information Access
%Z date of event: 2013-08-01 - 2013-08-01
%C Dublin, Ireland
%B SIGIR 2013 Workshop on Time-aware Information Access
%E Diaz, Fernando; Dumais, Susan; Radinsky, Kira; de Rijke, Maarten; Shokouhi, Milad
%I Microsoft Research
%U http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf

Conference paper

K. Berberich and S. Bedathur

“Computing n-gram Statistics in MapReduce,” in Advances in Database Technology (EDBT 2013), Genova, Italy, 2013.

BibTeX

@inproceedings{Berberich2013b,
TITLE = {Computing n-gram Statistics in {MapReduce}},
AUTHOR = {Berberich, Klaus and Bedathur, Srikanta},
LANGUAGE = {eng},
ISBN = {978-1-4503-1597-5},
DOI = {10.1145/2452376.2452389},
LOCALID = {Local-ID: 31F260D05B735433C1257B09003B1404-Berberich2013b},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {Advances in Database Technology (EDBT 2013)},
PAGES = {101--112},
ADDRESS = {Genova, Italy},
}

Endnote

%0 Conference Proceedings
%A Berberich, Klaus
%A Bedathur, Srikanta
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Computing n-gram Statistics in MapReduce : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-19CE-5
%F OTHER: Local-ID: 31F260D05B735433C1257B09003B1404-Berberich2013b
%R 10.1145/2452376.2452389
%D 2013
%B 16th International Conference on Extending Database Technology
%Z date of event: 2013-03-18 - 2013-03-22
%C Genova, Italy
%B Advances in Database Technology
%P 101 - 112
%I ACM
%@ 978-1-4503-1597-5

Conference paper

J. Biega, E. Kuzey, and F. M. Suchanek

“Inside YAGO2s: A Transparent Information Extraction Architecture,” in WWW’13, 22nd International Conference on World Wide Web, Rio de Janeiro, Brasil, 2013.

BibTeX

@inproceedings{Biega:2013:IYT:2487788.2487935,
TITLE = {Inside {YAGO2s}: A Transparent Information Extraction Architecture},
AUTHOR = {Biega, Joanna and Kuzey, Erdal and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISBN = {978-1-4503-2038-2},
URL = {http://dl.acm.org/citation.cfm?id=2487788.2487935},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {WWW'13, 22nd International Conference on World Wide Web},
EDITOR = {Schwabe, Daniel and Almeida, Virgilio and Glaser, Hartmut and Baeza-Yates, Ricardo and Moon, Sue},
PAGES = {325--328},
ADDRESS = {Rio de Janeiro, Brasil},
}

Endnote

%0 Conference Proceedings
%A Biega, Joanna
%A Kuzey, Erdal
%A Suchanek, Fabian M.
%+ Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T Inside YAGO2s: A Transparent Information Extraction Architecture : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-54E3-C
%U http://dl.acm.org/citation.cfm?id=2487788.2487935
%D 2013
%B 22nd International Conference on World Wide Web
%Z date of event: 2013-05-13 - 2013-05-17
%C Rio de Janeiro, Brasil
%K information extraction, ontologies, yago
%B WWW'13
%E Schwabe, Daniel; Almeida, Virgilio; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue
%P 325 - 328
%I ACM
%@ 978-1-4503-2038-2

Thesis

D5, IMPR-CS, D4

A. Boldyrev

“Dictionary-based Named Entity Recognition,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{BoldyrevMastersThesis2013,
TITLE = {Dictionary-based Named Entity Recognition},
AUTHOR = {Boldyrev, Artem},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Boldyrev, Artem
%Y Weikum, Gerhard
%A referee: Theobalt, Christian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Graphics, MPI for Informatics, Max Planck Society
%T Dictionary-based Named Entity Recognition : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C74-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V master
%9 master

Conference paper

E. Cergani and P. Miettinen

“Discovering Relations Using Matrix Factorization Methods,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

Traditional relation extraction methods work on manually defined
relations and typically expect manually labelled extraction patterns
for each relation. This strongly limits the scalability of these
systems. In Open Relation Extraction (ORE), the relations are
identified automatically based on co-occurrences of “surface
relations” (contexts) and entity pairs. The recently-proposed methods
for ORE use partition clustering to find the relations. In this work
we propose the use of matrix factorization methods instead of
clustering. Specifically, we study Non-Negative Matrix Factorization
(NMF) and Boolean Matrix Factorization (BMF). These methods overcome
many problems inherent in clustering and perform better than the
k-means clustering in our evaluation.

BibTeX

@inproceedings{cergani13discovering,
TITLE = {Discovering Relations Using Matrix Factorization Methods},
AUTHOR = {Cergani, Ervina and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2507841},
LOCALID = {Local-ID: B85EF949714E8A6EC1257C6A00608792-cergani13discovering},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Traditional relation extraction methods work on manually defined relations and typically expect manually labelled extraction patterns for each relation. This strongly limits the scalability of these systems. In Open Relation Extraction (ORE), the relations are identified automatically based on co-occurrences of ``surface relations'' (contexts) and entity pairs. The recently-proposed methods for ORE use partition clustering to find the relations. In this work we propose the use of matrix factorization methods instead of clustering. Specifically, we study Non-Negative Matrix Factorization (NMF) and Boolean Matrix Factorization (BMF). These methods overcome many problems inherent in clustering and perform better than the k-means clustering in our evaluation.},
BOOKTITLE = {CIKM{\textquoteright}13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {1549--1552},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Cergani, Ervina
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Relations Using Matrix Factorization Methods : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-19DB-7
%F OTHER: Local-ID: B85EF949714E8A6EC1257C6A00608792-cergani13discovering
%R 10.1145/2505515.2507841
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X Traditional relation extraction methods work on manually defined
relations and typically expect manually labelled extraction patterns
for each relation. This strongly limits the scalability of these
systems. In Open Relation Extraction (ORE), the relations are
identified automatically based on co-occurrences of ``surface
relations'' (contexts) and entity pairs. The recently-proposed methods
for ORE use partition clustering to find the relations. In this work
we propose the use of matrix factorization methods instead of
clustering. Specifically, we study Non-Negative Matrix Factorization
(NMF) and Boolean Matrix Factorization (BMF). These methods overcome
many problems inherent in clustering and perform better than the
k-means clustering in our evaluation.
%B CIKM’13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 1549 - 1552
%I ACM
%@ 978-1-4503-2263-8

Proceedings

D. H. Chau, J. Vreeken, M. van Leeuwen, and C. Faloutsos

Eds., Proceedings of the ACM SIGKDD Full-day Workshop on Interactive Data Exploration and Analytics. ACM, 2013.

BibTeX

@proceedings{Chau2013a,
TITLE = {Proceedings of the ACM SIGKDD Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2013)},
EDITOR = {Chau, Duen Horn and Vreeken, Jilles and van Leeuwen, Matthijs and Faloutsos, Christos},
LANGUAGE = {eng},
ISBN = {978-1-4503-2329-1},
LOCALID = {Local-ID: 1F669F9DC4CC9410C1257C60005593D1-Chau2013a},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
PAGES = {103},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%E Chau, Duen Horn
%E Vreeken, Jilles
%E van Leeuwen, Matthijs
%E Faloutsos, Christos
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Proceedings of the ACM SIGKDD Full-day Workshop on Interactive Data Exploration and Analytics : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-19E0-A
%F OTHER: Local-ID: 1F669F9DC4CC9410C1257C60005593D1-Chau2013a
%@ 978-1-4503-2329-1
%I ACM
%D 2013
%B ACM SIGKDD Full-day Workshop on Interactive Data Exploration and Analytics
%Z date of event: 2013-08-11 - 2013-08-11
%C Chicago, IL, USA
%P 103

Article

O. Čulo and G. de Melo

“Source-Path-Goal: Investigating the Cross-Linguistic Potential of Frame-Semantic Text Analysis,” Information Technology, vol. 54, no. 3, 2013.

BibTeX

@article{CuloDeMelo2012,
TITLE = {Source-Path-Goal: Investigating the Cross-Linguistic Potential of Frame-Semantic Text Analysis},
AUTHOR = {{\v C}ulo, Oliver and de Melo, Gerard},
LANGUAGE = {eng},
ISSN = {1611-2776},
LOCALID = {Local-ID: 4B73EA65B090D965C1257B11002D73A2-CuloDeMelo2012},
PUBLISHER = {Oldenbourg Wissenschaftsverlag},
ADDRESS = {M{\"u}nchen},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Information Technology},
VOLUME = {54},
NUMBER = {3},
PAGES = {147--152},
}

Endnote

%0 Journal Article
%A Čulo, Oliver
%A de Melo, Gerard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Source-Path-Goal: Investigating the Cross-Linguistic Potential of Frame-Semantic Text Analysis : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A44-6
%F OTHER: Local-ID: 4B73EA65B090D965C1257B11002D73A2-CuloDeMelo2012
%D 2013
%J Information Technology
%O it
%V 54
%N 3
%& 147
%P 147 - 152
%I Oldenbourg Wissenschaftsverlag
%C München
%@ false

Article

M. Daivandy, D. Hünich, R. Jäkel, S. Metzger, R. Müller-Pfefferkorn, and B. Schuller

“Heterogeneous Resource Federation with a Centralized Security Model for Information Extraction,” Journal of Internet Services and Applications, vol. 4, 2013.

BibTeX

@article{MetzgerJISA2012,
TITLE = {Heterogeneous Resource Federation with a Centralized Security Model for Information Extraction},
AUTHOR = {Daivandy, Milad and H{\"u}nich, Denis and J{\"a}kel, Rene and Metzger, Steffen and M{\"u}ller-Pfefferkorn, Ralph and Schuller, Bernd},
LANGUAGE = {eng},
ISSN = {1869-0238},
DOI = {10.1186/1869-0238-4-10},
LOCALID = {Local-ID: 9D149AAF29E33BCCC1257B83000D0937-MetzgerJISA2012},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2013},
JOURNAL = {Journal of Internet Services and Applications},
VOLUME = {4},
PAGES = {1--14},
EID = {10},
}

Endnote

%0 Journal Article
%A Daivandy, Milad
%A Hünich, Denis
%A Jäkel, Rene
%A Metzger, Steffen
%A Müller-Pfefferkorn, Ralph
%A Schuller, Bernd
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Heterogeneous Resource Federation with a Centralized Security Model for Information Extraction : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6395-C
%R 10.1186/1869-0238-4-10
%F OTHER: Local-ID: 9D149AAF29E33BCCC1257B83000D0937-MetzgerJISA2012
%7 2013-03-20
%D 2013
%8 20.03.2013
%J Journal of Internet Services and Applications
%V 4
%& 1
%P 1 - 14
%Z sequence number: 10
%I Springer
%C New York, NY
%@ false
%U http://www.jisajournal.com/content/4/1/10

Conference paper

L. Del Corro and R. Gemulla

“ClausIE: Clause-Based Open Information Extraction,” in WWW’13, 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013.

BibTeX

@inproceedings{ClausIE,
TITLE = {{ClausIE}: Clause-Based Open Information Extraction},
AUTHOR = {Del Corro, Luciano and Gemulla, Rainer},
LANGUAGE = {eng},
ISBN = {978-1-4503-2035-1},
URL = {http://dl.acm.org/citation.cfm?id=2488388.2488420},
LOCALID = {Local-ID: 937BBDB401D54B01C1257B10003FDEFF-ClausIE},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {WWW'13, 22nd International Conference on World Wide Web},
EDITOR = {Schwabe, Daniel and Almeida, Virgilio and Glaser, Hartmut and Baeza-Yates, Ricardo and Moon, Sue},
PAGES = {355--366},
ADDRESS = {Rio de Janeiro, Brazil},
}

Endnote

%0 Conference Proceedings
%A Del Corro, Luciano
%A Gemulla, Rainer
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T ClausIE: Clause-Based Open Information Extraction : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A3A-B
%F OTHER: Local-ID: 937BBDB401D54B01C1257B10003FDEFF-ClausIE
%U http://dl.acm.org/citation.cfm?id=2488388.2488420
%D 2013
%B 22nd International Conference on World Wide Web
%Z date of event: 2013-05-13 - 2013-05-17
%C Rio de Janeiro, Brazil
%B WWW'13
%E Schwabe, Daniel; Almeida, Virgilio; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue
%P 355 - 366
%I ACM
%@ 978-1-4503-2035-1

Thesis

A. de Oliveira Melo

“Learning Rules With Categorical Attributes from Linked Data Sources,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{MeloMastersThesis2013,
TITLE = {Learning Rules With Categorical Attributes from Linked Data Sources},
AUTHOR = {de Oliveira Melo, Andr{\'e}},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A de Oliveira Melo, André
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Learning Rules With Categorical Attributes from Linked Data Sources : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C54-8
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V master
%9 master

Article

S. Dutta, A. Narang, and S. K. Bera

“Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams,” Proceedings of the VLDB Endowment (Proc. VLDB 2013), vol. 6, no. 8, 2013.

BibTeX

@article{SouVLDB2013,
TITLE = {Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams},
AUTHOR = {Dutta, Sourav and Narang, Ankur and Bera, Suman K.},
LANGUAGE = {eng},
ISSN = {2150-8097},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {6},
NUMBER = {8},
PAGES = {589--600},
BOOKTITLE = {Proceedings of the 39th International Conference on Very Large Data Bases (VLDB 2013)},
EDITOR = {B{\"o}hlen, Michael and Koch, Christoph},
}

Endnote

%0 Journal Article
%A Dutta, Sourav
%A Narang, Ankur
%A Bera, Suman K.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-54B0-9
%D 2013
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 6
%N 8
%& 589
%P 589 - 600
%I ACM
%C New York, NY
%@ false
%B Proceedings of the 39th International Conference on Very Large Data Bases
%O Riva del Garda, Trento, Italy VLDB 2013
%U http://www.vldb.org/pvldb/vol6/p589-dutta.pdf

Article

M. Dylla, I. Miliaraki, and M. Theobald

“A Temporal-probabilistic Database Model for Information Extraction,” Proceedings of the VLDB Endowment (Proc. VLDB 2013), vol. 6, no. 14, 2013.

BibTeX

@article{DBLP:journals/pvldb/DyllaMT13,
TITLE = {A Temporal-probabilistic Database Model for Information Extraction},
AUTHOR = {Dylla, Maximilian and Miliaraki, Iris and Theobald, Martin},
LANGUAGE = {eng},
ISSN = {2150-8097},
URL = {http://www.vldb.org/pvldb/vol6/p1810-miliaraki.pdf},
LOCALID = {Local-ID: F77B765948DFB562C1257BEF002A1315-Dylla-VLDB2013},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {6},
NUMBER = {14},
PAGES = {1810--1821},
BOOKTITLE = {Proceedings of the 39th International Conference on Very Large Data Bases (VLDB 2013)},
EDITOR = {B{\"o}hlen, Michael and Koch, Christoph},
}

Endnote

%0 Journal Article
%A Dylla, Maximilian
%A Miliaraki, Iris
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T A Temporal-probabilistic Database Model for Information Extraction : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1716-2
%F OTHER: Local-ID: F77B765948DFB562C1257BEF002A1315-Dylla-VLDB2013
%U http://www.vldb.org/pvldb/vol6/p1810-miliaraki.pdf
%7 2013
%D 2013
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 6
%N 14
%& 1810
%P 1810 - 1821
%I ACM
%C New York, NY
%@ false
%B Proceedings of the 39th International Conference on Very Large Data Bases
%O Riva del Garda, Trento, Italy VLDB 2013

Conference paper

M. Dylla, I. Miliaraki, and M. Theobald

“Top-k Query Processing in Probabilistic Databases with Non-materialized Views,” in 29th International IEEE Conference on Data Engineering (ICDE 2013), Brisbane, Australia, 2013.

BibTeX

@inproceedings{DyllaICDE2013,
TITLE = {Top-k Query Processing in Probabilistic Databases with Non-materialized Views},
AUTHOR = {Dylla, Maximilian and Miliaraki, Iris and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-1-4673-4909-3 ; 978-1-4673-4908-6},
DOI = {10.1109/ICDE.2013.6544819},
LOCALID = {Local-ID: 41ABA8E9D9176C38C1257B0C00538601-DyllaICDE2013},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {29th International IEEE Conference on Data Engineering (ICDE 2013)},
PAGES = {122--133},
ADDRESS = {Brisbane, Australia},
}

Endnote

%0 Conference Proceedings
%A Dylla, Maximilian
%A Miliaraki, Iris
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Top-k Query Processing in Probabilistic Databases with Non-materialized Views : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-639B-F
%R 10.1109/ICDE.2013.6544819
%F OTHER: Local-ID: 41ABA8E9D9176C38C1257B0C00538601-DyllaICDE2013
%D 2013
%B 29th International IEEE Conference on Data Engineering
%Z date of event: 2013-04-08 - 2013-04-12
%C Brisbane, Australia
%B 29th International IEEE Conference on Data Engineering
%P 122 - 133
%I IEEE
%@ 978-1-4673-4909-3  978-1-4673-4908-6

Conference paper

D. Erdős and P. Miettinen

“Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization,” in IEEE 13th International Conference on Data Mining (ICDM 2013), Dallas, TX, USA, 2013.

Abstract

Tensors are becoming increasingly common in data mining, and consequently,
tensor factorizations are becoming more important tools for data miners. When
the data is binary, it is natural to ask if we can factorize it into binary
factors while simultaneously making sure that the reconstructed tensor is still
binary. Such factorizations, called Boolean tensor factorizations, can provide
improved interpretability and find Boolean structure that is hard to express
using normal factorizations. Unfortunately the algorithms for computing Boolean
tensor factorizations do not usually scale well. In this paper we present a
novel algorithm for finding Boolean CP and Tucker decompositions of large and
sparse binary tensors. In our experimental evaluation we show that our
algorithm can handle large tensors and accurately reconstructs the latent
Boolean structure.

BibTeX

@inproceedings{erdos13walknmerge,
TITLE = {{Walk'n'Merge}: A Scalable Algorithm for {Boolean} Tensor Factorization},
AUTHOR = {Erd{\H o}s, D{\'o}ra and Miettinen, Pauli},
LANGUAGE = {eng},
DOI = {10.1109/ICDM.2013.141},
LOCALID = {Local-ID: 4CE63F9DEBEF8E5EC1257C6A00610B3D-erdos13discovering},
PUBLISHER = {IEEE},
YEAR = {2013},
ABSTRACT = {Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide improved interpretability and find Boolean structure that is hard to express using normal factorizations. Unfortunately the algorithms for computing Boolean tensor factorizations do not usually scale well. In this paper we present a novel algorithm for finding Boolean CP and Tucker decompositions of large and sparse binary tensors. In our experimental evaluation we show that our algorithm can handle large tensors and accurately reconstructs the latent Boolean structure.},
BOOKTITLE = {IEEE 13th International Conference on Data Mining (ICDM 2013)},
PAGES = {1037--1042},
ADDRESS = {Dallas, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Erdős, Dóra
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A48-B
%F OTHER: Local-ID: 4CE63F9DEBEF8E5EC1257C6A00610B3D-erdos13discovering
%R 10.1109/ICDM.2013.141
%D 2013
%8 31.10.2013
%B 13th International Conference on Data Mining
%Z date of event: 2013-10-07 - 2013-10-10
%C Dallas, TX, USA
%X Tensors are becoming increasingly common in data mining, and consequently, 
tensor factorizations are becoming more important tools for data miners. When 
the data is binary, it is natural to ask if we can factorize it into binary 
factors while simultaneously making sure that the reconstructed tensor is still 
binary. Such factorizations, called Boolean tensor factorizations, can provide 
improved interpretability and find Boolean structure that is hard to express 
using normal factorizations. Unfortunately the algorithms for computing Boolean 
tensor factorizations do not usually scale well. In this paper we present a 
novel algorithm for finding Boolean CP and Tucker decompositions of large and 
sparse binary tensors. In our experimental evaluation we show that our 
algorithm can handle large tensors and accurately reconstructs the latent 
Boolean structure.
%B IEEE 13th International Conference on Data Mining
%P 1037 - 1042
%I IEEE

Conference paper

D. Erdős and P. Miettinen

“Discovering Facts with Boolean Tensor Tucker Decomposition,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

Open Information Extraction (Open IE) has gained increasing research
interest in recent years. The first step in Open IE is to extract raw
subject--predicate--object triples from the data. These raw triples are rarely
usable per se, and need additional post-processing. To that end, we proposed
the use of Boolean Tucker tensor decomposition to simultaneously find the
entity and relation synonyms and the facts connecting them from the raw
triples. Our method represents the synonym sets and facts using (sparse) binary
matrices and tensor that can be efficiently stored and manipulated.

We consider the presentation of the problem as a Boolean tensor decomposition
as one of this paper's main contributions. To study the validity of this
approach, we use a recent algorithm for scalable Boolean Tucker decomposition.
We validate the results with empirical evaluation on a new semi-synthetic data
set, generated to faithfully reproduce real-world data features, as well as
with real-world data from existing Open IE extractor. We show that our method
obtains high precision while the low recall can easily be remedied by
considering the original data together with the decomposition.

BibTeX

@inproceedings{erdos13discovering,
TITLE = {Discovering Facts with {B}oolean Tensor Tucker Decomposition},
AUTHOR = {Erd{\H o}s, D{\'o}ra and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2507846},
LOCALID = {Local-ID: 65F19E1E95609D3CC1257C6A0061B38E-erdos13walknmerge},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Open Information Extraction (Open IE) has gained increasing research interest in recent years. The first step in Open IE is to extract raw subject--predicate--object triples from the data. These raw triples are rarely usable per se, and need additional post-processing. To that end, we proposed the use of Boolean Tucker tensor decomposition to simultaneously find the entity and relation synonyms and the facts connecting them from the raw triples. Our method represents the synonym sets and facts using (sparse) binary matrices and tensor that can be efficiently stored and manipulated. We consider the presentation of the problem as a Boolean tensor decomposition as one of this paper's main contributions. To study the validity of this approach, we use a recent algorithm for scalable Boolean Tucker decomposition. We validate the results with empirical evaluation on a new semi-synthetic data set, generated to faithfully reproduce real-world data features, as well as with real-world data from existing Open IE extractor. We show that our method obtains high precision while the low recall can easily be remedied by considering the original data together with the decomposition.},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {1569--1572},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Erdős, Dóra
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Facts with Boolean Tensor Tucker Decomposition : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A43-6
%F OTHER: Local-ID: 65F19E1E95609D3CC1257C6A0061B38E-erdos13walknmerge
%R 10.1145/2505515.2507846
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X Open Information Extraction (Open IE) has gained increasing research
interest in recent years. The first step in Open IE is to extract raw 
subject--predicate--object triples from the data. These raw triples are rarely 
usable per se, and need additional post-processing. To that end, we proposed 
the use of Boolean Tucker tensor decomposition to simultaneously find the 
entity and relation synonyms and the facts connecting them from the raw 
triples. Our method represents the synonym sets and facts using (sparse) binary 
matrices and tensor that can be efficiently stored and manipulated.

We consider the presentation of the problem as a Boolean tensor decomposition 
as one of this paper's main contributions. To study the validity of this 
approach, we use a recent algorithm for scalable Boolean Tucker decomposition. 
We validate the results with empirical evaluation on a new semi-synthetic data 
set, generated to faithfully reproduce real-world data features, as well as 
with real-world data from existing Open IE extractor. We show that our method 
obtains high precision while the low recall can easily be remedied by 
considering the original data together with the decomposition.
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 1569 - 1572
%I ACM
%@ 978-1-4503-2263-8

Paper

D. Erdős and P. Miettinen

“Scalable Boolean Tensor Factorizations using Random Walks,” 2013. [Online]. Available: http://arxiv.org/abs/1310.4843.

Abstract

Tensors are becoming increasingly common in data mining, and consequently,
tensor factorizations are becoming more and more important tools for data
miners. When the data is binary, it is natural to ask if we can factorize it
into binary factors while simultaneously making sure that the reconstructed
tensor is still binary. Such factorizations, called Boolean tensor
factorizations, can provide improved interpretability and find Boolean
structure that is hard to express using normal factorizations. Unfortunately
the algorithms for computing Boolean tensor factorizations do not usually scale
well. In this paper we present a novel algorithm for finding Boolean CP and
Tucker decompositions of large and sparse binary tensors. In our experimental
evaluation we show that our algorithm can handle large tensors and accurately
reconstructs the latent Boolean structure.

BibTeX

@online{ErdosMiettinenarXiv2013,
TITLE = {Scalable Boolean Tensor Factorizations using Random Walks},
AUTHOR = {Erd{\H o}s, D{\'o}ra and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1310.4843},
EPRINT = {1310.4843},
EPRINTTYPE = {arXiv},
YEAR = {2013},
ABSTRACT = {Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more and more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide improved interpretability and find Boolean structure that is hard to express using normal factorizations. Unfortunately the algorithms for computing Boolean tensor factorizations do not usually scale well. In this paper we present a novel algorithm for finding Boolean CP and Tucker decompositions of large and sparse binary tensors. In our experimental evaluation we show that our algorithm can handle large tensors and accurately reconstructs the latent Boolean structure.},
}

Endnote

%0 Report
%A Erdős, Dóra
%A Miettinen, Pauli
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Scalable Boolean Tensor Factorizations using Random Walks : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-4971-0
%U http://arxiv.org/abs/1310.4843
%D 2013
%X Tensors are becoming increasingly common in data mining, and consequently,
tensor factorizations are becoming more and more important tools for data
miners. When the data is binary, it is natural to ask if we can factorize it
into binary factors while simultaneously making sure that the reconstructed
tensor is still binary. Such factorizations, called Boolean tensor
factorizations, can provide improved interpretability and find Boolean
structure that is hard to express using normal factorizations. Unfortunately
the algorithms for computing Boolean tensor factorizations do not usually scale
well. In this paper we present a novel algorithm for finding Boolean CP and
Tucker decompositions of large and sparse binary tensors. In our experimental
evaluation we show that our algorithm can handle large tensors and accurately
reconstructs the latent Boolean structure.

%K Computer Science, Data Structures and Algorithms, cs.DS

Conference paper

L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek

“AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases,” in WWW’13, 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013.

BibTeX

@inproceedings{amie2013,
TITLE = {{AMIE}: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases},
AUTHOR = {Gal{\'a}rraga, Luis and Teflioudi, Christina and Hose, Katja and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISBN = {978-1-4503-2035-1},
URL = {http://dl.acm.org/citation.cfm?id=2488388.2488425},
LOCALID = {Local-ID:C1257ACD0050F94E-F2B50FB8A380EA8EC1257B16005F42C3-amie2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {WWW{\textquoteright}13, 22nd International Conference on World Wide Web},
EDITOR = {Schwabe, Daniel and Almeida, Virgilio and Glaser, Hartmut and Baeza-Yates, Ricardo and Moon, Sue},
PAGES = {413--422},
ADDRESS = {Rio de Janeiro, Brazil},
}

Endnote

%0 Conference Proceedings
%A Galárraga, Luis
%A Teflioudi, Christina
%A Hose, Katja
%A Suchanek, Fabian M.
%+ Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-544F-D
%U http://dl.acm.org/citation.cfm?id=2488388.2488425
%F OTHER: Local-ID:C1257ACD0050F94E-F2B50FB8A380EA8EC1257B16005F42C3-amie2013
%D 2013
%B 22nd International Conference on World Wide Web
%Z date of event: 2013-05-13 - 2013-05-17
%C Rio de Janeiro, Brazil
%B WWW’13
%E Schwabe, Daniel; Almeida, Virgilio; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue
%P 413 - 422
%I ACM
%@ 978-1-4503-2035-1

Conference paper

L. Galárraga, N. Preda, and F. M. Suchanek

“Mining Rules to Align Knowledge Bases,” in AKBC’13, 22nd ACM International Conference on Information and Knowledge Management, San Francisco, CA, USA, 2013.

BibTeX

@inproceedings{rosaakbc2013,
TITLE = {Mining Rules to Align Knowledge Bases},
AUTHOR = {Gal{\'a}rraga, Luis and Preda, Nicoleta and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISBN = {978-1-4503-2411-3},
DOI = {10.1145/2509558.2509566},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {AKBC'13, 22nd ACM International Conference on Information and Knowledge Management},
EDITOR = {Suchanek, Fabian and Riedel, Sebastian and Singh, Sameer and Talukdar, Partha P.},
PAGES = {43--48},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Galárraga, Luis
%A Preda, Nicoleta
%A Suchanek, Fabian M.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T Mining Rules to Align Knowledge Bases : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-B902-2
%R 10.1145/2509558.2509566
%D 2013
%B 22nd ACM International Conference on Information and Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%B AKBC'13
%E Suchanek, Fabian; Riedel, Sebastian; Singh, Sameer; Talukdar, Partha P.
%P 43 - 48
%I ACM
%@ 978-1-4503-2411-3

Article

R. Gemulla, P. J. Haas, and W. Lehner

“Non-uniformity Issues and Workarounds in Bounded-size Sampling,” The VLDB Journal, vol. 22, no. 6, 2013.

Abstract

A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms.

BibTeX

@article{Gemulla2012,
TITLE = {Non-uniformity Issues and Workarounds in Bounded-size Sampling},
AUTHOR = {Gemulla, Rainer and Haas, P. J. and Lehner, W.},
LANGUAGE = {eng},
ISSN = {1066-8888},
DOI = {10.1007/s00778-013-0307-0},
LOCALID = {Local-ID: AE61AAD9E8EE81FCC1257B0B00394134-Gemulla2012},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms.},
JOURNAL = {The VLDB Journal},
VOLUME = {22},
NUMBER = {6},
PAGES = {753--772},
}

Endnote

%0 Journal Article
%A Gemulla, Rainer
%A Haas, P. J.
%A Lehner, W.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Non-uniformity Issues and Workarounds in Bounded-size Sampling : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A4D-1
%F OTHER: Local-ID: AE61AAD9E8EE81FCC1257B0B00394134-Gemulla2012
%R 10.1007/s00778-013-0307-0
%7 2013-02-14
%D 2013
%X A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms.
%K Database sampling, Reservoir sampling, Bernoulli sampling, Sample maintenance
%J The VLDB Journal
%V 22
%N 6
%& 753
%P 753 - 772
%I Springer
%C Berlin
%@ false

Article

F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh

“Set Covering with Our Eyes Closed,” SIAM Journal on Computing, vol. 42, no. 3, 2013.

Abstract

Given a universe $U$ of $n$ elements and a weighted collection $\mathscr{S}$ of $m$ subsets of $U$, the universal set cover problem is to a priori map each element $u \in U$ to a set $S(u) \in \mathscr{S}$ containing $u$ such that any set $X \subseteq U$ is covered by $S(X)=\cup_{u\in X} S(u)$. The aim is to find a mapping such that the cost of $S(X)$ is as close as possible to the optimal set cover cost for $X$. (Such problems are also called oblivious or a priori optimization problems.) Unfortunately, for every universal mapping, the cost of $S(X)$ can be $\Omega(\sqrt{n})$ times larger than optimal if the set $X$ is adversarially chosen. In this paper we study the performance on average, when $X$ is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is $O(\log mn)$ times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees to (nonmetric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multicut and disc-covering problems and show how all these universal mappings give us algorithms for the stochastic online variants of the problems with the same competitive factors.

BibTeX

@article{grandoni13set,
TITLE = {Set Covering with Our Eyes Closed},
AUTHOR = {Grandoni, Fabrizio and Gupta, Anupam and Leonardi, Stefano and Miettinen, Pauli and Sankowski, Piotr and Singh, Mohit},
LANGUAGE = {eng},
ISSN = {0097-5397},
DOI = {10.1137/100802888},
LOCALID = {Local-ID: 53C36AED23EF085AC1257C6A005E9F4B-grandoni13set},
PUBLISHER = {SIAM},
ADDRESS = {Philadelphia, PA},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Given a universe $U$ of $n$ elements and a weighted collection $\mathscr{S}$ of $m$ subsets of $U$, the universal set cover problem is to a priori map each element $u \in U$ to a set $S(u) \in \mathscr{S}$ containing $u$ such that any set $X \subseteq U$ is covered by $S(X)=\cup_{u\in X} S(u)$. The aim is to find a mapping such that the cost of $S(X)$ is as close as possible to the optimal set cover cost for $X$. (Such problems are also called oblivious or a priori optimization problems.) Unfortunately, for every universal mapping, the cost of $S(X)$ can be $\Omega(\sqrt{n})$ times larger than optimal if the set $X$ is adversarially chosen. In this paper we study the performance on average, when $X$ is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is $O(\log mn)$ times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees to (nonmetric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multicut and disc-covering problems and show how all these universal mappings give us algorithms for the stochastic online variants of the problems with the same competitive factors.},
JOURNAL = {SIAM Journal on Computing},
VOLUME = {42},
NUMBER = {3},
PAGES = {808--830},
}

Endnote

%0 Journal Article
%A Grandoni, Fabrizio
%A Gupta, Anupam
%A Leonardi, Stefano
%A Miettinen, Pauli
%A Sankowski, Piotr
%A Singh, Mohit
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Set Covering with Our Eyes Closed
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1C37-0
%R 10.1137/100802888
%F OTHER: Local-ID: 53C36AED23EF085AC1257C6A005E9F4B-grandoni13set
%7 2013-05-09
%D 2013
%X Given a universe $U$ of $n$ elements and a weighted collection $\mathscr{S}$ of $m$ subsets of $U$, the universal set cover problem is to a priori map each element $u \in U$ to a set $S(u) \in \mathscr{S}$ containing $u$ such that any set $X \subseteq U$ is covered by $S(X)=\cup_{u\in X} S(u)$. The aim is to find a mapping such that the cost of $S(X)$ is as close as possible to the optimal set cover cost for $X$. (Such problems are also called oblivious or a priori optimization problems.) Unfortunately, for every universal mapping, the cost of $S(X)$ can be $\Omega(\sqrt{n})$ times larger than optimal if the set $X$ is adversarially chosen. In this paper we study the performance on average, when $X$ is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is $O(\log mn)$ times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees to (nonmetric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multicut and disc-covering problems and show how all these universal mappings give us algorithms for the stochastic online variants of the problems with the same competitive factors.
%J SIAM Journal on Computing
%V 42
%N 3
%& 808
%P 808 - 830
%I SIAM
%C Philadelphia, PA
%@ 0097-5397

Conference paper

A. Grycner, P. Ernst, A. Siu, and G. Weikum

“Knowledge Discovery on Incompatibility of Medical Concepts,” in Computational Linguistics and Intelligent Text Processing (CICLing 2013), Samos, Greece, 2013.

Abstract

This work proposes a method for automatically discovering incompatible medical concepts in text corpora. The approach is distantly supervised based on a seed set of incompatible concept pairs like symptoms or conditions that rule each other out. Two concepts are considered incompatible if their definitions match a template, and contain an antonym pair derived from WordNet, VerbOcean, or a hand-crafted lexicon. Our method creates templates from dependency parse trees of definitional texts, using seed pairs. The templates are applied to a text corpus, and the resulting candidate pairs are categorized and ranked by statistical measures. Since experiments show that the results face semantic ambiguity problems, we further cluster the results into different categories. We applied this approach to the concepts in Unified Medical Language System, Human Phenotype Ontology, and Mammalian Phenotype Ontology. Out of 77,496 definitions, 1,958 concept pairs were detected as incompatible with an average precision of 0.80.

BibTeX

@inproceedings{Grycner2013,
TITLE = {Knowledge Discovery on Incompatibility of Medical Concepts},
AUTHOR = {Grycner, Adam and Ernst, Patrick and Siu, Amy and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-3-642-37246-9},
DOI = {10.1007/978-3-642-37247-6_10},
LOCALID = {Local-ID: 2C3D152169C55F01C1257B160035B6E6-Grycner2013},
PUBLISHER = {Springer},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {This work proposes a method for automatically discovering incompatible medical concepts in text corpora. The approach is distantly supervised based on a seed set of incompatible concept pairs like symptoms or conditions that rule each other out. Two concepts are considered incompatible if their definitions match a template, and contain an antonym pair derived from WordNet, VerbOcean, or a hand-crafted lexicon. Our method creates templates from dependency parse trees of definitional texts, using seed pairs. The templates are applied to a text corpus, and the resulting candidate pairs are categorized and ranked by statistical measures. Since experiments show that the results face semantic ambiguity problems, we further cluster the results into different categories. We applied this approach to the concepts in Unified Medical Language System, Human Phenotype Ontology, and Mammalian Phenotype Ontology. Out of 77,496 definitions, 1,958 concept pairs were detected as incompatible with an average precision of 0.80.},
BOOKTITLE = {Computational Linguistics and Intelligent Text Processing (CICLing 2013)},
EDITOR = {Gelbukh, Alexander},
PAGES = {114--125},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {7816},
ADDRESS = {Samos, Greece},
}

Endnote

%0 Conference Proceedings
%A Grycner, Adam
%A Ernst, Patrick
%A Siu, Amy
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Discovery on Incompatibility of Medical Concepts
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A54-F
%F OTHER: Local-ID: 2C3D152169C55F01C1257B160035B6E6-Grycner2013
%R 10.1007/978-3-642-37247-6_10
%D 2013
%B 14th International Conference on Computational Linguistics and Intelligent Text Processing
%Z date of event: 2013-03-24 - 2013-03-30
%C Samos, Greece
%X This work proposes a method for automatically discovering incompatible medical 
concepts in text corpora. The approach is distantly supervised based on a seed 
set of incompatible concept pairs like symptoms or conditions that rule each 
other out. Two concepts are considered incompatible if their definitions match 
a template, and contain an antonym pair derived from WordNet, VerbOcean, or a 
hand-crafted lexicon. Our method creates templates from dependency parse trees 
of definitional texts, using seed pairs. The templates are applied to a text 
corpus, and the resulting candidate pairs are categorized and ranked by 
statistical measures. Since experiments show that the results face semantic 
ambiguity problems, we further cluster the results into different categories. 
We applied this approach to the concepts in Unified Medical Language System, 
Human Phenotype Ontology, and Mammalian Phenotype Ontology. Out of 77,496 
definitions, 1,958 concept pairs were detected as incompatible with an average 
precision of 0.80.
%B Computational Linguistics and Intelligent Text Processing
%E Gelbukh, Alexander
%P 114 - 125
%I Springer
%@ 978-3-642-37246-9
%B Lecture Notes in Computer Science
%N 7816

Conference paper

A. Gubichev, S. Bedathur, and S. Seufert

“Sparqling Kleene - Fast Property Paths in RDF-3X,” in First International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), New York, NY, USA, 2013.

BibTeX

@inproceedings{Gubichev2013,
TITLE = {Sparqling {Kleene} -- Fast Property Paths in {RDF-3X}},
AUTHOR = {Gubichev, Andrey and Bedathur, Srikanta and Seufert, Stephan},
LANGUAGE = {eng},
ISBN = {978-1-4503-2188-4},
DOI = {10.1145/2484425.2484443},
LOCALID = {Local-ID: 2307D92E4A8D0ABFC1257C680057DFE6-Gubichev2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {First International Workshop on Graph Data Management Experiences and Systems (GRADES 2013)},
EDITOR = {Boncz, Peter A. and Neumann, Thomas},
PAGES = {1--7},
EID = {14},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Gubichev, Andrey
%A Bedathur, Srikanta
%A Seufert, Stephan
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Sparqling Kleene - Fast Property Paths in RDF-3X
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A87-C
%R 10.1145/2484425.2484443
%F OTHER: Local-ID: 2307D92E4A8D0ABFC1257C680057DFE6-Gubichev2013
%D 2013
%B First International Workshop on Graph Data Management Experiences and Systems
%Z date of event: 2013-06-22 - 2013-06-27
%C New York, NY, USA
%B First International Workshop on Graph Data Management Experiences and Systems
%E Boncz, Peter A.; Neumann, Thomas
%P 1 - 7
%Z sequence number: 14
%I ACM
%@ 978-1-4503-2188-4

Conference paper

S. Gurajada, J. Kamps, A. Mishra, R. Schenkel, M. Theobald, and Q. Wang

“Overview of the INEX 2013 Linked Data Track,” in Working Notes for the CLEF 2013 Conference, Valencia, Spain, 2013.

Abstract

This paper provides an overview of the INEX Linked Data Track, which went into its second iteration in 2013.

BibTeX

@inproceedings{INEX-LD-2012,
TITLE = {Overview of the {INEX} 2013 Linked Data Track},
AUTHOR = {Gurajada, Sairam and Kamps, Jaap and Mishra, Arunav and Schenkel, Ralf and Theobald, Martin and Wang, Qiuyue},
LANGUAGE = {eng},
LOCALID = {Local-ID: 60E4C9459DE8213AC1257BBB003DE4C9-INEX-LD-2012},
PUBLISHER = {CLEF Initiative},
YEAR = {2013},
ABSTRACT = {This paper provides an overview of the INEX Linked Data Track, which went into its second iteration in 2013.},
BOOKTITLE = {Working Notes for the CLEF 2013 Conference},
EDITOR = {Forner, Pamela and Navigli, Roberto and Tufis, Dan},
ADDRESS = {Valencia, Spain},
}

Endnote

%0 Conference Proceedings
%A Gurajada, Sairam
%A Kamps, Jaap
%A Mishra, Arunav
%A Schenkel, Ralf
%A Theobald, Martin
%A Wang, Qiuyue
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Overview of the INEX 2013 Linked Data Track
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A84-7
%F OTHER: Local-ID: 60E4C9459DE8213AC1257BBB003DE4C9-INEX-LD-2012
%D 2013
%B CLEF 2013 Evaluation Labs and Workshop
%Z date of event: 2013-09-23 - 2013-09-26
%C Valencia, Spain
%X This paper provides an overview of the INEX Linked Data
Track, which went into its second iteration in 2013.
%B Working Notes for the CLEF 2013 Conference
%E Forner, Pamela; Navigli, Roberto; Tufis, Dan
%I CLEF Initiative
%U http://www.clef-initiative.eu/documents/71612/2b349f08-de37-41a9-bb62-40c91f1daa0b

Article

J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum

“YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia,” Artificial Intelligence, vol. 194, 2013.

BibTeX

@article{yago2aij2013,
TITLE = {{YAGO2}: A Spatially and Temporally Enhanced Knowledge Base from {Wikipedia}},
AUTHOR = {Hoffart, Johannes and Suchanek, Fabian M. and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {0004-3702},
URL = {http://www.sciencedirect.com/science/article/pii/S0004370212000719},
DOI = {10.1016/j.artint.2012.06.001},
LOCALID = {Local-ID:C1257ACD0050F94E-8D0B6EF25CD7906FC1257B1600621DB6-yago2@aij2013},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Artificial Intelligence},
VOLUME = {194},
PAGES = {28--61},
}

Endnote

%0 Journal Article
%A Hoffart, Johannes
%A Suchanek, Fabian M.
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-543A-C
%R 10.1016/j.artint.2012.06.001
%U http://www.sciencedirect.com/science/article/pii/S0004370212000719
%F OTHER: Local-ID:C1257ACD0050F94E-8D0B6EF25CD7906FC1257B1600621DB6-yago2@aij2013
%7 2012-06-18
%D 2013
%J Artificial Intelligence
%O AI
%V 194
%& 28
%P 28 - 61
%I Elsevier
%C Amsterdam
%@ 0004-3702

Conference paper

J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum

“YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract,” in 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), Beijing, China, 2013.

Abstract

We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology and the integration of the spatio-temporal dimension.

BibTeX

@inproceedings{Hoffart2013ww,
TITLE = {{YAGO2:} {A} Spatially and Temporally Enhanced Knowledge Base from {Wikipedia}: Extended Abstract},
AUTHOR = {Hoffart, Johannes and Suchanek, Fabian M. and Berberich, Klaus and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-57735-633-2},
LOCALID = {Local-ID: 0F08380C815DF7A8C1257C6100731377-Hoffart2013ww},
PUBLISHER = {AAAI},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95\% of the facts in YAGO2. In this paper, we present the extraction methodology and the integration of the spatio-temporal dimension.},
BOOKTITLE = {23rd International Joint Conference on Artificial Intelligence (IJCAI 2013)},
PAGES = {3161--3165},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Suchanek, Fabian M.
%A Berberich, Klaus
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1AC8-9
%F OTHER: Local-ID: 0F08380C815DF7A8C1257C6100731377-Hoffart2013ww
%D 2013
%B 23rd International Joint Conference on Artificial Intelligence
%Z date of event: 2013-08-03 - 2013-08-09
%C Beijing, China
%X We present YAGO2, an extension of the YAGO knowledge base, in which entities, 
facts, and events are anchored in both time and space. YAGO2 is built 
automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million 
facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95%
of the facts in YAGO2. In this paper, we present the extraction methodology and 
the integration of the spatio-temporal dimension.
%B 23rd International Joint Conference on Artificial Intelligence
%P 3161 - 3165
%I AAAI
%@ 978-1-57735-633-2 
%U http://ijcai.org/papers13/Papers/IJCAI13-478.pdf

Conference paper

J. Hoffart

“Discovering and Disambiguating Named Entities in Text,” in SIGMOD’13 PhD Symposium, New York, NY, USA, 2013.

Abstract

Disambiguating named entities in natural language texts maps ambiguous names to canonical entities registered in a knowledge base such as DBpedia, Freebase, or YAGO. Knowing the specific entity is an important asset for several other tasks, e.g. entity-based information retrieval or higher-level information extraction. Our approach to named entity disambiguation makes use of several ingredients: the prior probability of an entity being mentioned, the similarity between the context of the mention in the text and an entity, as well as the coherence among the entities. Extending this method, we present a novel and highly efficient measure to compute the semantic coherence between entities. This measure is especially powerful for long-tail entities or such entities that are not yet present in the knowledge base. Reliably identifying names in the input text that are not part of the knowledge base is the current focus of our work.

BibTeX

@inproceedings{Hoffart2013wk,
TITLE = {Discovering and Disambiguating Named Entities in Text},
AUTHOR = {Hoffart, Johannes},
LANGUAGE = {eng},
ISBN = {978-1-4503-2155-6},
DOI = {10.1145/2483574.2483582},
LOCALID = {Local-iD: CA2056C02ACB8EDDC1257C6100744944-Hoffart2013wk},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Disambiguating named entities in natural language texts maps ambiguous names to canonical entities registered in a knowledge base such as DBpedia, Freebase, or YAGO. Knowing the specific entity is an important asset for several other tasks, e.g. entity-based information retrieval or higher-level information extraction. Our approach to named entity disambiguation makes use of several ingredients: the prior probability of an entity being mentioned, the similarity between the context of the mention in the text and an entity, as well as the coherence among the entities. Extending this method, we present a novel and highly efficient measure to compute the semantic coherence between entities. This measure is especially powerful for long-tail entities or such entities that are not yet present in the knowledge base. Reliably identifying names in the input text that are not part of the knowledge base is the current focus of our work.},
BOOKTITLE = {SIGMOD{\textquoteright}13 PhD Symposium},
EDITOR = {Lei, Chen and Dong, Xin Luna},
PAGES = {43--48},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering and Disambiguating Named Entities in Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A8E-D
%F OTHER: Local-iD: CA2056C02ACB8EDDC1257C6100744944-Hoffart2013wk
%R 10.1145/2483574.2483582
%D 2013
%B SIGMOD/PODS PhD Symposium
%Z date of event: 2013-06-23 - 2013-06-23
%C New York, NY, USA
%X Disambiguating named entities in natural language texts maps ambiguous names to 
canonical entities registered in a knowledge base such as DBpedia, Freebase, or 
YAGO. Knowing the specific entity is an important asset for several other 
tasks, e.g. entity-based information retrieval or higher-level information 
extraction. Our approach to named entity disambiguation makes use of several 
ingredients: the prior probability of an entity being mentioned, the similarity 
between the context of the mention in the text and an entity, as well as the 
coherence among the entities. Extending this method, we present a novel and 
highly efficient measure to compute the semantic coherence between entities. 
This measure is especially powerful for long-tail entities or such entities 
that are not yet present in the knowledge base. Reliably identifying names in 
the input text that are not part of the knowledge base is the current focus of 
our work.
%B SIGMOD’13 PhD Symposium
%E Lei, Chen; Dong, Xin Luna
%P 43 - 48
%I ACM
%@ 978-1-4503-2155-6

Conference paper

K. Hose and R. Schenkel

“WARP: Workload-Aware Replication and Partitioning for RDF,” in 4th International Workshop on Data Engineering meets Semantic Web (DESWeb 2013), Brisbane, Australia, 2013.

BibTeX

@inproceedings{HoseSchenkel_DESWeb2013,
TITLE = {{WARP}: Workload-Aware Replication and Partitioning for {RDF}},
AUTHOR = {Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4673-5303-8},
DOI = {10.1109/ICDEW.2013.6547414},
LOCALID = {Local-ID: 17425053968C448EC1257AD100350E0C-HoseSchenkel_DESWeb2013},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {4th International Workshop on Data Engineering meets Semantic Web (DESWeb 2013)},
PAGES = {1--6},
ADDRESS = {Brisbane, Australia},
}

Endnote

%0 Conference Proceedings
%A Hose, Katja
%A Schenkel, Ralf
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T WARP: Workload-Aware Replication and Partitioning for RDF
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A99-4
%F OTHER: Local-ID: 17425053968C448EC1257AD100350E0C-HoseSchenkel_DESWeb2013
%R 10.1109/ICDEW.2013.6547414
%D 2013
%B 4th International Workshop on Data Engineering meets Semantic Web
%Z date of event: 2013-04-08 - 2013-04-12
%C Brisbane, Australia
%B 4th International Workshop on Data Engineering meets Semantic Web 
%P 1 - 6
%I IEEE
%@ 978-1-4673-5303-8

Conference paper

T. Huet, J. Biega, and F. Suchanek

“Mining History with Le Monde,” in AKBC’13, 22nd ACM International Conference on Information and Knowledge Management, San Francisco, CA, USA, 2013.

BibTeX

@inproceedings{Huet:2013,
TITLE = {Mining History with Le Monde},
AUTHOR = {Huet, Thomas and Biega, Joanna and Suchanek, Fabian},
LANGUAGE = {eng},
ISBN = {978-1-4503-2411-3},
DOI = {10.1145/2509558.2509567},
PUBLISHER = {ACM},
YEAR = {2013},
BOOKTITLE = {AKBC'13, 22nd ACM International Conference on Information and Knowledge Management},
EDITOR = {Suchanek, Fabian and Riedel, Sebastian and Singh, Sameer and Talukdar, Partha P.},
PAGES = {49--54},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Huet, Thomas
%A Biega, Joanna
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mining History with Le Monde
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5149-B
%R 10.1145/2509558.2509567
%D 2013
%8 27.10.2013
%B 22nd ACM International Conference on Information and Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%K culturomics, knowledge base, le monde, yago
%B AKBC'13
%E Suchanek, Fabian; Riedel, Sebastian; Singh, Sameer; Talukdar, Partha P.
%P 49 - 54
%I ACM
%@ 978-1-4503-2411-3

Conference paper

E. Ilieva, S. Michel, and A. Stupar

“The Essence of Knowledge (bases) Through Entity Rankings,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

We consider the task of automatically phrasing and computing top-k rankings over the information contained in common knowledge bases (KBs), such as YAGO or DBPedia. We assemble the thematic focus and ranking criteria of rankings by inspecting the present Subject, Predicate, Object (SPO) triples. Making use of numerical attributes contained in the KB we are also able to compute the actual ranking content, i.e., entities and their performances. We further discuss the integration of existing rankings into the ranking generation process for increased coverage and ranking quality. We report on first results obtained using the YAGO knowledge base.

BibTeX

@inproceedings{Ilieva2013z,
TITLE = {The Essence of Knowledge (bases) Through Entity Rankings},
AUTHOR = {Ilieva, Evica and Michel, Sebastian and Stupar, Aleksandar},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2507838},
LOCALID = {Local-ID: 62BCC454FD2DBDEEC1257C690042AF3D-Ilieva2013z},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {We consider the task of automatically phrasing and computing top-k rankings over the information contained in common knowledge bases (KBs), such as YAGO or DBPedia. We assemble the thematic focus and ranking criteria of rankings by inspecting the present Subject, Predicate, Object (SPO) triples. Making use of numerical attributes contained in the KB we are also able to compute the actual ranking content, i.e., entities and their performances. We further discuss the integration of existing rankings into the ranking generation process for increased coverage and ranking quality. We report on first results obtained using the YAGO knowledge base.},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {1537--1540},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Ilieva, Evica
%A Michel, Sebastian
%A Stupar, Aleksandar
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T The Essence of Knowledge (bases) Through Entity Rankings
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1A33-A
%R 10.1145/2505515.2507838
%F OTHER: Local-ID: 62BCC454FD2DBDEEC1257C690042AF3D-Ilieva2013z
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X We consider the task of automatically phrasing and computing top-k rankings 
over the information contained in common knowledge bases (KBs), such as YAGO or 
DBPedia. We assemble the thematic focus and ranking criteria of rankings by 
inspecting the present Subject, Predicate, Object (SPO) triples. Making use of 
numerical attributes contained in the KB we are also able to compute the actual 
ranking content, i.e., entities and their performances. We further discuss the 
integration of existing rankings into the ranking generation process for 
increased coverage and ranking quality. We report on first results obtained 
using the YAGO knowledge base.
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 1537 - 1540
%I ACM
%@ 978-1-4503-2263-8

Thesis

D5IMPR-CS

E. Ilieva

“Analyzing and Creating Top-k Entity Rankings,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{Ilieva2013,
TITLE = {Analyzing and Creating Top-k Entity Rankings},
AUTHOR = {Ilieva, Evica},
LANGUAGE = {eng},
LOCALID = {Local-ID: DDA2710C9D0C5B92C1257BF00027BC81-Ilieva2013z},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Ilieva, Evica
%Y Michel, Sebastian
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Analyzing and Creating Top-k Entity Rankings
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1AC1-8
%F OTHER: Local-ID: DDA2710C9D0C5B92C1257BF00027BC81-Ilieva2013z
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P 67 p.
%V master
%9 master

Conference paper

L. Jiang, Y. Wang, J. Hoffart, and G. Weikum

“Crowdsourced Entity Markup,” in Proceedings of the 1st International Workshop on Crowdsourcing the Semantic Web co-located with 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, 2013.

BibTeX

@inproceedings{Jiang2013,
TITLE = {Crowdsourced Entity Markup},
AUTHOR = {Jiang, Lili and Wang, Yafang and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-1030-0},
LOCALID = {Local-ID: 4A6F03891D73CF9DC1257C68005859B2-Jiang2013},
PUBLISHER = {CEUR-WS.org},
YEAR = {2013},
BOOKTITLE = {Proceedings of the 1st International Workshop on Crowdsourcing the Semantic Web co-located with 12th International Semantic Web Conference (ISWC 2013)},
EDITOR = {Acosta, Maribel and Aroyo, Lora and Bernstein, Abraham and Lehmann, Jens and Noy, Natasha and Simperl, Elena},
PAGES = {65--68},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1030},
ADDRESS = {Sydney, Australia},
}

Endnote

%0 Conference Proceedings
%A Jiang, Lili
%A Wang, Yafang
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Crowdsourced Entity Markup
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1AD0-6
%F OTHER: Local-ID: 4A6F03891D73CF9DC1257C68005859B2-Jiang2013
%U urn:nbn:de:0074-1030-0
%D 2013
%8 06.09.2013
%B 1st International Workshop on Crowdsourcing the Semantic Web co-located with 12th International Semantic Web Conference
%Z date of event: 2013-10-21 - 2013-10-25
%C Sydney, Australia
%B Proceedings of the 1st International Workshop on Crowdsourcing the Semantic Web co-located with 12th International Semantic Web Conference
%E Acosta, Maribel; Aroyo, Lora; Bernstein, Abraham; Lehmann, Jens; Noy, Natasha; Simperl, Elena
%P 65 - 68
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%Y Acosta, Maribel
%N 1030
%P 59 - 68
%@ false

Conference paper

L. Jiang, P. Luo, J. Wang, Y. Xiong, B. Lin, M. Wang, and N. An

“GRIAS: An Entity-Relation Graph Based Framework For Discovering Entity Aliases,” in IEEE 13th International Conference on Data Mining (ICDM 2013), Dallas, TX, USA, 2013.

BibTeX

@inproceedings{Jiang2013y,
TITLE = {{GRIAS}: An Entity-Relation Graph Based Framework For Discovering Entity Aliases},
AUTHOR = {Jiang, Lili and Luo, Ping and Wang, Jianyong and Xiong, Yuhong and Lin, Binduan and Wang, Min and An, Ning},
LANGUAGE = {eng},
ISSN = {1550-4786},
DOI = {10.1109/ICDM.2013.50},
LOCALID = {Local-ID: C5190A26C9118030C1257C68005B209A-Jiang2013y},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {IEEE 13th International Conference on Data Mining (ICDM 2013)},
PAGES = {310--319},
ADDRESS = {Dallas, TX, USA},
}

Endnote

%0 Conference Proceedings
%A Jiang, Lili
%A Luo, Ping
%A Wang, Jianyong
%A Xiong, Yuhong
%A Lin, Binduan
%A Wang, Min
%A An, Ning
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
%T GRIAS: An Entity-Relation Graph Based Framework For Discovering Entity Aliases : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1ADF-8
%R 10.1109/ICDM.2013.50
%F OTHER: Local-ID: C5190A26C9118030C1257C68005B209A-Jiang2013y
%D 2013
%B 13th International Conference on Data Mining
%Z date of event: 2013-12-07 - 2013-12-10
%C Dallas, TX, USA
%B IEEE 13th International Conference on Data Mining
%P 310 - 319
%I IEEE
%@ false

Thesis

S. Karaev

“Matrix Factorization over Max-times Algebra for Data Mining,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{KaraevMaster2013,
TITLE = {Matrix Factorization over Max-times Algebra for Data Mining},
AUTHOR = {Karaev, Sanjar},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Karaev, Sanjar
%Y Miettinen, Pauli
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Matrix Factorization over Max-times Algebra for Data Mining : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-9DD1-8
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P X, 57 p.
%V master
%9 master

Conference paper

S. K. Kondreddi, P. Triantafillou, and G. Weikum

“Human Computing Games for Knowledge Acquisition,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

Automatic information extraction techniques for knowledge acquisition are known
to produce noise, incomplete or incorrect facts from textual sources. Human
computing offers a natural alternative to expand and complement the output of
automated information extraction methods, thereby enabling us to build
high-quality knowledge bases. However, relying solely on human inputs for
extraction can be prohibitively expensive in practice. We demonstrate human
computing games for knowledge acquisition that employ human computing to
overcome the limitations in automated fact acquisition methods. We provide a
combined approach that tightly integrates automated extraction techniques with
human computing for effective gathering of facts. The methods we provide gather
facts in the form of relationships between entities. The games we demonstrate
are specifically designed to capture hard-to-extract relations between entities
in narrative text -- a task that automated systems find challenging.

BibTeX

@inproceedings{Kondreddi2013b,
TITLE = {Human Computing Games for Knowledge Acquisition},
AUTHOR = {Kondreddi, Sarath Kumar and Triantafillou, Peter and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2508213},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Automatic information extraction techniques for knowledge acquisition are known to produce noise, incomplete or incorrect facts from textual sources. Human computing offers a natural alternative to expand and complement the output of automated information extraction methods, thereby enabling us to build high-quality knowledge bases. However, relying solely on human inputs for extraction can be prohibitively expensive in practice. We demonstrate human computing games for knowledge acquisition that employ human computing to overcome the limitations in automated fact acquisition methods. We provide a combined approach that tightly integrates automated extraction techniques with human computing for effective gathering of facts. The methods we provide gather facts in the form of relationships between entities. The games we demonstrate are specifically designed to capture hard-to-extract relations between entities in narrative text -- a task that automated systems find challenging.},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {2513--2516},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Kondreddi, Sarath Kumar
%A Triantafillou, Peter
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Human Computing Games for Knowledge Acquisition : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1C65-8
%@ 978-1-4503-2263-8
%R 10.1145/2505515.2508213
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X Automatic information extraction techniques for knowledge acquisition are known 
to produce noise, incomplete or incorrect facts from textual sources. Human 
computing offers a natural alternative to expand and complement the output of 
automated information extraction methods, thereby enabling us to build 
high-quality knowledge bases. However, relying solely on human inputs for 
extraction can be prohibitively expensive in practice. We demonstrate human 
computing games for knowledge acquisition that employ human computing to 
overcome the limitations in automated fact acquisition methods. We provide a 
combined approach that tightly integrates automated extraction techniques with 
human computing for effective gathering of facts. The methods we provide gather 
facts in the form of relationships between entities. The games we demonstrate 
are specifically designed to capture hard-to-extract relations between entities 
in narrative text -- a task that automated systems find challenging.
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 2513 - 2516
%I ACM

Conference paper

S. K. Kondreddi, P. Triantafillou, and G. Weikum

“HIGGINS: Knowledge Acquisition Meets the Crowds,” in WWW’13, 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013.

Abstract

We present HIGGINS, a system for {\em Knowledge Acquisition (KA)}, placing
emphasis on its architecture. The distinguishing characteristic and novelty of
HIGGINS lies in its blending of two engines: an automated {\em Information
Extraction (IE)} engine, aided by {\em semantic resources} and {\em
statistics}, and a game-based {\em Human Computing (HC)} engine. We focus on KA
from web pages and text sources and, in particular, on deriving relationships
between entities. As a running application we utilize movie narratives, from
which we wish to derive relationships among movie characters.

BibTeX

@inproceedings{Kondreddi2013a,
TITLE = {{HIGGINS}: Knowledge Acquisition Meets the Crowds},
AUTHOR = {Kondreddi, Sarath Kumar and Triantafillou, Peter and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2038-2},
URL = {http://dl.acm.org/citation.cfm?id=2487788.2487825},
LOCALID = {Local-ID: 6A913522403405EBC1257B3A003B5625-Kondreddi2013a},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {We present HIGGINS, a system for {\em Knowledge Acquisition (KA)}, placing emphasis on its architecture. The distinguishing characteristic and novelty of HIGGINS lies in its blending of two engines: an automated {\em Information Extraction (IE)} engine, aided by {\em semantic resources} and {\em statistics}, and a game-based {\em Human Computing (HC)} engine. We focus on KA from web pages and text sources and, in particular, on deriving relationships between entities. As a running application we utilize movie narratives, from which we wish to derive relationships among movie characters.},
BOOKTITLE = {WWW'13, 22nd International Conference on World Wide Web},
EDITOR = {Schwabe, Daniel and Almeida, Virgilio and Glaser, Hartmut and Baeza-Yates, Ricardo and Moon, Sue},
PAGES = {85--86},
ADDRESS = {Rio de Janeiro, Brazil},
}

Endnote

%0 Conference Proceedings
%A Kondreddi, Sarath Kumar
%A Triantafillou, Peter
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T HIGGINS: Knowledge Acquisition Meets the Crowds : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1B79-4
%F OTHER: Local-ID: 6A913522403405EBC1257B3A003B5625-Kondreddi2013a
%U http://dl.acm.org/citation.cfm?id=2487788.2487825
%D 2013
%B 22nd International Conference on World Wide Web
%Z date of event: 2013-05-13 - 2013-05-17
%C Rio de Janeiro, Brazil
%X We present HIGGINS, a system for {\em Knowledge Acquisition (KA)}, placing
emphasis on its architecture. The distinguishing characteristic and novelty of
HIGGINS lies in its blending of two engines: an automated {\em Information
Extraction (IE)} engine, aided by {\em semantic resources} and {\em
statistics}, and a game-based {\em Human Computing (HC)} engine. We focus on KA
from web pages and text sources and, in particular, on deriving relationships 
between entities. As a running application we utilize movie narratives, from 
which we wish to derive relationships among movie characters.
%B WWW'13
%E Schwabe, Daniel; Almeida, Virgilio; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue
%P 85 - 86
%I ACM
%@ 978-1-4503-2038-2

Conference paper

K.-N. Kontonasios, J. Vreeken, and T. De Bie

“Maximum Entropy Models for Iteratively Identifying Subjectively Interesting Structure in Real-valued Data,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2013), Prague, Czech Republic, 2013.

BibTeX

@inproceedings{Konto2013a,
TITLE = {Maximum Entropy Models for Iteratively Identifying Subjectively Interesting Structure in Real-valued Data},
AUTHOR = {Kontonasios, Kleanthis-Nikolaos and Vreeken, Jilles and De Bie, Tijl},
LANGUAGE = {eng},
ISBN = {978-3-642-33485-6},
DOI = {10.1007/978-3-642-40991-2_17},
LOCALID = {Local-ID: ED5813E38D4C4066C1257C6000547D94-Konto2013a},
PUBLISHER = {Springer},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2013)},
EDITOR = {Blockeel, Hendrik and Kersting, Kristian and Nijssen, Siegfried and {\v Z}elezn{\'y}, Filip},
PAGES = {256--271},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {8189},
ADDRESS = {Prague, Czech Republic},
}

Endnote

%0 Conference Proceedings
%A Kontonasios, Kleanthis-Nikolaos
%A Vreeken, Jilles
%A De Bie, Tijl
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Maximum Entropy Models for Iteratively Identifying Subjectively Interesting Structure in Real-valued Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1CCC-4
%F OTHER: Local-ID: ED5813E38D4C4066C1257C6000547D94-Konto2013a
%R 10.1007/978-3-642-40991-2_17
%D 2013
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2013-09-23 - 2013-09-27
%C Prague, Czech Republic
%B Machine Learning and Knowledge Discovery in Databases
%E Blockeel, Hendrik; Kersting, Kristian; Nijssen, Siegfried; Železný, Filip
%P 256 - 271
%I Springer
%@ 978-3-642-33485-6
%B Lecture Notes in Artificial Intelligence
%N 8189

Report

D5D1

F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, and M. Sozio

“A Distributed Algorithm for Large-scale Generalized Matching,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2013-5-002, 2013.

Abstract

Generalized matching problems arise in a number of applications, including
computational advertising, recommender systems, and trade markets. Consider,
for example, the problem of recommending multimedia items (e.g., DVDs) to
users such that (1) users are recommended items that they are likely to be
interested in, (2) every user gets neither too few nor too many
recommendations, and (3) only items available in stock are recommended to
users. State-of-the-art matching algorithms fail at coping with large
real-world instances, which may involve millions of users and items. We
propose the first distributed algorithm for computing near-optimal solutions
to large-scale generalized matching problems like the one above. Our algorithm
is designed to run on a small cluster of commodity nodes (or in a MapReduce
environment), has strong approximation guarantees, and requires only a
poly-logarithmic number of passes over the input. In particular, we propose a
novel distributed algorithm to approximately solve mixed packing-covering
linear programs, which include but are not limited to generalized matching
problems. Experiments on real-world and synthetic data suggest that our
algorithm scales to very large problem sizes and can be orders of magnitude
faster than alternative approaches.

BibTeX

@techreport{MakariAwerbuchGemullaKhandekarMestreSozio2013,
TITLE = {A Distributed Algorithm for Large-scale Generalized Matching},
AUTHOR = {Makari, Faraz and Awerbuch, Baruch and Gemulla, Rainer and Khandekar, Rohit and Mestre, Julian and Sozio, Mauro},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2013-5-002},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
ABSTRACT = {Generalized matching problems arise in a number of applications, including computational advertising, recommender systems, and trade markets. Consider, for example, the problem of recommending multimedia items (e.g., DVDs) to users such that (1) users are recommended items that they are likely to be interested in, (2) every user gets neither too few nor too many recommendations, and (3) only items available in stock are recommended to users. State-of-the-art matching algorithms fail at coping with large real-world instances, which may involve millions of users and items. We propose the first distributed algorithm for computing near-optimal solutions to large-scale generalized matching problems like the one above. Our algorithm is designed to run on a small cluster of commodity nodes (or in a MapReduce environment), has strong approximation guarantees, and requires only a poly-logarithmic number of passes over the input. In particular, we propose a novel distributed algorithm to approximately solve mixed packing-covering linear programs, which include but are not limited to generalized matching problems. Experiments on real-world and synthetic data suggest that our algorithm scales to very large problem sizes and can be orders of magnitude faster than alternative approaches.},
TYPE = {Research Reports},
}

Endnote

%0 Report
%A Makari, Faraz
%A Awerbuch, Baruch
%A Gemulla, Rainer
%A Khandekar, Rohit
%A Mestre, Julian
%A Sozio, Mauro
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Algorithms and Complexity, MPI for Informatics, Max Planck Society
Algorithms and Complexity, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Distributed Algorithm for Large-scale Generalized Matching : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-03B4-3
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2013
%P 39 p.
%X Generalized matching problems arise in a number of applications, including
 computational advertising, recommender systems, and trade markets. Consider,
 for example, the problem of recommending multimedia items (e.g., DVDs) to
 users such that (1) users are recommended items that they are likely to be
 interested in, (2) every user gets neither too few nor too many
 recommendations, and (3) only items available in stock are recommended to
 users. State-of-the-art matching algorithms fail at coping with large
 real-world instances, which may involve millions of users and items. We
 propose the first distributed algorithm for computing near-optimal solutions
 to large-scale generalized matching problems like the one above. Our algorithm
 is designed to run on a small cluster of commodity nodes (or in a MapReduce
 environment), has strong approximation guarantees, and requires only a
 poly-logarithmic number of passes over the input. In particular, we propose a
 novel distributed algorithm to approximately solve mixed packing-covering
 linear programs, which include but are not limited to generalized matching
 problems. Experiments on real-world and synthetic data suggest that our
 algorithm scales to very large problem sizes and can be orders of magnitude
 faster than alternative approaches.
%B Research Reports
%@ false

Conference paper

F. Makari and R. Gemulla

“A Distributed Approximation Algorithm for Mixed Packing-covering Linear Programs,” in Proceedings of the NIPS Workshop on Big Learning, Lake Tahoe, NV, USA, 2013.

BibTeX

@inproceedings{MakariG13,
TITLE = {A Distributed Approximation Algorithm for Mixed Packing-covering Linear Programs},
AUTHOR = {Makari, Faraz and Gemulla, Rainer},
LANGUAGE = {eng},
URL = {http://biglearn.org/2013/files/papers/biglearning2013_submission_14.pdf},
PUBLISHER = {NIPS},
YEAR = {2013},
BOOKTITLE = {Proceedings of the NIPS Workshop on Big Learning},
ADDRESS = {Lake Tahoe, NV, USA},
}

Endnote

%0 Conference Proceedings
%A Makari, Faraz
%A Gemulla, Rainer
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Distributed Approximation Algorithm for Mixed Packing-covering Linear Programs : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-9CC6-A
%U http://biglearn.org/2013/files/papers/biglearning2013_submission_14.pdf
%D 2013
%B NIPS 2013 Workshop on Big Learning
%Z date of event: 2013-12-09 - 2013-12-09
%C Lake Tahoe, NV, USA
%B Proceedings of the NIPS Workshop on Big Learning
%I NIPS

Article

D5D1

F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, and M. Sozio

“A Distributed Algorithm for Large-scale Generalized Matching,” Proceedings of the VLDB Endowment (Proc. VLDB 2013), vol. 6, no. 9, 2013.

BibTeX

@article{MakariAGKMS13,
TITLE = {A Distributed Algorithm for Large-scale Generalized Matching},
AUTHOR = {Makari, Faraz and Awerbuch, Baruch and Gemulla, Rainer and Khandekar, Rohit and Mestre, Juli{\'a}n and Sozio, Mauro},
LANGUAGE = {eng},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)},
VOLUME = {6},
NUMBER = {9},
PAGES = {613--624},
BOOKTITLE = {Proceedings of the 39th International Conference on Very Large Data Bases (VLDB 2013)},
EDITOR = {B{\"o}hlen, Michael and Koch, Christoph},
}

Endnote

%0 Journal Article
%A Makari, Faraz
%A Awerbuch, Baruch
%A Gemulla, Rainer
%A Khandekar, Rohit
%A Mestre, Julián
%A Sozio, Mauro
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Algorithms and Complexity, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Distributed Algorithm for Large-scale Generalized Matching : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-9CB4-1
%7 2013
%D 2013
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 6
%N 9
%& 613
%P 613 - 624
%I ACM
%C New York, NY
%B Proceedings of the 39th International Conference on Very Large Data Bases
%O August 26th - 30th 2013, Riva del Garda, Trento, Italy VLDB 2013
%U http://www.vldb.org/pvldb/vol6/p613-makarimanshadi.pdf

Article

M. Mampaey and J. Vreeken

“Summarizing Categorical Data by Clustering Attributes,” Data Mining and Knowledge Discovery, vol. 26, no. 1, 2013.

BibTeX

@article{Mampaey2013a,
TITLE = {Summarizing Categorical Data by Clustering Attributes},
AUTHOR = {Mampaey, Michael and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {1384-5810},
DOI = {10.1007/s10618-011-0246-6},
LOCALID = {Local-ID: 4366BFBB9FB411E9C1257C6000528295-Mampaey2013a},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2013},
DATE = {2013},
JOURNAL = {Data Mining and Knowledge Discovery},
VOLUME = {26},
NUMBER = {1},
PAGES = {130--173},
}

Endnote

%0 Journal Article
%A Mampaey, Michael
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Summarizing Categorical Data by Clustering Attributes : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1CD3-1
%R 10.1007/s10618-011-0246-6
%F OTHER: Local-ID: 4366BFBB9FB411E9C1257C6000528295-Mampaey2013a
%7 2013-01
%D 2013
%J Data Mining and Knowledge Discovery
%V 26
%N 1
%& 130
%P 130 - 173
%I Springer
%C Berlin
%@ false

Conference paper

S. Metzger, R. Schenkel, and M. Sydow

“QBEES: Query by Entity Examples,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

Structured knowledge bases are an increasingly important
way for storing and retrieving information. Within such
knowledge bases, an important search task is finding similar
entities based on one or more example entities. We
present QBEES, a novel framework for defining entity similarity
based only on structural features, so-called aspects,
of the entities, that includes query-dependent and query-independent entity
ranking components. We present evaluation
results with a number of existing entity list completion
benchmarks, comparing to several state-of-the-art baselines.

BibTeX

@inproceedings{MetzgerSS_CIKM2013,
TITLE = {{QBEES}: Query by Entity Examples},
AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2507873},
LOCALID = {Local-ID: D07B27BEBFE9E7D8C1257BB10024BD7C-MetzgerSS_CIKM2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Structured knowledge bases are an increasingly important way for storing and retrieving information. Within such knowledge bases, an important search task is finding similar entities based on one or more example entities. We present QBEES, a novel framework for defining entity similarity based only on structural features, so-called aspects, of the entities, that includes query-dependent and query-independent entity ranking components. We present evaluation results with a number of existing entity list completion benchmarks, comparing to several state-of-the-art baselines.},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {1829--1832},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Metzger, Steffen
%A Schenkel, Ralf
%A Sydow, Marcin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T QBEES: Query by Entity Examples : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1D0B-E
%F OTHER: Local-ID: D07B27BEBFE9E7D8C1257BB10024BD7C-MetzgerSS_CIKM2013
%R 10.1145/2505515.2507873
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X Structured knowledge bases are an increasingly important
way for storing and retrieving information. Within such
knowledge bases, an important search task is finding similar
entities based on one or more example entities. We
present QBEES, a novel framework for defining entity similarity
based only on structural features, so-called aspects,
of the entities, that includes query-dependent and query-independent entity 
ranking components. We present evaluation
results with a number of existing entity list completion
benchmarks, comparing to several state-of-the-art baselines.
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 1829 - 1832
%I ACM 
%@ 978-1-4503-2263-8

Conference paper

P. Miettinen

“Fully Dynamic Quasi-Biclique Edge Covers via Boolean Matrix Factorizations,” in 1st ACM SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM 2013), New York, NY, USA, 2013.

Abstract

An important way of summarizing a bipartite graph is to give a set of (quasi-)
bicliques that contain (almost) all of its edges. These quasi-bicliques are
somewhat similar to clustering of the nodes, giving sets of similar nodes.
Unlike clustering, however, the quasi-bicliques are not required to partition
the nodes, allowing greater flexibility when creating them. When we identify
the bipartite graph with its bi-adjacency matrix, the problem of finding these
quasi-bicliques turns into the problem of finding the Boolean matrix
factorization of the bi-adjacency matrix -- a problem that has received
increasing research interest in data mining in recent years. But many
real-world graphs are dynamic and evolve over time. How can we update our
bicliques without having to re-compute them from scratch?

An algorithm was recently proposed for this task (Miettinen, ICDM 2012). The
algorithm, however, is only able to handle the case where the new 1s are
added to the matrix -- it cannot handle the removal of existing 1s.
Furthermore, the algorithm cannot adjust the rank of the factorization.

This paper extends said algorithm with the capability of working in fully
dynamic setting (with both additions and deletions) and with capability of
adjusting its rank dynamically, as well. The behaviour and performance of the
algorithm is studied in experiments conducted with both real-world and
synthetic data.

BibTeX

@inproceedings{miettinen13fully,
TITLE = {Fully Dynamic Quasi-Biclique Edge Covers via {Boolean} Matrix Factorizations},
AUTHOR = {Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-2209-6},
DOI = {10.1145/2489247.2489250},
LOCALID = {Local-ID: CA06050F2A3AFCF0C1257C6A005F39ED-miettinen13fully},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {An important way of summarizing a bipartite graph is to give a set of (quasi-) bicliques that contain (almost) all of its edges. These quasi-bicliques are somewhat similar to clustering of the nodes, giving sets of similar nodes. Unlike clustering, however, the quasi-bicliques are not required to partition the nodes, allowing greater flexibility when creating them. When we identify the bipartite graph with its bi-adjacency matrix, the problem of finding these quasi-bicliques turns into the problem of finding the Boolean matrix factorization of the bi-adjacency matrix -- a problem that has received increasing research interest in data mining in recent years. But many real-world graphs are dynamic and evolve over time. How can we update our bicliques without having to re-compute them from scratch? An algorithm was recently proposed for this task (Miettinen, ICDM 2012). The algorithm, however, is only able to handle the case where the new 1s are added to the matrix~--~it cannot handle the removal of existing 1s. Furthermore, the algorithm cannot adjust the rank of the factorization. This paper extends said algorithm with the capability of working in fully dynamic setting (with both additions and deletions) and with capability of adjusting its rank dynamically, as well. The behaviour and performance of the algorithm is studied in experiments conducted with both real-world and synthetic data.},
BOOKTITLE = {1st ACM SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM 2013)},
PAGES = {17--24},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fully Dynamic Quasi-Biclique Edge Covers via Boolean Matrix Factorizations : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0018-EF87-D
%F OTHER: Local-ID: CA06050F2A3AFCF0C1257C6A005F39ED-miettinen13fully
%R 10.1145/2489247.2489250
%D 2013
%B 1st ACM SIGMOD Workshop on Dynamic Networks Management and Mining
%Z date of event: 2013-06-23 - 2013-06-23
%C New York, NY, USA
%X An important way of summarizing a bipartite graph is to give a set of (quasi-) 
bicliques that contain (almost) all of its edges. These quasi-bicliques are 
somewhat similar to clustering of the nodes, giving sets of similar nodes. 
Unlike clustering, however, the quasi-bicliques are not required to partition 
the nodes, allowing greater flexibility when creating them. When we identify 
the bipartite graph with its bi-adjacency matrix, the problem of finding these 
quasi-bicliques turns into the problem of finding the Boolean matrix 
factorization of the bi-adjacency matrix -- a problem that has received 
increasing research interest in data mining in recent years. But many 
real-world graphs are dynamic and evolve over time. How can we update our 
bicliques without having to re-compute them from scratch?

An algorithm was recently proposed for this task (Miettinen, ICDM 2012). The
algorithm, however, is only able to handle the case where the new 1s are
added to the matrix -- it cannot handle the removal of existing 1s.
Furthermore, the algorithm cannot adjust the rank of the factorization.

This paper extends said algorithm with the capability of working in fully 
dynamic setting (with both additions and deletions) and with capability of 
adjusting its rank dynamically, as well. The behaviour and performance of the 
algorithm is studied in experiments conducted with both real-world and 
synthetic data.
%B 1st ACM SIGMOD Workshop on Dynamic Networks Management and Mining
%P 17 - 24
%I ACM
%@ 978-1-4503-2209-6

Thesis

D. Milchevski

“Entity Recommendation Based on Wikipedia,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{MilchevskiMaster2013,
TITLE = {Entity Recommendation Based on {Wikipedia}},
AUTHOR = {Milchevski, Dragan},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Milchevski, Dragan
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Entity Recommendation Based on Wikipedia : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-CE39-E
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P XII, 121 p.
%V master
%9 master

Conference paper

I. Miliaraki, K. Berberich, R. Gemulla, and S. Zoupanos

“Mind the Gap: Large-scale Frequent Sequence Mining,” in SIGMOD’13, ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2013.

Abstract

Frequent sequence mining is one of the fundamental building blocks in data
mining. While the problem has been extensively studied, few of the available
techniques are sufficiently scalable to handle datasets with billions of
sequences; such large-scale datasets arise, for instance, in text mining and
session analysis. In this paper, we propose PFSM, a scalable algorithm for
frequent sequence mining on MapReduce. PFSM can handle so-called ``gap
constraints'', which can be used to limit the output to a controlled set of
frequent sequences. At its heart, PFSM partitions the input database in a way
that allows us to mine each partition independently using any existing frequent
sequence mining algorithm. We introduce the notion of w-equivalency, which is
a generalization of the notion of a ``projected database'' used by many
frequent pattern mining algorithms. We also present a number of optimization
techniques that minimize partition size, and therefore computational and
communication costs, while still maintaining correctness. Our extensive
experimental study in the context of text mining suggests that PFSM is
significantly more efficient and scalable than alternative approaches.

BibTeX

@inproceedings{Miliaraki2013,
TITLE = {Mind the Gap: Large-scale Frequent Sequence Mining},
AUTHOR = {Miliaraki, Iris and Berberich, Klaus and Gemulla, Rainer and Zoupanos, Spyros},
LANGUAGE = {eng},
ISBN = {978-1-4503-2037-5},
DOI = {10.1145/2463676.2465285},
LOCALID = {Local-ID: 086027E8ABA46DC6C1257B0F003D8C96-Miliaraki2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this paper, we propose PFSM, a scalable algorithm for frequent sequence mining on MapReduce. PFSM can handle so-called ``gap constraints'', which can be used to limit the output to a controlled set of frequent sequences. At its heart, PFSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of w-equivalency, which is a generalization of the notion of a ``projected database'' used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our extensive experimental study in the context of text mining suggests that PFSM is significantly more efficient and scalable than alternative approaches.},
BOOKTITLE = {SIGMOD'13, ACM SIGMOD International Conference on Management of Data},
EDITOR = {Ross, Kenneth and Srivastava, Divesh and Papadias, Dimitris and Papadopoulos, Stavros},
PAGES = {797--808},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Miliaraki, Iris
%A Berberich, Klaus
%A Gemulla, Rainer
%A Zoupanos, Spyros
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Mind the Gap: Large-scale Frequent Sequence Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1D76-9
%F OTHER: Local-ID: 086027E8ABA46DC6C1257B0F003D8C96-Miliaraki2013
%R 10.1145/2463676.2465285
%D 2013
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2013-06-22 - 2013-06-27
%C New York, NY, USA
%X Frequent sequence mining is one of the fundamental building blocks in data 
mining. While the problem has been extensively studied, few of the available 
techniques are sufficiently scalable to handle datasets with billions of 
sequences; such large-scale datasets arise, for instance, in text mining and 
session analysis. In this paper, we propose PFSM, a scalable algorithm for 
frequent sequence mining on MapReduce. PFSM can handle so-called ``gap 
constraints'', which can be used to limit the output to a controlled set of 
frequent sequences. At its heart, PFSM partitions the input database in a way 
that allows us to mine each partition independently using any existing frequent 
sequence mining algorithm. We introduce the notion of w-equivalency, which is 
a generalization of the notion of a ``projected database'' used by many 
frequent pattern mining algorithms. We also present a number of optimization 
techniques that minimize partition size, and therefore computational and 
communication costs, while still maintaining correctness. Our extensive 
experimental study in the context of text mining suggests that PFSM is 
significantly more efficient and scalable than alternative approaches.
%B SIGMOD'13
%E Ross, Kenneth; Srivastava, Divesh; Papadias, Dimitris; Papadopoulos, Stavros
%P 797 - 808
%I ACM
%@ 978-1-4503-2037-5

Thesis

A. Mishra

“Design and Evaluation of an IR-Benchmark for SPARQL Fulltext Queries,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{MishraMastersThesis2013,
TITLE = {Design and Evaluation of an {IR}-Benchmark for {SPARQL} Fulltext Queries},
AUTHOR = {Mishra, Arunav},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
}

Endnote

%0 Thesis
%A Mishra, Arunav
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Design and Evaluation of an IR-Benchmark for SPARQL Fulltext Queries
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C5C-7
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V master
%9 master

Conference paper

A. Mishra, S. Gurajada, and M. Theobald

“SPAR-Key: Processing SPARQL-Fulltext Queries to Solve Jeopardy! Clues,” in Working Notes for the CLEF 2013 Conference, Valencia, Spain, 2013.

Abstract

We describe our SPAR-Key query engine that implements indexing,
ranking, and query processing techniques to run a new kind of SPARQL-fulltext
queries that were provided in the context of the INEX 2013 Jeopardy task.

BibTeX

@inproceedings{Theobald2012,
TITLE = {{SPAR}-Key: Processing {SPARQL}-Fulltext Queries to Solve {J}eopardy! Clues},
AUTHOR = {Mishra, Arunav and Gurajada, Sairam and Theobald, Martin},
LANGUAGE = {eng},
LOCALID = {Local-ID: 598B0731B76FB570C1257BBB003DA461-Theobald2012},
PUBLISHER = {CLEF Initiative},
YEAR = {2013},
ABSTRACT = {We describe our SPAR-Key query engine that implements indexing, ranking, and query processing techniques to run a new kind of SPARQL-fulltext queries that were provided in the context of the INEX 2013 Jeopardy task.},
BOOKTITLE = {Working Notes for the CLEF 2013 Conference},
EDITOR = {Forner, Pamela and Navigli, Roberto and Tufis, Dan},
ADDRESS = {Valencia, Spain},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Gurajada, Sairam
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T SPAR-Key: Processing SPARQL-Fulltext Queries to Solve Jeopardy! Clues
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A87-1
%F OTHER: Local-ID: 598B0731B76FB570C1257BBB003DA461-Theobald2012
%D 2013
%B CLEF 2013 Evaluation Labs and Workshop
%Z date of event: 2013-09-23 - 2013-09-26
%C Valencia, Spain
%X We describe our SPAR-Key query engine that implements indexing,
ranking, and query processing techniques to run a new kind of SPARQL-fulltext 
queries that were provided in the context of the INEX 2013 Jeopardy task.
%B Working Notes for the CLEF 2013 Conference
%E Forner, Pamela; Navigli, Roberto; Tufis, Dan
%I CLEF Initiative
%U http://www.clef-initiative.eu/documents/71612/69505b5f-455b-4ce6-a699-28e2268d4d84

Conference paper

N. Nakashole, T. Tylenda, and G. Weikum

“Fine-grained Semantic Typing of Emerging Entities,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, 2013.

Abstract

Methods for information extraction (IE) and knowledge base (KB) construction
have been intensively studied. However, a largely under-explored case is
tapping into highly dynamic sources like news streams and social media, where
new entities are continuously emerging. In this paper, we present a method for
discovering and semantically typing newly emerging out-of-KB entities, thus
improving the freshness and recall of ontology-based IE and
improving the precision and semantic rigor of open IE. Our method is based on a
probabilistic model that feeds weights into integer linear programs that
leverage type signatures of relational phrases and type correlation or
disjointness constraints. Our experimental evaluation, based on crowdsourced
user studies, shows our method performing significantly better than prior work.

BibTeX

@inproceedings{NakasholeTW2013,
TITLE = {Fine-grained Semantic Typing of Emerging Entities},
AUTHOR = {Nakashole, Ndapandula and Tylenda, Tomasz and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-50-3},
URL = {http://www.aclweb.org/anthology/P13-1146},
LOCALID = {Local-ID: FC162FD7AA65180CC1257C690054174F-NakasholeTW2013},
PUBLISHER = {ACL},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-of-KB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, shows our method performing significantly better than prior work.},
BOOKTITLE = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)},
PAGES = {1488--1497},
ADDRESS = {Sofia, Bulgaria},
}

Endnote

%0 Conference Proceedings
%A Nakashole, Ndapandula
%A Tylenda, Tomasz
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fine-grained Semantic Typing of Emerging Entities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1D86-5
%F OTHER: Local-ID: FC162FD7AA65180CC1257C690054174F-NakasholeTW2013
%U http://www.aclweb.org/anthology/P13-1146
%D 2013
%B 51st Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2013-08-04 - 2013-08-04
%C Sofia, Bulgaria
%X Methods for information extraction (IE) and knowledge base (KB) construction 
have been intensively studied. However, a largely under-explored case is 
tapping into highly dynamic sources like news streams and social media, where 
new entities are continuously emerging. In this paper, we present a method for 
discovering and semantically typing newly emerging out-of-KB entities, thus 
improving the freshness and recall of ontology-based IE and 
improving the precision and semantic rigor of open IE. Our method is based on a 
probabilistic model that feeds weights into integer linear programs that 
leverage type signatures of relational phrases and type correlation or 
disjointness constraints. Our experimental evaluation, based on crowdsourced 
user studies, shows our method performing significantly better than prior work.
%B Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
%P 1488 - 1497
%I ACL
%@ 978-1-937284-50-3

Article

N. Nakashole, G. Weikum, and F. M. Suchanek

“Discovering Semantic Relations From the Web and Organizing Them with PATTY,” ACM SIGMOD Record, vol. 42, no. 2, 2013.

Abstract

PATTY is a system for automatically distilling relational patterns from the
Web, for example, the pattern "X covered Y" between a singer and someone else's
song. We have extracted a large collection of such patterns and organized them
in a taxonomic manner, similar in style to the WordNet thesaurus but capturing
relations (binary predicates) instead of concepts and classes (unary
predicates). The patterns are organized by semantic types and synonyms, and
they form a hierarchy based on subsumptions. For example, "X covered Y" is
subsumed by "X sang Y", which in turn is subsumed by "X performed Y" (where X
can be any musician, not just a singer). In this paper we give an overview of
the PATTY system and the resulting collections of relational patterns. We
discuss the four main components of PATTY's architecture and a variety of use
cases, including the paraphrasing of relations, and semantic search over
subject-predicate-object triples. This kind of search can handle entities,
relations, semantic types, noun phrases, and relational phrases.

BibTeX

@article{Nakashole2013,
TITLE = {Discovering Semantic Relations From the Web and Organizing Them with {PATTY}},
AUTHOR = {Nakashole, Ndapandula and Weikum, Gerhard and Suchanek, Fabian M.},
LANGUAGE = {eng},
ISSN = {0163-5808},
DOI = {10.1145/2503792.2503799},
LOCALID = {Local-ID: 44317A4E27B1A909C1257C69004C8654-Nakashole2013},
PUBLISHER = {ACM},
ADDRESS = {New York, USA},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {PATTY is a system for automatically distilling relational patterns from the Web, for example, the pattern "X covered Y" between a singer and someone else's song. We have extracted a large collection of such patterns and organized them in a taxonomic manner, similar in style to the WordNet thesaurus but capturing relations (binary predicates) instead of concepts and classes (unary predicates). The patterns are organized by semantic types and synonyms, and they form a hierarchy based on subsumptions. For example, "X covered Y" is subsumed by "X sang Y", which in turn is subsumed by "X performed Y" (where X can be any musician, not just a singer). In this paper we give an overview of the PATTY system and the resulting collections of relational patterns. We discuss the four main components of PATTY's architecture and a variety of use cases, including the paraphrasing of relations, and semantic search over subject-predicate-object triples. This kind of search can handle entities, relations, semantic types, noun phrases, and relational phrases.},
JOURNAL = {ACM SIGMOD Record},
VOLUME = {42},
NUMBER = {2},
PAGES = {29--34},
}

Endnote

%0 Journal Article
%A Nakashole, Ndapandula
%A Weikum, Gerhard
%A Suchanek, Fabian M.
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Discovering Semantic Relations From the Web and Organizing Them with PATTY
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1D7F-8
%F OTHER: Local-ID: 44317A4E27B1A909C1257C69004C8654-Nakashole2013
%R 10.1145/2503792.2503799
%7 2013-05
%D 2013
%X PATTY is a system for automatically distilling relational patterns from the 
Web, for example, the pattern "X covered Y" between a singer and someone else's 
song. We have extracted a large collection of such patterns and organized them 
in a taxonomic manner, similar in style to the WordNet thesaurus but capturing 
relations (binary predicates) instead of concepts and classes (unary 
predicates). The patterns are organized by semantic types and synonyms, and 
they form a hierarchy based on subsumptions. For example, "X covered Y" is 
subsumed by "X sang Y", which in turn is subsumed by "X performed Y" (where X 
can be any musician, not just a singer). In this paper we give an overview of 
the PATTY system and the resulting collections of relational patterns. We 
discuss the four main components of PATTY's architecture and a variety of use 
cases, including the paraphrasing of relations, and semantic search over 
subject-predicate-object triples. This kind of search can handle entities, 
relations, semantic types, noun phrases, and relational phrases.
%J ACM SIGMOD Record
%V 42
%N 2
%& 29
%P 29 - 34
%I ACM
%C New York, USA
%@ 0163-5808

Conference paper

B. Paudel, A. Anand, and K. Berberich

“User-defined Redundancy in Web Archives,” in Proceedings of the 10th International Workshop on Large-Scale and Distributed Systems for Information Retrieval (LSDS-IR 2013), Rome, Italy, 2013.

BibTeX

@inproceedings{Berberich2013a,
TITLE = {User-defined Redundancy in Web Archives},
AUTHOR = {Paudel, Bibek and Anand, Avishek and Berberich, Klaus},
LANGUAGE = {eng},
LOCALID = {Local-ID: B1F68A61E42E6170C1257B09003A6105-Berberich2013a},
PUBLISHER = {lsdsir.org},
YEAR = {2013},
BOOKTITLE = {Proceedings of the 10th International Workshop on Large-Scale and Distributed Systems for Information Retrieval (LSDS-IR 2013)},
ADDRESS = {Rome, Italy},
}

Endnote

%0 Conference Proceedings
%A Paudel, Bibek
%A Anand, Avishek
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T User-defined Redundancy in Web Archives
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A91-9
%F OTHER: Local-ID: B1F68A61E42E6170C1257B09003A6105-Berberich2013a
%D 2013
%B 10th International Workshop on Large-Scale and Distributed Systems for Information Retrieval
%Z date of event: 2013-02-05 - 2013-02-05
%C Rome, Italy
%B Proceedings of the 10th International Workshop on Large-Scale and Distributed Systems for Information Retrieval
%I lsdsir.org
%U http://www.lsdsir.org/wp-content/uploads/2013/02/LSDS-IR-2013-Proceedings.pdf

Conference paper

N. Preda, F. M. Suchanek, W. Yuan, and G. Weikum

“SUSIE: Search Using Services and Information Extraction,” in 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, Australia, 2013.

Abstract

The API of a Web service restricts the types of queries that the service can answer. For example, a Web service might provide a method that returns the songs of a given singer, but it might not provide a method that returns the singers of a given song. If the user asks for the singer of some specific song, then the Web service cannot be called -- even though the underlying database might have the desired piece of information.

This asymmetry is particularly problematic if the service is used in a Web service orchestration system. In this paper, we propose to use on-the-fly information extraction to collect values that can be used as parameter bindings for the Web service. We show how this idea can be integrated into a Web service orchestration system. Our approach is fully implemented in a prototype called SUSIE. We present experiments with real-life data and services to demonstrate the practical viability and good performance of our approach.

BibTeX

@inproceedings{susie,
TITLE = {{SUSIE}: Search Using Services and Information Extraction},
AUTHOR = {Preda, Nicoleta and Suchanek, Fabian M. and Yuan, Wenjun and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4673-4909-3},
DOI = {10.1109/ICDE.2013.6544827},
LOCALID = {Local-ID:C1257ACD0050F94E-0C3E76F7AF652AEEC1257AD70062003A-susie},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {The API of a Web service restricts the types of queries that the service can answer. For example, a Web service might provide a method that returns the songs of a given singer, but it might not provide a method that returns the singers of a given song. If the user asks for the singer of some specific song, then the Web service cannot be called -- even though the underlying database might have the desired piece of information. This asymmetry is particularly problematic if the service is used in a Web service orchestration system. In this paper, we propose to use on-the-fly information extraction to collect values that can be used as parameter bindings for the Web service. We show how this idea can be integrated into a Web service orchestration system. Our approach is fully implemented in a prototype called SUSIE. We present experiments with real-life data and services to demonstrate the practical viability and good performance of our approach.},
BOOKTITLE = {29th IEEE International Conference on Data Engineering (ICDE 2013)},
PAGES = {218--229},
ADDRESS = {Brisbane, Australia},
}

Endnote

%0 Conference Proceedings
%A Preda, Nicoleta
%A Suchanek, Fabian M.
%A Yuan, Wenjun
%A Weikum, Gerhard
%+ External Organizations
Ontologies, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T SUSIE: Search Using Services and Information Extraction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-546D-9
%R 10.1109/ICDE.2013.6544827
%F OTHER: Local-ID:C1257ACD0050F94E-0C3E76F7AF652AEEC1257AD70062003A-susie
%D 2013
%B 29th IEEE International Conference on Data Engineering
%Z date of event: 2013-04-08 - 2013-04-11
%C Brisbane, Australia
%X The API of a Web service restricts the types of queries that the service can answer. For example, a Web service might provide a method that returns the songs of a given singer, but it might not provide a method that returns the singers of a given song. If the user asks for the singer of some specific song, then the Web service cannot be called -- even though the underlying database might have the desired piece of information.
This asymmetry is particularly problematic if the service is used in a Web service orchestration system. In this paper, we propose to use on-the-fly information
extraction to collect values that can be used as parameter bindings for the Web service. We show how this idea can be
integrated into a Web service orchestration system. Our approach is fully implemented in a prototype called SUSIE. We present
experiments with real-life data and services to demonstrate the practical viability and good performance of our approach.
%B 29th IEEE International Conference on Data Engineering
%P 218 - 229
%I IEEE
%@ 978-1-4673-4909-3

Thesis

D5IMPR-CS

L. Qu

“Sentiment Analysis with Limited Training Data,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

Sentiments are positive and negative emotions, evaluations and stances. This
dissertation focuses on learning based systems for automatic analysis of
sentiments and comparisons in natural language text. The proposed approach
consists of three contributions:

1. Bag-of-opinions model: For predicting document-level polarity and intensity,
we proposed the bag-of-opinions model by modeling each document as a bag of
sentiments, which can explore the syntactic structures of sentiment-bearing
phrases for improved rating prediction of online reviews.
2. Multi-experts model: Due to the sparsity of manually-labeled training data,
we designed the multi-experts model for sentence-level analysis of sentiment
polarity and intensity by fully exploiting any available sentiment indicators,
such as phrase-level predictors and sentence similarity measures.
3. LSSVMrae model: To understand the sentiments regarding entities, we proposed
LSSVMrae model for extracting sentiments and comparisons of entities at both
sentence and subsentential level.

Different granularity of analysis leads to different model complexity, the
finer the more complex. All proposed models aim to minimize the use of
hand-labeled data by maximizing the use of the freely available resources.
These models explore also different feature representations to capture the
compositional semantics inherent in sentiment-bearing expressions. Our
experimental results on real-world data showed that all models significantly
outperform the state-of-the-art methods on the respective tasks.

BibTeX

@phdthesis{Qu2013,
TITLE = {Sentiment Analysis with Limited Training Data},
AUTHOR = {Qu, Lizhen},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-56150},
DOI = {10.22028/D291-26552},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Sentiments are positive and negative emotions, evaluations and stances. This dissertation focuses on learning based systems for automatic analysis of sentiments and comparisons in natural language text. The proposed approach consists of three contributions: 1. Bag-of-opinions model: For predicting document-level polarity and intensity, we proposed the bag-of-opinions model by modeling each document as a bag of sentiments, which can explore the syntactic structures of sentiment-bearing phrases for improved rating prediction of online reviews. 2. Multi-experts model: Due to the sparsity of manually-labeled training data, we designed the multi-experts model for sentence-level analysis of sentiment polarity and intensity by fully exploiting any available sentiment indicators, such as phrase-level predictors and sentence similarity measures. 3. LSSVMrae model: To understand the sentiments regarding entities, we proposed LSSVMrae model for extracting sentiments and comparisons of entities at both sentence and subsentential level. Different granularity of analysis leads to different model complexity, the finer the more complex. All proposed models aim to minimize the use of hand-labeled data by maximizing the use of the freely available resources. These models explore also different feature representations to capture the compositional semantics inherent in sentiment-bearing expressions. Our experimental results on real-world data showed that all models significantly outperform the state-of-the-art methods on the respective tasks.},
}

Endnote

%0 Thesis
%A Qu, Lizhen
%Y Weikum, Gerhard
%A referee: Gemulla, Rainer
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Sentiment Analysis with Limited Training Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-9796-9
%R 10.22028/D291-26552
%U urn:nbn:de:bsz:291-scidok-56150
%F OTHER: hdl:20.500.11880/26608
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P 133 p.
%V phd
%9 phd
%X Sentiments are positive and negative emotions, evaluations and stances. This
dissertation focuses on learning based systems for automatic analysis of
sentiments and comparisons in natural language text. The proposed approach
consists of three contributions:

1. Bag-of-opinions model: For predicting document-level polarity and intensity,
we proposed the bag-of-opinions model by modeling each document as a bag of
sentiments, which can explore the syntactic structures of sentiment-bearing
phrases for improved rating prediction of online reviews.
2. Multi-experts model: Due to the sparsity of manually-labeled training data,
we designed the multi-experts model for sentence-level analysis of sentiment
polarity and intensity by fully exploiting any available sentiment indicators,
such as phrase-level predictors and sentence similarity measures.
3. LSSVMrae model: To understand the sentiments regarding entities, we proposed
LSSVMrae model for extracting sentiments and comparisons of entities at both
sentence and subsentential level.

Different granularity of analysis leads to different model complexity, the
finer the more complex. All proposed models aim to minimize the use of
hand-labeled data by maximizing the use of the freely available resources.
These models explore also different feature representations to capture the
compositional semantics inherent in sentiment-bearing expressions. Our
experimental results on real-world data showed that all models significantly
outperform the state-of-the-art methods on the respective tasks.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5615/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

J. Ramon, P. Miettinen, and J. Vreeken

“Detecting Bicliques in GF[q],” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2013), Prague, Czech Republic, 2013.

BibTeX

@inproceedings{ramon13detecting,
TITLE = {Detecting Bicliques in {GF[q]}},
AUTHOR = {Ramon, Jan and Miettinen, Pauli and Vreeken, Jilles},
LANGUAGE = {eng},
ISBN = {978-3-642-40987-5},
DOI = {10.1007/978-3-642-40988-2_33},
LOCALID = {Local-ID: CEC81211FFBADD4DC1257C6000544E39-ramon13detecting},
PUBLISHER = {Springer},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2013)},
EDITOR = {Blockeel, Hendrik and Kersting, Kristian and Nijssen, Siegfried and {\v Z}elezn{\'y}, Filip},
PAGES = {509--524},
SERIES = {Lecture Notes in Artificial Intelligence},
VOLUME = {8188},
ADDRESS = {Prague, Czech Republic},
}

Endnote

%0 Conference Proceedings
%A Ramon, Jan
%A Miettinen, Pauli
%A Vreeken, Jilles
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Detecting Bicliques in GF[q]
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1D9A-C
%F OTHER: Local-ID: CEC81211FFBADD4DC1257C6000544E39-ramon13detecting
%R 10.1007/978-3-642-40988-2_33
%D 2013
%B European Conference on Machine Learning and Knowledge Discovery in Databases
%Z date of event: 2013-09-23 - 2013-09-27
%C Prague, Czech Republic
%B Machine Learning and Knowledge Discovery in Databases
%E Blockeel, Hendrik; Kersting, Kristian; Nijssen, Siegfried; Železný, Filip
%P 509 - 524
%I Springer
%@ 978-3-642-40987-5
%B Lecture Notes in Artificial Intelligence
%N 8188

Conference paper

S. Seufert, A. Anand, S. Bedathur, and G. Weikum

“FERRARI: Flexible and Efficient Reachability Range Assignment for Graph Indexing,” in 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, Australia, 2013.

Abstract

In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade-off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer -- even for the case
of massive-scale real-world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.

BibTeX

@inproceedings{Seufert2013,
TITLE = {{FERRARI}: Flexible and Efficient Reachability Range Assignment for Graph Indexing},
AUTHOR = {Seufert, Stephan and Anand, Avishek and Bedathur, Srikanta and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4673-4909-3},
DOI = {10.1109/ICDE.2013.6544893},
LOCALID = {Local-ID: 0E395B1E701B8498C1257B0900346A0B-Seufert2013},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing a direct control over the trade off between index size and the query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer -- even for the case of massive-scale real world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.},
BOOKTITLE = {29th IEEE International Conference on Data Engineering (ICDE 2013)},
PAGES = {1009--1020},
ADDRESS = {Brisbane, Australia},
}

Endnote

%0 Conference Proceedings
%A Seufert, Stephan
%A Anand, Avishek
%A Bedathur, Srikanta
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T FERRARI: Flexible and Efficient Reachability Range Assignment for Graph Indexing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-36CC-1
%F OTHER: Local-ID: 0E395B1E701B8498C1257B0900346A0B-Seufert2013
%R 10.1109/ICDE.2013.6544893
%D 2013
%B 29th IEEE International Conference on Data Engineering
%Z date of event: 2013-04-08 - 2013-04-12
%C Brisbane, Australia
%X In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing a direct control over the trade off between index size and the query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer -- even for the case of massive-scale real world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.
%B 29th IEEE International Conference on Data Engineering
%P 1009 - 1020
%I IEEE
%@ 978-1-4673-4909-3

Conference paper

S. Seufert

“RDF-4G: Algorithmic Building Blocks for Large-scale Graph Analytics,” in SIGMOD’13 PhD Symposium, New York, NY, USA, 2013.

BibTeX

@inproceedings{Seufert2013a,
TITLE = {{RDF-4G}: Algorithmic Building Blocks for Large-scale Graph Analytics},
AUTHOR = {Seufert, Stephan},
LANGUAGE = {eng},
ISBN = {978-1-4503-2155-6},
DOI = {10.1145/2483574.2483581},
LOCALID = {Local-ID: 24060B7FE63CE27FC1257B71003994B8-Seufert2013az},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {SIGMOD{\textquoteright}13 PhD Symposium},
EDITOR = {Chen, Lei and Dong, Xin Luna},
PAGES = {67--72},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Seufert, Stephan
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T RDF-4G: Algorithmic Building Blocks for Large-scale Graph Analytics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A68-5
%R 10.1145/2483574.2483581
%F OTHER: Local-ID: 24060B7FE63CE27FC1257B71003994B8-Seufert2013az
%D 2013
%B SIGMOD/PODS PhD Symposium
%Z date of event: 2013-06-23 - 2013-06-23
%C New York, NY, USA
%B SIGMOD’13 PhD Symposium
%E Chen, Lei; Dong, Xin Luna
%P 67 - 72
%I ACM
%@ 978-1-4503-2155-6

Conference poster

S. Seufert, S. Bedathur, J. Hoffart, A. Gubichev, and K. Berberich

“Efficient Computation of Relationship-centrality in Large Entity-relationship Graphs,” International Semantic Web Conference (Posters & Demos) 2013. CEUR-WS.org, Aachen, 2013.

BibTeX

@inproceedings{Seufert2013az,
TITLE = {Efficient Computation of Relationship-centrality in Large Entity-relationship Graphs},
AUTHOR = {Seufert, Stephan and Bedathur, Srikanta and Hoffart, Johannes and Gubichev, Andrey and Berberich, Klaus},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {http://ceur-ws.org/Vol-1035/iswc2013_poster_22.pdf},
LOCALID = {Local-ID: 1DEEABDE0FBFA169C1257C6800586017-Seufert2013a},
PUBLISHER = {CEUR-WS.org},
YEAR = {2013},
BOOKTITLE = {International Semantic Web Conference (Posters \& Demos) 2013},
EDITOR = {Blomqvist, Eva and Groza, Tudor},
PAGES = {265--268},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {1035},
ADDRESS = {Sydney, Australia},
}

Endnote

%0 Generic
%A Seufert, Stephan
%A Bedathur, Srikanta
%A Hoffart, Johannes
%A Gubichev, Andrey
%A Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Computation of Relationship-centrality in Large Entity-relationship Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-36D7-7
%F OTHER: Local-ID: 1DEEABDE0FBFA169C1257C6800586017-Seufert2013a
%U http://ceur-ws.org/Vol-1035/iswc2013_poster_22.pdf
%D 2013
%Z name of event: ISWC-PD 2013
%Z date of event: 2013-10-23 - 2013-10-23
%Z place of event: Sydney, Australia
%B International Semantic Web Conference (Posters & Demos) 2013
%E Blomqvist, Eva; Groza, Tudor
%P 265 - 268
%B CEUR Workshop Proceedings
%N 1035

Conference paper

A. Siu, D. B. Nguyen, and G. Weikum

“Fast Entity Recognition in Biomedical Text,” in Workshop on Data Mining for Healthcare at the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-DMH 2013), Chicago, IL, USA, 2013.

BibTeX

@inproceedings{Siu13,
TITLE = {Fast Entity Recognition in Biomedical Text},
AUTHOR = {Siu, Amy and Nguyen, Dat Ba and Weikum, Gerhard},
LANGUAGE = {eng},
PUBLISHER = {ACM},
YEAR = {2013},
BOOKTITLE = {Workshop on Data Mining for Healthcare at the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-DMH 2013)},
ADDRESS = {Chicago, IL, USA},
}

Endnote

%0 Conference Proceedings
%A Siu, Amy
%A Nguyen, Dat Ba
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fast Entity Recognition in Biomedical Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A4E-1
%F OTHER: 6466922B5A48D9CBC1257BCF0033414F-Siu13
%D 2013
%B 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%Z date of event: 2013-08-11 - 2013-08-11
%C Chicago, IL, USA
%B Workshop on Data Mining for Healthcare at the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
%I ACM
%U https://sites.google.com/site/kdd2013dmh/doc/dmh3192_Siu.pdf?attredirects=0&d=1

Conference paper

M. Spaniol, N. Prytkova, and G. Weikum

“Knowledge Linking for Online Statistics,” in Proceedings of the 59th ISI World Statistics Congress (WSC 2013), Hong Kong, China, 2013.

BibTeX

@inproceedings{SPWe13,
TITLE = {Knowledge Linking for Online Statistics},
AUTHOR = {Spaniol, Marc and Prytkova, Natalia and Weikum, Gerhard},
LANGUAGE = {eng},
YEAR = {2013},
BOOKTITLE = {Proceedings of the 59th ISI World Statistics Congress (WSC 2013)},
ADDRESS = {Hong Kong, China},
}

Endnote

%0 Conference Proceedings
%A Spaniol, Marc
%A Prytkova, Natalia
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Linking for Online Statistics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5E1D-5
%D 2013
%B 59th ISI World Statistics Congress
%Z date of event: 2014-08-22 - 2014-08-30
%C Hong Kong, China
%B Proceedings of the 59th ISI World Statistics Congress
%U http://www.statistics.gov.hk/wsc/STS018-P2-S.pdf

Thesis

D5IMPR-CS

A. Stupar

“Soundtrack Recommendation for Images,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

The drastic increase in production of multimedia content has emphasized the
research concerning its organization and retrieval. In this thesis, we address
the problem of music retrieval when a set of images is given as input query,
i.e., the problem of soundtrack recommendation for images. The task at hand is
to recommend appropriate music to be played during the presentation of a given
set of query images. To tackle this problem, we formulate a hypothesis that the
knowledge appropriate for the task is contained in publicly available
contemporary movies. Our approach, Picasso, employs similarity search
techniques inside the image and music domains, harvesting movies to form a link
between the domains. To achieve a fair and unbiased comparison between
different soundtrack recommendation approaches, we proposed an evaluation
benchmark. The evaluation results are reported for Picasso and the baseline
approach, using the proposed benchmark. We further address two efficiency
aspects that arise from the Picasso approach. First, we investigate the problem
of processing top-K queries with set-defined selections and propose an index
structure that aims at minimizing the query answering latency. Second, we
address the problem of similarity search in high-dimensional spaces and propose
two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also
investigate the prospects of a distributed similarity search algorithm based on
LSH using the MapReduce framework. Finally, we give an overview of the
PicasSound, a smartphone application based on the Picasso approach.

BibTeX

@phdthesis{Stupar2012,
TITLE = {Soundtrack Recommendation for Images},
AUTHOR = {Stupar, Aleksandar},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-55267},
DOI = {10.22028/D291-26540},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {The drastic increase in production of multimedia content has emphasized the research concerning its organization and retrieval. In this thesis, we address the problem of music retrieval when a set of images is given as input query, i.e., the problem of soundtrack recommendation for images. The task at hand is to recommend appropriate music to be played during the presentation of a given set of query images. To tackle this problem, we formulate a hypothesis that the knowledge appropriate for the task is contained in publicly available contemporary movies. Our approach, Picasso, employs similarity search techniques inside the image and music domains, harvesting movies to form a link between the domains. To achieve a fair and unbiased comparison between different soundtrack recommendation approaches, we proposed an evaluation benchmark. The evaluation results are reported for Picasso and the baseline approach, using the proposed benchmark. We further address two efficiency aspects that arise from the Picasso approach. First, we investigate the problem of processing top-K queries with set-defined selections and propose an index structure that aims at minimizing the query answering latency. Second, we address the problem of similarity search in high-dimensional spaces and propose two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Finally, we give an overview of the PicasSound, a smartphone application based on the Picasso approach.},
}

Endnote

%0 Thesis
%A Stupar, Aleksandar
%Y Michel, Sebastian
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Soundtrack Recommendation for Images
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-9794-D
%R 10.22028/D291-26540
%U urn:nbn:de:bsz:291-scidok-55267
%F OTHER: hdl:20.500.11880/26596
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%P 149 p.
%V phd
%9 phd
%X The drastic increase in production of multimedia content has emphasized the research concerning its organization and retrieval. In this thesis, we address the problem of music retrieval when a set of images is given as input query, i.e., the problem of soundtrack recommendation for images. The task at hand is to recommend appropriate music to be played during the presentation of a given set of query images. To tackle this problem, we formulate a hypothesis that the knowledge appropriate for the task is contained in publicly available contemporary movies. Our approach, Picasso, employs similarity search techniques inside the image and music domains, harvesting movies to form a link between the domains. To achieve a fair and unbiased comparison between different soundtrack recommendation approaches, we proposed an evaluation benchmark. The evaluation results are reported for Picasso and the baseline approach, using the proposed benchmark. We further address two efficiency aspects that arise from the Picasso approach. First, we investigate the problem of processing top-K queries with set-defined selections and propose an index structure that aims at minimizing the query answering latency. Second, we address the problem of similarity search in high-dimensional spaces and propose two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Finally, we give an overview of the PicasSound, a smartphone application based on the Picasso approach.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5526/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

F. M. Suchanek and G. Weikum

“Knowledge Harvesting in the Big-Data Era,” in SIGMOD’13, ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2013.

BibTeX

@inproceedings{Suchanek:2013:KHB:2463676.2463724,
TITLE = {Knowledge Harvesting in the Big-Data Era},
AUTHOR = {Suchanek, Fabian M. and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2037-5},
URL = {http://doi.acm.org/10.1145/2463676.2463724},
DOI = {10.1145/2463676.2463724},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {SIGMOD{\textquoteright}13, ACM SIGMOD International Conference on Management of Data},
EDITOR = {Ross, Kenneth and Srivastava, Divesh and Papadias, Dimitris and Papadopoulos, Stavros},
PAGES = {933--938},
ADDRESS = {New York, NY, USA},
}

Endnote

%0 Conference Proceedings
%A Suchanek, Fabian M.
%A Weikum, Gerhard
%+ Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Harvesting in the Big-Data Era
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-555B-9
%U http://doi.acm.org/10.1145/2463676.2463724
%R 10.1145/2463676.2463724
%D 2013
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2013-06-22 - 2013-06-27
%C New York, NY, USA
%B SIGMOD’13
%E Ross, Kenneth; Srivastava, Divesh; Papadias, Dimitris; Papadopoulos, Stavros
%P 933 - 938
%I ACM
%@ 978-1-4503-2037-5

Conference paper

F. M. Suchanek and G. Weikum

“Knowledge Harvesting from Text and Web Sources,” in 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, Australia, 2013.

BibTeX

@inproceedings{SuchanekICDE2013,
TITLE = {Knowledge Harvesting from Text and Web Sources},
AUTHOR = {Suchanek, Fabian M. and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1063-6382},
ISBN = {978-1-4673-4909-3; 978-1-4673-4908-6},
DOI = {10.1109/ICDE.2013.6544916},
LOCALID = {Local-ID: 29838011C99BB159C1257C6900498A91-Suchanek2013},
PUBLISHER = {IEEE},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {29th IEEE International Conference on Data Engineering (ICDE 2013)},
PAGES = {1250--1253},
ADDRESS = {Brisbane, Australia},
}

Endnote

%0 Conference Proceedings
%A Suchanek, Fabian M.
%A Weikum, Gerhard
%+ Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Knowledge Harvesting from Text and Web Sources
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5565-2
%R 10.1109/ICDE.2013.6544916
%F OTHER: Local-ID: 29838011C99BB159C1257C6900498A91-Suchanek2013
%D 2013
%B 29th IEEE International Conference on Data Engineering
%Z date of event: 2013-04-08 - 2013-04-11
%C Brisbane, Australia
%B 29th IEEE International Conference on Data Engineering
%P 1250 - 1253
%I IEEE

Conference paper

F. M. Suchanek, J. Hoffart, E. Kuzey, and E. Lewis-Kelham

“YAGO2s: Modular High-quality Information Extraction with an Application to Flight Planning,” in Datenbanksysteme für Business, Technologie und Web (BTW 2013), Magdeburg, Germany, 2013.

Abstract

In this paper, we present YAGO2s, the new edition of the YAGO ontology. The
software architecture has been refactored from scratch, yielding a design that
modularizes both code and data. This modularization enables us to add in new
data sources more easily, while still maintaining the high accuracy and
coherence of the ontology. Thus, we believe that YAGO2s occupies a sweet spot
between a centralized design and a completely distributed design. In this demo,
we present an application of this design to the task of planning a flight. Our
proposed system finds flights from all airports close to the departure city to
all airports close to the destination city.

BibTeX

@inproceedings{yago2sdemo,
TITLE = {{YAGO2s}: Modular High-quality Information Extraction with an Application to Flight Planning},
AUTHOR = {Suchanek, Fabian M. and Hoffart, Johannes and Kuzey, Erdal and Lewis-Kelham, Edwin},
LANGUAGE = {eng},
ISBN = {978-3-88579-608-4},
LOCALID = {Local-ID:C1257ACD0050F94E-5BBCDB0912AAEA29C1257AD70061912D-yago2sdemo},
PUBLISHER = {GI},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {In this paper, we present YAGO2s, the new edition of the YAGO ontology. The software architecture has been refactored from scratch, yielding a design that modularizes both code and data. This modularization enables us to add in new data sources more easily, while still maintaining the high accuracy and coherence of the ontology. Thus, we believe that YAGO2s occupies a sweet spot between a centralized design and a completely distributed design. In this demo, we present an application of this design to the task of planning a flight. Our proposed system finds flights from all airports close to the departure city to all airports close to the destination city.},
BOOKTITLE = {Datenbanksysteme f{\"u}r Business, Technologie und Web (BTW 2013)},
EDITOR = {Markl, Volker},
PAGES = {515--518},
SERIES = {Lecture Notes in Informatics},
VOLUME = {P-214},
ADDRESS = {Magdeburg, Germany},
}

Endnote

%0 Conference Proceedings
%A Suchanek, Fabian M.
%A Hoffart, Johannes
%A Kuzey, Erdal
%A Lewis-Kelham, Edwin
%+ Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YAGO2s: Modular High-quality Information Extraction with an Application to Flight Planning
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-547D-5
%F OTHER: Local-ID:C1257ACD0050F94E-5BBCDB0912AAEA29C1257AD70061912D-yago2sdemo
%D 2013
%B 15. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web
%Z date of event: 2013-03-11 - 2013-03-15
%C Magdeburg, Germany
%X In this paper, we present YAGO2s, the new edition of the YAGO ontology. The software architecture has been refactored from scratch, yielding a design that modularizes both code and data. This modularization enables us to add in new data sources more easily, while still maintaining the high accuracy and coherence of the ontology. Thus, we believe that YAGO2s occupies a sweet spot between a centralized design and a completely distributed design. In this demo, we present an application of this design to the task of planning a flight. Our proposed system finds flights from all airports close to the departure city to all airports close to the destination city.
%B Datenbanksysteme für Business, Technologie und Web
%E Markl, Volker
%P 515 - 518
%I GI
%@ 978-3-88579-608-4
%B Lecture Notes in Informatics
%N P-214
%U http://www.btw-2013.de/proceedings/YAGO2s%20Modular%20HighQuality%20Information%20Extraction%20with%20an%20Application%20to%20Flight%20Planning.pdf

Article

M. Sydow, M. Pikula, and R. Schenkel

“The Notion of Diversity in Graphical Entity Summarisation on Semantic Knowledge Graphs,” Journal of Intelligent Information Systems, vol. 41, no. 2, 2013.

Abstract

Given an entity represented by a single node q in semantic knowledge graph D,
the Graphical Entity Summarisation problem (GES) consists in selecting out of D
a very small surrounding graph S that constitutes a generic summary of the
information concerning the entity q with given limit on size of S. This article
concerns the role of diversity in this quite novel problem. It gives an
overview of the diversity concept in information retrieval, and proposes how to
adapt it to GES. A measure of diversity for GES, called ALC, is defined and two
algorithms presented, baseline, diversity-oblivious PRECIS and diversity-aware
DIVERSUM. A reported experiment shows that DIVERSUM actually achieves higher
values of the ALC diversity measure than PRECIS. Next, an objective evaluation
experiment demonstrates that diversity-aware algorithm is superior to the
diversity-oblivious one in terms of fact selection. More precisely, DIVERSUM
clearly achieves higher recall than PRECIS on ground truth reference entity
summaries extracted from Wikipedia. We also report another intrinsic
experiment, in which the output of diversity-aware algorithm is significantly
preferred by human expert evaluators. Importantly, the user feedback clearly
indicates that the notion of diversity is the key reason for the preference. In
addition, the experiment is repeated twice on an anonymous sample of broad
population of Internet users by means of a crowd-sourcing platform, that
further confirms the results mentioned above.

BibTeX

@article{SydowPS_IIS2013,
TITLE = {The Notion of Diversity in Graphical Entity Summarisation on Semantic Knowledge Graphs},
AUTHOR = {Sydow, Marcin and Pikula, Mariusz and Schenkel, Ralf},
LANGUAGE = {eng},
ISSN = {0925-9902},
DOI = {10.1007/s10844-013-0239-6},
LOCALID = {Local-ID: D5ACDA4FC2994BF7C1257B390032A07E-SydowPS_IIS2013},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Given an entity represented by a single node q in semantic knowledge graph D, the Graphical Entity Summarisation problem (GES) consists in selecting out of D a very small surrounding graph S that constitutes a generic summary of the information concerning the entity q with given limit on size of S. This article concerns the role of diversity in this quite novel problem. It gives an overview of the diversity concept in information retrieval, and proposes how to adapt it to GES. A measure of diversity for GES, called ALC, is defined and two algorithms presented, baseline, diversity-oblivious PRECIS and diversity-aware DIVERSUM. A reported experiment shows that DIVERSUM actually achieves higher values of the ALC diversity measure than PRECIS. Next, an objective evaluation experiment demonstrates that diversity-aware algorithm is superior to the diversity-oblivious one in terms of fact selection. More precisely, DIVERSUM clearly achieves higher recall than PRECIS on ground truth reference entity summaries extracted from Wikipedia. We also report another intrinsic experiment, in which the output of diversity-aware algorithm is significantly preferred by human expert evaluators. Importantly, the user feedback clearly indicates that the notion of diversity is the key reason for the preference. In addition, the experiment is repeated twice on an anonymous sample of broad population of Internet users by means of a crowd-sourcing platform, that further confirms the results mentioned above.},
JOURNAL = {Journal of Intelligent Information Systems},
VOLUME = {41},
NUMBER = {2},
PAGES = {109--149},
}

Endnote

%0 Journal Article
%A Sydow, Marcin
%A Pikula, Mariusz
%A Schenkel, Ralf
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T The Notion of Diversity in Graphical Entity Summarisation on Semantic Knowledge Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3728-B
%F OTHER: Local-ID: D5ACDA4FC2994BF7C1257B390032A07E-SydowPS_IIS2013
%R 10.1007/s10844-013-0239-6
%7 2013-03-12
%D 2013
%X Given an entity represented by a single node q in semantic knowledge graph D, 
the Graphical Entity Summarisation problem (GES) consists in selecting out of D 
a very small surrounding graph S that constitutes a generic summary of the 
information concerning the entity q with given limit on size of S. This article 
concerns the role of diversity in this quite novel problem. It gives an 
overview of the diversity concept in information retrieval, and proposes how to 
adapt it to GES. A measure of diversity for GES, called ALC, is defined and two 
algorithms presented, baseline, diversity-oblivious PRECIS and diversity-aware 
DIVERSUM. A reported experiment shows that DIVERSUM actually achieves higher 
values of the ALC diversity measure than PRECIS. Next, an objective evaluation 
experiment demonstrates that diversity-aware algorithm is superior to the 
diversity-oblivious one in terms of fact selection. More precisely, DIVERSUM 
clearly achieves higher recall than PRECIS on ground truth reference entity 
summaries extracted from Wikipedia. We also report another intrinsic 
experiment, in which the output of diversity-aware algorithm is significantly 
preferred by human expert evaluators. Importantly, the user feedback clearly 
indicates that the notion of diversity is the key reason for the preference. In 
addition, the experiment is repeated twice on an anonymous sample of broad 
population of Internet users by means of a crowd-sourcing platform, that 
further confirms the results mentioned above.
%J Journal of Intelligent Information Systems
%V 41
%N 2
%& 109
%P 109 - 149
%I Springer
%C Berlin

Conference paper

B. Taneva, T. Cheng, K. Chakrabarti, and Y. He

“Mining Acronym Expansions and Their Meanings Using Query Click Log,” in WWW’13, 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013.

BibTeX

@inproceedings{TanevaWWW2013,
TITLE = {Mining Acronym Expansions and Their Meanings Using Query Click Log},
AUTHOR = {Taneva, Bilyana and Cheng, Tao and Chakrabarti, Kaushik and He, Yeye},
LANGUAGE = {eng},
ISBN = {978-1-4503-2035-1},
URL = {http://dl.acm.org/ft_gateway.cfm?id=2488498&ftid=1374081&dwn=1&CFID=408707560&CFTOKEN=75186124},
LOCALID = {Local-ID: 277D3F616907C539C1257B7000743251-TanevaWWW2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {WWW{\textquoteright}13, 22nd International Conference on World Wide Web},
EDITOR = {Almeida, Virg{\'i}lio and Schwabe, Daniel and Glaser, Hartmut and Baeza-Yates, Ricardo and Moon, Sue},
PAGES = {1261--1272},
ADDRESS = {Rio de Janeiro, Brazil},
}

Endnote

%0 Conference Proceedings
%A Taneva, Bilyana
%A Cheng, Tao
%A Chakrabarti, Kaushik
%A He, Yeye
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Mining Acronym Expansions and Their Meanings Using Query Click Log
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-372F-E
%F OTHER: Local-ID: 277D3F616907C539C1257B7000743251-TanevaWWW2013
%U http://dl.acm.org/ft_gateway.cfm?id=2488498&ftid=1374081&dwn=1&CFID=408707560&CFTOKEN=75186124
%D 2013
%B 22nd International Conference on World Wide Web
%Z date of event: 2013-05-13 - 2013-05-17
%C Rio de Janeiro, Brazil
%B WWW’13
%E Almeida, Virgílio; Schwabe, Daniel; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue
%P 1261 - 1272
%I ACM
%@ 978-1-4503-2035-1

Conference paper

B. Taneva and G. Weikum

“Gem-based Entity-knowledge Maintenance,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

BibTeX

@inproceedings{TanevaCIKM2013,
TITLE = {Gem-based Entity-knowledge Maintenance},
AUTHOR = {Taneva, Bilyana and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2505715},
LOCALID = {Local-ID: A84DFB4035CA43A2C1257BD300555742-TanevaCIKM2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {149--158},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Taneva, Bilyana
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Gem-based Entity-knowledge Maintenance
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3758-E
%R 10.1145/2505515.2505715
%F OTHER: Local-ID: A84DFB4035CA43A2C1257BD300555742-TanevaCIKM2013
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 149 - 158
%I ACM
%@ 978-1-4503-2263-8

Thesis

D5IMPR-CS

B. Taneva

“Automatic Population of Knowledge Bases with Multimodal Data about Named Entities,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

Knowledge bases are of great importance for Web search, recommendations, and
many Information Retrieval tasks. However, maintaining them for not so popular
entities is often a bottleneck. Typically, such entities have limited textual
coverage and only a few ontological facts. Moreover, these entities are not
well populated with multimodal data, such as images, videos, or audio
recordings.
The goals in this thesis are (1) to populate a given knowledge base with
multimodal data about entities, such as images or audio recordings, and (2) to
ease the task of maintaining and expanding the textual knowledge about a given
entity, by recommending valuable text excerpts to the contributors of knowledge
bases.
The thesis makes three main contributions. The first two contributions
concentrate on finding images of named entities with high precision, high
recall, and high visual diversity. Our main focus are less popular entities,
for which the image search engines fail to retrieve good results. Our methods
utilize background knowledge about the entity, such as ontological facts or a
short description, and a visual-based image similarity to rank and diversify a
set of candidate images.
Our third contribution is an approach for extracting text contents related to a
given entity. It leverages a language-model-based similarity between a short
description of the entity and the text sources, and solves a budget-constraint
optimization program without any assumptions on the text structure. Moreover,
our approach is also able to reliably extract entity related audio excerpts
from news podcasts. We derive the time boundaries from the usually very noisy
audio transcriptions.

BibTeX

@phdthesis{TanevaPhDThesis,
TITLE = {Automatic Population of Knowledge Bases with Multimodal Data about Named Entities},
AUTHOR = {Taneva, Bilyana},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-54839},
DOI = {10.22028/D291-26530},
LOCALID = {Local-ID: 28FC9CE2EBDB4763C1257BD40056934A-TanevaPhDThesis},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Knowledge bases are of great importance for Web search, recommendations, and <br>many Information Retrieval tasks. However, maintaining them for not so popular <br>entities is often a bottleneck. Typically, such entities have limited textual <br>coverage and only a few ontological facts. Moreover, these entities are not <br>well populated with multimodal data, such as images, videos, or audio <br>recordings. <br>The goals in this thesis are (1) to populate a given knowledge base with <br>multimodal data about entities, such as images or audio recordings, and (2) to <br>ease the task of maintaining and expanding the textual knowledge about a given <br>entity, by recommending valuable text excerpts to the contributors of knowledge <br>bases. <br>The thesis makes three main contributions. The first two contributions <br>concentrate on finding images of named entities with high precision, high <br>recall, and high visual diversity. Our main focus are less popular entities, <br>for which the image search engines fail to retrieve good results. Our methods <br>utilize background knowledge about the entity, such as ontological facts or a <br>short description, and a visual-based image similarity to rank and diversify a <br>set of candidate images. <br>Our third contribution is an approach for extracting text contents related to a <br>given entity. It leverages a language-model-based similarity between a short <br>description of the entity and the text sources, and solves a budget-constraint <br>optimization program without any assumptions on the text structure. Moreover, <br>our approach is also able to reliably extract entity related audio excerpts <br>from news podcasts. We derive the time boundaries from the usually very noisy <br>audio transcriptions.},
}

Endnote

%0 Thesis
%A Taneva, Bilyana
%Y Weikum, Gerhard
%A referee: Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automatic Population of Knowledge Bases with Multimodal Data about Named Entities
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-389C-E
%U urn:nbn:de:bsz:291-scidok-54839
%F OTHER: Local-ID: 28FC9CE2EBDB4763C1257BD40056934A-TanevaPhDThesis
%R 10.22028/D291-26530
%F OTHER: hdl:20.500.11880/26586
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V phd
%9 phd
%X Knowledge bases are of great importance for Web search, recommendations, and <br>many Information Retrieval tasks. However, maintaining them for not so popular <br>entities is often a bottleneck. Typically, such entities have limited textual <br>coverage and only a few ontological facts. Moreover, these entities are not <br>well populated with multimodal data, such as images, videos, or audio <br>recordings. <br>The goals in this thesis are (1) to populate a given knowledge base with <br>multimodal data about entities, such as images or audio recordings, and (2) to <br>ease the task of maintaining and expanding the textual knowledge about a given <br>entity, by recommending valuable text excerpts to the contributors of knowledge <br>bases. <br>The thesis makes three main contributions. The first two contributions <br>concentrate on finding images of named entities with high precision, high <br>recall, and high visual diversity. Our main focus are less popular entities, <br>for which the image search engines fail to retrieve good results. Our methods <br>utilize background knowledge about the entity, such as ontological facts or a <br>short description, and a visual-based image similarity to rank and diversify a <br>set of candidate images. <br>Our third contribution is an approach for extracting text contents related to a <br>given entity. It leverages a language-model-based similarity between a short <br>description of the entity and the text sources, and solves a budget-constraint <br>optimization program without any assumptions on the text structure. Moreover, <br>our approach is also able to reliably extract entity related audio excerpts <br>from news podcasts. We derive the time boundaries from the usually very noisy <br>audio transcriptions.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5483/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

M. Theobald, L. De Raedt, M. Dylla, A. Kimmig, and I. Miliaraki

“10 Years of Probabilistic Querying - What Next?,” in Advances in Databases and Information Systems (ADBIS 2013), Genoa, Italy, 2013.

BibTeX

@inproceedings{ADBIS-10YEARS-2013,
TITLE = {10 Years of Probabilistic Querying -- What Next?},
AUTHOR = {Theobald, Martin and De Raedt, Luc and Dylla, Maximilian and Kimmig, Angelika and Miliaraki, Iris},
LANGUAGE = {eng},
ISSN = {0302-9743},
ISBN = {978-3-642-40682-9},
DOI = {10.1007/978-3-642-40683-6_1},
LOCALID = {Local-ID: E3E0114D619DE0C6C1257BBB003F3F70-ADBIS-10YEARS-2013},
PUBLISHER = {Springer},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {Advances in Databases and Information Systems (ADBIS 2013)},
EDITOR = {Catania, Barbara and Guerrini, Giovanna and Pokorn{\'y}, Jaroslav},
PAGES = {1--13},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {8133},
ADDRESS = {Genoa, Italy},
}

Endnote

%0 Conference Proceedings
%A Theobald, Martin
%A De Raedt, Luc
%A Dylla, Maximilian
%A Kimmig, Angelika
%A Miliaraki, Iris
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T 10 Years of Probabilistic Querying - What Next?
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6387-C
%R 10.1007/978-3-642-40683-6_1
%F OTHER: Local-ID: E3E0114D619DE0C6C1257BBB003F3F70-ADBIS-10YEARS-2013
%D 2013
%B 17th East European Conference on Advances in Databases and Information Systems
%Z date of event: 2013-09-01 - 2013-09-04
%C Genoa, Italy
%B Advances in Databases and Information Systems
%E Catania, Barbara; Guerrini, Giovanna; Pokorný, Jaroslav
%P 1 - 13
%I Springer
%@ 978-3-642-40682-9
%B Lecture Notes in Computer Science
%N 8133

Conference paper

Y. Wang, L. Jiang, J. Hoffart, and G. Weikum

“YaLi: a Crowdsourcing Plug-in for NERD,” in SIGIR’13, 36th International ACM SIGIR Conference on Research & Development in Information Retrieval, Dublin, Ireland, 2013.

BibTeX

@inproceedings{Jiang2013z,
TITLE = {{YaLi}: a Crowdsourcing Plug-in for {NERD}},
AUTHOR = {Wang, Yafang and Jiang, Lili and Hoffart, Johannes and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2034-4},
DOI = {10.1145/2484028.2484206},
LOCALID = {Local-ID: 2B21B82FC9973F1CC1257C6800596F07-Jiang2013z},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
BOOKTITLE = {SIGIR{\textquoteright}13, 36th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
PAGES = {1111--1112},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Wang, Yafang
%A Jiang, Lili
%A Hoffart, Johannes
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YaLi: a Crowdsourcing Plug-in for NERD
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-38B3-9
%R 10.1145/2484028.2484206
%F OTHER: Local-ID: 2B21B82FC9973F1CC1257C6800596F07-Jiang2013z
%D 2013
%B 36th International ACM SIGIR Conference on Research & Development in Information Retrieval
%Z date of event: 2013-07-28 - 2013-08-01
%C Dublin, Ireland
%B SIGIR’13
%P 1111 - 1112
%I ACM
%@ 978-1-4503-2034-4

Thesis

D5IMPR-CS

Y. Wang

“Methods and Tools for Temporal Knowledge Harvesting,” Universität des Saarlandes, Saarbrücken, 2013.

Abstract

To extend the traditional knowledge base with temporal dimension, this thesis
offers methods and tools for harvesting temporal facts from both
semi-structured and textual sources.
Our contributions are briefly summarized as follows.

(1) Timely YAGO: A temporal knowledge base called Timely YAGO
(T-YAGO) which extends YAGO with temporal attributes is built. We define a
simple RDF-style data model to support temporal knowledge.

(2) PRAVDA: To be able to harvest as many temporal facts from
free-text as possible, we develop a system PRAVDA.
It utilizes a graph-based semi-supervised learning algorithm to extract fact
observations, which are further cleaned up by an Integer Linear Program based
constraint solver.
We also attempt to harvest spatio-temporal facts to track a person's
trajectory.

(3) PRAVDA-live: A user-centric interactive knowledge harvesting
system, called PRAVDA-live, is developed for extracting facts from natural
language free-text. It is built on the framework of PRAVDA.
It supports fact extraction of user-defined relations from ad-hoc selected text
documents and ready-to-use RDF exports.

(4) T-URDF: We present a simple and efficient representation
model for time-dependent uncertainty in combination with first-order
inference rules and recursive queries over RDF-like knowledge bases.
We adopt the common possible-worlds semantics known from probabilistic
databases and extend it towards histogram-like confidence distributions that
capture the validity of facts across time.

All of these components are fully implemented systems, which together form an
integrative architecture.
PRAVDA and PRAVDA-live aim at gathering new facts (particularly temporal facts),
and then T-URDF reconciles them.
Finally these facts are stored in a (temporal) knowledge base, called T-YAGO.
A SPARQL-like time-aware querying language, together with a visualization tool,
are designed for T-YAGO.
Temporal knowledge can also be applied for document summarization.

BibTeX

@phdthesis{Wang-thesis2013,
TITLE = {Methods and Tools for Temporal Knowledge Harvesting},
AUTHOR = {Wang, Yafang},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-50967},
DOI = {10.22028/D291-26419},
LOCALID = {Local-ID: 142737B17504ED10C1257B19006B30E4-Wang-thesis2013},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {To extend the traditional knowledge base with temporal dimension, this thesis offers methods and tools for harvesting temporal facts from both semi-structured and textual sources. Our contributions are briefly summarized as follows. (1) Timely YAGO: A temporal knowledge base called Timely YAGO (T-YAGO) which extends YAGO with temporal attributes is built. We define a simple RDF-style data model to support temporal knowledge. (2) PRAVDA: To be able to harvest as many temporal facts from free-text as possible, we develop a system PRAVDA. It utilizes a graph-based semi-supervised learning algorithm to extract fact observations, which are further cleaned up by an Integer Linear Program based constraint solver. We also attempt to harvest spatio-temporal facts to track a person's trajectory. (3) PRAVDA-live: A user-centric interactive knowledge harvesting system, called PRAVDA-live, is developed for extracting facts from natural language free-text. It is built on the framework of PRAVDA. It supports fact extraction of user-defined relations from ad-hoc selected text documents and ready-to-use RDF exports. (4) T-URDF: We present a simple and efficient representation model for time-dependent uncertainty in combination with first-order inference rules and recursive queries over RDF-like knowledge bases. We adopt the common possible-worlds semantics known from probabilistic databases and extend it towards histogram-like confidence distributions that capture the validity of facts across time. All of these components are fully implemented systems, which together form an integrative architecture. PRAVDA and PRAVDA-live aim at gathering new facts (particularly temporal facts), and then T-URDF reconciles them. Finally these facts are stored in a (temporal) knowledge base, called T-YAGO. A SPARQL-like time-aware querying language, together with a visualization tool, are designed for T-YAGO. Temporal knowledge can also be applied for document summarization.},
}

Endnote

%0 Thesis
%A Wang, Yafang
%Y Weikum, Gerhard
%A referee: Berberich, Klaus
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Methods and Tools for Temporal Knowledge Harvesting
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3892-2
%F OTHER: Local-ID: 142737B17504ED10C1257B19006B30E4-Wang-thesis2013
%U urn:nbn:de:bsz:291-scidok-50967
%R 10.22028/D291-26419
%F OTHER: hdl:20.500.11880/26475
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V phd
%9 phd
%X To extend the traditional knowledge base with temporal dimension, this thesis offers methods and tools for harvesting temporal facts from both semi-structured and textual sources. Our contributions are briefly summarized as follows. (1) Timely YAGO: A temporal knowledge base called Timely YAGO (T-YAGO) which extends YAGO with temporal attributes is built. We define a simple RDF-style data model to support temporal knowledge. (2) PRAVDA: To be able to harvest as many temporal facts from free-text as possible, we develop a system PRAVDA. It utilizes a graph-based semi-supervised learning algorithm to extract fact observations, which are further cleaned up by an Integer Linear Program based constraint solver. We also attempt to harvest spatio-temporal facts to track a person's trajectory. (3) PRAVDA-live: A user-centric interactive knowledge harvesting system, called PRAVDA-live, is developed for extracting facts from natural language free-text. It is built on the framework of PRAVDA. It supports fact extraction of user-defined relations from ad-hoc selected text documents and ready-to-use RDF exports. (4) T-URDF: We present a simple and efficient representation model for time-dependent uncertainty in combination with first-order inference rules and recursive queries over RDF-like knowledge bases. We adopt the common possible-worlds semantics known from probabilistic databases and extend it towards histogram-like confidence distributions that capture the validity of facts across time. All of these components are fully implemented systems, which together form an integrative architecture. PRAVDA and PRAVDA-live aim at gathering new facts (particularly temporal facts), and then T-URDF reconciles them. Finally these facts are stored in a (temporal) knowledge base, called T-YAGO. A SPARQL-like time-aware querying language, together with a visualization tool, are designed for T-YAGO. Temporal knowledge can also be applied for document summarization.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5096/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

M. Yahya, K. Berberich, S. Elbassuoni, and G. Weikum

“Robust Question Answering Over the Web of Linked Data,” in CIKM’13, 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 2013.

Abstract

Knowledge bases and the Web of Linked Data have become important assets for
search, recommendation, and analytics. Natural-language questions are a
user-friendly mode of tapping this wealth of knowledge and data. However,
question answering technology does not work robustly in this setting as
questions have to be translated into structured queries and users have to be
careful in phrasing their questions. This paper advocates a new approach that
allows questions to be partially translated into relaxed queries, covering the
essential but not necessarily all aspects of the user's input. To compensate
for the omissions, we exploit textual sources associated with entities and
relational facts. Our system translates user questions into an extended form of
structured SPARQL queries, with text predicates attached to triple patterns.
Our solution is based on a novel optimization model, cast into an integer
linear program, for joint decomposition and disambiguation of the user
question. We demonstrate the quality of our methods through experiments with
the QALD benchmark.

BibTeX

@inproceedings{Yahya2013,
TITLE = {Robust Question Answering Over the Web of Linked Data},
AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Elbassuoni, Shady and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-2263-8},
DOI = {10.1145/2505515.2505677},
LOCALID = {Local-ID: CDFE3DD23E904B5DC1257C63006B46B7-Yahya2013},
PUBLISHER = {ACM},
YEAR = {2013},
DATE = {2013},
ABSTRACT = {Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This paper advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.},
BOOKTITLE = {CIKM'13, 22nd ACM International Conference on Information \& Knowledge Management},
EDITOR = {Nejdl, Wolfgang and Pei, Jian and Rastogi, Rajeev},
PAGES = {1107--1116},
ADDRESS = {San Francisco, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Berberich, Klaus
%A Elbassuoni, Shady
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Robust Question Answering Over the Web of Linked Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-38A9-0
%F OTHER: Local-ID: CDFE3DD23E904B5DC1257C63006B46B7-Yahya2013
%R 10.1145/2505515.2505677
%D 2013
%B 22nd ACM International Conference on Information & Knowledge Management
%Z date of event: 2013-10-27 - 2013-11-01
%C San Francisco, CA, USA
%X Knowledge bases and the Web of Linked Data have become important assets for 
search, recommendation, and analytics. Natural-language questions are a 
user-friendly mode of tapping this wealth of knowledge and data. However, 
question answering technology does not work robustly in this setting as 
questions have to be translated into structured queries and users have to be 
careful in phrasing their questions. This paper advocates a new approach that 
allows questions to be partially translated into relaxed queries, covering the 
essential but not necessarily all aspects of the user's input. To compensate 
for the omissions, we exploit textual sources associated with entities and 
relational facts. Our system translates user questions into an extended form of 
structured SPARQL queries, with text predicates attached to triple patterns. 
Our solution is based on a novel optimization model, cast into an integer 
linear program, for joint decomposition and disambiguation of the user 
question. We demonstrate the quality of our methods through experiments with 
the QALD benchmark.
%B CIKM'13
%E Nejdl, Wolfgang; Pei, Jian; Rastogi, Rajeev
%P 1107 - 1116
%I ACM
%@ 978-1-4503-2263-8

Conference paper

M. Yahya, K. Berberich, M. Ramanath, and G. Weikum

“On the SPOT: Question Answering over Temporally Enhanced Structured Data,” in SIGIR 2013 Workshop on Time-aware Information Access (TAIA 2013), Dublin, Ireland, 2013.

Abstract

Natural-language question answering is a convenient way for humans to discover
relevant information in structured Web data such as knowledge bases or Linked
Open Data sources. This paper focuses on data with a temporal dimension, and
discusses the problem of mapping natural-language questions into extended
SPARQL queries over RDF-structured data. We specifically address the issue of
disambiguating temporal phrases in the question into temporal entities like
dates and named events and temporal predicates. For the situation where the
data has only partial coverage of the time dimension but is augmented with
textual descriptions of entities and facts, we also discuss how to generate
queries that combine structured search with keyword conditions.

BibTeX

@inproceedings{Yahya2013a,
TITLE = {On the {SPOT}: Question Answering over Temporally Enhanced Structured Data},
AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf},
LOCALID = {Local-ID: A4DC91D6C5159745C1257C63007FBCED-Yahya2013a},
PUBLISHER = {Microsoft Research},
YEAR = {2013},
ABSTRACT = {Natural-language question answering is a convenient way for humans to discover relevant information in structured Web data such as knowledge bases or Linked Open Data sources. This paper focuses on data with a temporal dimension, and discusses the problem of mapping natural-language questions into extended SPARQL queries over RDF-structured data. We specifically address the issue of disambiguating temporal phrases in the question into temporal entities like dates and named events and temporal predicates. For the situation where the data has only partial coverage of the time dimension but is augmented with textual descriptions of entities and facts, we also discuss how to generate queries that combine structured search with keyword conditions.},
BOOKTITLE = {SIGIR 2013 Workshop on Time-aware Information Access (TAIA 2013)},
EDITOR = {Diaz, Fernando and Dumais, Susan and Radinsky, Kira and de Rijke, Maarten and Shokouhi, Milad},
ADDRESS = {Dublin, Ireland},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Berberich, Klaus
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On the SPOT: Question Answering over Temporally Enhanced Structured Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A40-E
%F OTHER: Local-ID: A4DC91D6C5159745C1257C63007FBCED-Yahya2013a
%U http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf
%D 2013
%B SIGIR 2013 Workshop on Time-aware Information Access
%Z date of event: 2013-08-16 - 2013-08-16
%C Dublin, Ireland
%X Natural-language question answering is a convenient way for humans to discover 
relevant information in structured Web data such as knowledge bases or Linked 
Open Data sources. This paper focuses on data with a temporal dimension, and 
discusses the problem of mapping natural-language questions into extended 
SPARQL queries over RDF-structured data. We specifically address the issue of 
disambiguating temporal phrases in the question into temporal entities like 
dates and named events and temporal predicates. For the situation where the 
data has only partial coverage of the time dimension but is augmented with 
textual descriptions of entities and facts, we also discuss how to generate 
queries that combine structured search with keyword conditions.
%B SIGIR 2013 Workshop on Time-aware Information Access
%E Diaz, Fernando; Dumais, Susan; Radinsky, Kira; de Rijke, Maarten; Shokouhi, Milad
%I Microsoft Research

Conference paper

M. A. Yosef, S. Bauer, J. Hoffart, M. Spaniol, and G. Weikum

“HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, 2013.

BibTeX

@inproceedings{YosefACL2013,
TITLE = {{HYENA}-live: Fine-Grained Online Entity Type Classification from Natural-language Text},
AUTHOR = {Yosef, Mohamed Amir and Bauer, Sandro and Hoffart, Johannes and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://aclweb.org/anthology/P/P13/P13-4023.pdf},
LOCALID = {Local-ID: 3D9574E0A6116F32C1257BC000491295-YosefACL2013},
PUBLISHER = {ACL},
YEAR = {2013},
BOOKTITLE = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)},
PAGES = {133--138},
ADDRESS = {Sofia, Bulgaria},
}

Endnote

%0 Conference Proceedings
%A Yosef, Mohamed Amir
%A Bauer, Sandro
%A Hoffart, Johannes
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-374E-8
%F OTHER: Local-ID: 3D9574E0A6116F32C1257BC000491295-YosefACL2013
%U http://aclweb.org/anthology/P/P13/P13-4023.pdf
%D 2013
%B 51st Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2013-08-04 - 2013-08-09
%C Sofia, Bulgaria
%B Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
%P 133 - 138
%I ACL

Thesis

B. Zeini

“Design and Evaluation of an Incremental Ranking Model for Large-scale Data Stored in HBase,” Universität des Saarlandes, Saarbrücken, 2013.

BibTeX

@mastersthesis{Zeini-Jahromi2013,
TITLE = {Design and Evaluation of an Incremental Ranking Model for Large-scale Data Stored in {HB}ase},
AUTHOR = {Zeini, Behrang},
LANGUAGE = {eng},
LOCALID = {Local-ID: 2BDF9CB23FFD67ADC1257B790034B6B6-Zeini-Jahromi2013},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2013},
DATE = {2013-05},
}

Endnote

%0 Thesis
%A Zeini, Behrang
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Design and Evaluation of an Incremental Ranking Model for Large-scale Data Stored in HBase
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-3A8C-8
%F OTHER: Local-ID: 2BDF9CB23FFD67ADC1257B790034B6B6-Zeini-Jahromi2013
%I Universität des Saarlandes
%C Saarbrücken
%D 2013
%V master
%9 master

2012

Article

W. Ahrendt and M. Dylla

“A System for Compositional Verification of Asynchronous Objects,” Science of Computer Programming, vol. 77, no. 12, 2012.

BibTeX

@article{AhrendtDylla2012,
TITLE = {A System for Compositional Verification of Asynchronous Objects},
AUTHOR = {Ahrendt, Wolfgang and Dylla, Maximilian},
LANGUAGE = {eng},
URL = {http://dx.doi.org/10.1016/j.scico.2010.08.003},
DOI = {10.1016/j.scico.2010.08.003},
LOCALID = {Local-ID: C1256DBF005F876D-1160815AFE569B22C1257AEA0067F6F1-AhrendtDylla2012},
PUBLISHER = {Elsevier},
ADDRESS = {Amsterdam},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Science of Computer Programming},
VOLUME = {77},
NUMBER = {12},
PAGES = {1289--1309},
}

Endnote

%0 Journal Article
%A Ahrendt, Wolfgang
%A Dylla, Maximilian
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A System for Compositional Verification of Asynchronous Objects
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FB2-6
%F EDOC: 647527
%R 10.1016/j.scico.2010.08.003
%U http://dx.doi.org/10.1016/j.scico.2010.08.003
%F OTHER: Local-ID: C1256DBF005F876D-1160815AFE569B22C1257AEA0067F6F1-AhrendtDylla2012
%7 2012
%D 2012
%* Review method: peer-reviewed
%J Science of Computer Programming
%V 77
%N 12
%& 1289
%P 1289 - 1309
%I Elsevier
%C Amsterdam

Conference paper

F. Alvanaki, S. Michel, K. Ramamritham, and G. Weikum

“See What’s enBlogue - Real-time Emergent Topic Identification in Social Media,” in Advances in Database Technology - EDBT 2012, Berlin, Germany, 2012.

BibTeX

@inproceedings{Alvanaki2012a,
TITLE = {See What's {enBlogue} -- Real-time Emergent Topic Identification in Social Media},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian and Ramamritham, Krithi and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-0790-1},
URL = {http://doi.acm.org/10.1145/2247596.2247636},
DOI = {10.1145/2247596.2247636},
LOCALID = {Local-ID: C1256DBF005F876D-87590022B919374CC1257981002805CB-Alvanaki2012a},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Advances in Database Technology -- EDBT 2012},
EDITOR = {Rundensteiner, Elke A. and Markl, Volker and Manolescu, Ioana and Amer-Yahia, Sihem and Naumann, Felix and Ari, Ismail},
PAGES = {336--347},
ADDRESS = {Berlin, Germany},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Michel, Sebastian
%A Ramamritham, Krithi
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T See What's enBlogue - Real-time Emergent Topic Identification in Social Media
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-577D-B
%F EDOC: 647458
%R 10.1145/2247596.2247636
%U http://doi.acm.org/10.1145/2247596.2247636
%F OTHER: Local-ID: C1256DBF005F876D-87590022B919374CC1257981002805CB-Alvanaki2012a
%D 2012
%B 15th International Conference on Extending Database Technology
%Z date of event: 2012-03-27 - 2012-03-30
%C Berlin, Germany
%B Advances in Database Technology - EDBT 2012
%E Rundensteiner, Elke A.; Markl, Volker; Manolescu, Ioana; Amer-Yahia, Sihem; Naumann, Felix; Ari, Ismail
%P 336 - 347
%I ACM
%@ 978-1-4503-0790-1

Report

F. Alvanaki, S. Michel, and A. Stupar

“Building and Maintaining Halls of Fame Over a Database,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2012-5-004, 2012.


Abstract

Halls of Fame are fascinating constructs. They represent the elite of an often
very large amount of entities: persons, companies, products, countries etc.
Beyond their practical use as static rankings, changes to them are particularly
interesting, whether for decision making processes, as input to common media or
novel narrative science applications, or simply consumed by users. In this
work, we aim at detecting events that can be characterized by changes to a
Hall of Fame ranking in an automated way. We describe how the schema and
data of a database can be used to generate Halls of Fame. In this database
scenario, by Hall of Fame we refer to distinguished tuples; entities, whose
characteristics set them apart from the majority. We define every Hall of
Fame as one specific instance of an SQL query, such that a change in its
result is considered a noteworthy event. Identified changes (i.e., events) are
ranked using lexicographic tradeoffs over event and query properties and
presented to users or fed in higher-level applications. We have implemented
a full-fledged prototype system that uses either database triggers or a Java
based middleware for event identification. We report on an experimental
evaluation using a real-world dataset of basketball statistics.

BibTeX

@techreport{AlvanakiMichelStupar2012,
TITLE = {Building and Maintaining Halls of Fame Over a Database},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian and Stupar, Aleksandar},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2012-5-004},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
ABSTRACT = {Halls of Fame are fascinating constructs. They represent the elite of an often very large amount of entities: persons, companies, products, countries etc. Beyond their practical use as static rankings, changes to them are particularly interesting, whether for decision making processes, as input to common media or novel narrative science applications, or simply consumed by users. In this work, we aim at detecting events that can be characterized by changes to a Hall of Fame ranking in an automated way. We describe how the schema and data of a database can be used to generate Halls of Fame. In this database scenario, by Hall of Fame we refer to distinguished tuples; entities, whose characteristics set them apart from the majority. We define every Hall of Fame as one specific instance of an SQL query, such that a change in its result is considered a noteworthy event. Identified changes (i.e., events) are ranked using lexicographic tradeoffs over event and query properties and presented to users or fed in higher-level applications. We have implemented a full-fledged prototype system that uses either database triggers or a Java based middleware for event identification. We report on an experimental evaluation using a real-world dataset of basketball statistics.},
TYPE = {Research Reports},
}

Endnote

%0 Report
%A Alvanaki, Foteini
%A Michel, Sebastian
%A Stupar, Aleksandar
%+ Cluster of Excellence Multimodal Computing and Interaction
Databases and Information Systems, MPI for Informatics, Max Planck Society
Cluster of Excellence Multimodal Computing and Interaction
%T Building and Maintaining Halls of Fame Over a Database
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-03E9-D
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2012
%X Halls of Fame are fascinating constructs. They represent the elite of an often
very large amount of entities: persons, companies, products, countries etc.
Beyond their practical use as static rankings, changes to them are particularly
interesting, whether for decision making processes, as input to common media or
novel narrative science applications, or simply consumed by users. In this
work, we aim at detecting events that can be characterized by changes to a
Hall of Fame ranking in an automated way. We describe how the schema and
data of a database can be used to generate Halls of Fame. In this database
scenario, by Hall of Fame we refer to distinguished tuples; entities, whose
characteristics set them apart from the majority. We define every Hall of
Fame as one specific instance of an SQL query, such that a change in its
result is considered a noteworthy event. Identified changes (i.e., events) are
ranked using lexicographic tradeoffs over event and query properties and
presented to users or fed in higher-level applications. We have implemented
a full-fledged prototype system that uses either database triggers or a Java
based middleware for event identification. We report on an experimental
evaluation using a real-world dataset of basketball statistics.
%B Research Reports
%@ 0946-011X

Conference paper

D5IMPR-CS

A. Anand, S. Bedathur, K. Berberich, and R. Schenkel

“Index Maintenance for Time-Travel Text Search,” in SIGIR’12, International ACM SIGIR Conference on Research & Development in Information Retrieval, Portland, Oregon, 2012.


Abstract

Time-travel text search enriches standard text search by temporal predicates,
so that users of web archives can easily retrieve document versions that are
considered relevant to a given keyword query and existed during a given time
interval. Different index structures have been proposed to efficiently support
time-travel text search. None of them, however, can easily be updated as the
Web evolves and new document versions are added to the web archive.

In this work, we describe a novel index structure that efficiently supports
time-travel text search and can be maintained incrementally as new document
versions are added to the web archive. Our solution uses a sharded index
organization, bounds the number of spuriously read index entries per shard,
and can be maintained using small in-memory buffers and append-only
operations. We present experiments on two large-scale real-world datasets
demonstrating that maintaining our novel index structure is an order of
magnitude more efficient than periodically rebuilding one of the existing
index structures, while query-processing performance is not adversely
affected.

BibTeX

@inproceedings{AnandBBS_SIGIR2012,
TITLE = {Index Maintenance for Time-Travel Text Search},
AUTHOR = {Anand, Avishek and Bedathur, Srikanta and Berberich, Klaus and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-1658-3},
URL = {http://doi.acm.org/10.1145/2348283.2348318},
DOI = {10.1145/2348283.2348318},
LOCALID = {Local-ID: C1256DBF005F876D-391B4FB15D087619C12579F0005154F0-AnandBBS_SIGIR2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.},
BOOKTITLE = {SIGIR'12, International ACM SIGIR Conference on Research \& Development in Information Retrieval},
EDITOR = {Callan, Jamie and Hersh, William and Maarek, Yoelle and Sanderson, Mark},
PAGES = {235--244},
ADDRESS = {Portland, Oregon},
}

Endnote

%0 Conference Proceedings
%A Anand, Avishek
%A Bedathur, Srikanta
%A Berberich, Klaus
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Index Maintenance for Time-Travel Text Search
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59CF-5
%F EDOC: 647502
%R 10.1145/2348283.2348318
%U http://doi.acm.org/10.1145/2348283.2348318
%F OTHER: Local-ID: C1256DBF005F876D-391B4FB15D087619C12579F0005154F0-AnandBBS_SIGIR2012
%D 2012
%B International ACM SIGIR Conference on Research & Development in Information Retrieval
%Z date of event: 2012-08-12 - 2012-08-16
%C Portland, Oregon
%X Time-travel text search enriches standard text search by temporal predicates, 
so that users of web archives can easily retrieve document versions that are 
considered relevant to a given keyword query and existed during a given time 
interval. Different index structures have been proposed to efficiently support
time-travel text search. None of them, however, can easily be updated as the 
Web evolves and new document versions are added to the web archive.

In this work, we describe a novel index structure that efficiently supports
time-travel text search and can be maintained
incrementally as new document versions are added to the web archive. Our 
solution uses a sharded index organization, bounds the number of spuriously 
read index entries per shard, and can be maintained using small in-memory 
buffers and append-only operations. We present experiments on two large-scale 
real-world datasets demonstrating that maintaining our novel index structure is 
an order of magnitude more efficient than periodically rebuilding one of the 
existing index structures, while query-processing performance is not adversely 
affected.
%B SIGIR'12
%E Callan, Jamie; Hersh, William; Maarek, Yoelle; Sanderson, Mark
%P 235 - 244
%I ACM
%@ 978-1-4503-1658-3

Article

R. Angelova, G. Kasneci, and G. Weikum

“Graffiti: Graph-based Classification in Heterogeneous Networks,” World Wide Web, vol. 15, no. 2, 2012.


Abstract

We address the problem of multi-label classification in heterogeneous graphs,
where nodes belong to different types and different types have different sets
of classification labels. We present a novel approach that aims to classify
nodes based on their neighborhoods. We model the mutual influence of nodes as a
random walk in which the random surfer aims at distributing class labels to
nodes while walking through the graph. When viewing class labels as “colors”,
the random surfer is essentially spraying different node types with different
color palettes; hence the name Graffiti of our method. In contrast to previous
work on topic-based random surfer models, our approach captures and exploits
the mutual influence of nodes of the same type based on their connections to
nodes of other types. We show important properties of our algorithm such as
convergence and scalability. We also confirm the practical viability of
Graffiti by an experimental study on subsets of the popular social networks
Flickr and LibraryThing. We demonstrate the superiority of our approach by
comparing it to three other state-of-the-art techniques for graph-based
classification.

BibTeX

@article{AngelovaWWW2012,
TITLE = {Graffiti: Graph-based Classification in Heterogeneous Networks},
AUTHOR = {Angelova, Ralitsa and Kasneci, Gjergji and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1573-1413},
DOI = {10.1007/s11280-011-0126-4},
LOCALID = {Local-ID: C1256DBF005F876D-30F84FC35DA3883CC1257AE900575E65-AngelovaWWW2012},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {We address the problem of multi-label classification in heterogeneous graphs, where nodes belong to different types and different types have different sets of classification labels. We present a novel approach that aims to classify nodes based on their neighborhoods. We model the mutual influence of nodes as a random walk in which the random surfer aims at distributing class labels to nodes while walking through the graph. When viewing class labels as {\textquotedblleft}colors{\textquotedblright}, the random surfer is essentially spraying different node types with different color palettes; hence the name Graffiti of our method. In contrast to previous work on topic-based random surfer models, our approach captures and exploits the mutual influence of nodes of the same type based on their connections to nodes of other types. We show important properties of our algorithm such as convergence and scalability. We also confirm the practical viability of Graffiti by an experimental study on subsets of the popular social networks Flickr and LibraryThing. We demonstrate the superiority of our approach by comparing it to three other state-of-the-art techniques for graph-based classification.},
JOURNAL = {World Wide Web},
VOLUME = {15},
NUMBER = {2},
PAGES = {139--170},
}

Endnote

%0 Journal Article
%A Angelova, Ralitsa
%A Kasneci, Gjergji
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Graffiti: Graph-based Classification in Heterogeneous Networks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59F8-7
%F EDOC: 647488
%R 10.1007/s11280-011-0126-4
%F OTHER: Local-ID: C1256DBF005F876D-30F84FC35DA3883CC1257AE900575E65-AngelovaWWW2012
%D 2012
%* Review method: peer-reviewed
%X We address the problem of multi-label classification in heterogeneous graphs, 
where nodes belong to different types and different types have different sets 
of classification labels. We present a novel approach that aims to classify 
nodes based on their neighborhoods. We model the mutual influence of nodes as a 
random walk in which the random surfer aims at distributing class labels to 
nodes while walking through the graph. When viewing class labels as “colors”,
the random surfer is essentially spraying different node types with different 
color palettes; hence the name Graffiti of our method. In contrast to previous 
work on topic-based random surfer models, our approach captures and exploits 
the mutual influence of nodes of the same type based on their connections to 
nodes of other types. We show important properties of our algorithm such as 
convergence and scalability. We also confirm the practical viability of 
Graffiti by an experimental study on subsets of the popular social networks 
Flickr and LibraryThing. We demonstrate the superiority of our approach by 
comparing it to three other state-of-the-art techniques for graph-based 
classification.
%J World Wide Web
%V 15
%N 2
%& 139
%P 139 - 170
%I Springer
%C New York, NY
%@ 1573-1413

Conference paper

D5IMPR-CS

R. Awadallah, M. Ramanath, and G. Weikum

“Harmony and Dissonance: Organizing the People’s Voices on Political Controversies,” in Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM 2012), Seattle, Washington, USA, 2012.


Abstract

The WikiLeaks documents about the death of Osama Bin Laden and the debates
about the economic crisis in Greece and other European countries are some of
the controversial topics being played on the news every day. Each of these
topics has many different aspects, and there is no absolute, simple truth in
answering questions such as: should the EU guarantee the financial stability of
each member country, or should the countries themselves be solely responsible?
To understand the landscape of opinions, it would be helpful to know which
politician or other stakeholder takes which position - support or opposition -
on these aspects of controversial topics.

BibTeX

@inproceedings{AwadallahWSDM2012,
TITLE = {Harmony and Dissonance: Organizing the People's Voices on Political Controversies},
AUTHOR = {Awadallah, Rawia and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-0747-5},
URL = {http://doi.acm.org/10.1145/2124295.2124359},
DOI = {10.1145/2124295.2124359},
LOCALID = {Local-ID: C1256DBF005F876D-581C212B51B4810AC1257AE90053EBD7-AwadallahWSDM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The WikiLeaks documents about the death of Osama Bin Laden and the debates about the economic crisis in Greece and other European countries are some of the controversial topics being played on the news every day. Each of these topics has many different aspects, and there is no absolute, simple truth in answering questions such as: should the EU guarantee the financial stability of each member country, or should the countries themselves be solely responsible? To understand the landscape of opinions, it would be helpful to know which politician or other stakeholder takes which position -- support or opposition -- on these aspects of controversial topics.},
BOOKTITLE = {Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM 2012)},
EDITOR = {Adar, Eytan and Agichtein, Eugene and Maarek, Yoelle and Teevan, Jaime},
PAGES = {523--532},
ADDRESS = {Seattle, Washington, USA},
}

Endnote

%0 Conference Proceedings
%A Awadallah, Rawia
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Harmony and Dissonance: Organizing the People's Voices on Political Controversies
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59E0-C
%F EDOC: 647486
%R 10.1145/2124295.2124359
%U http://doi.acm.org/10.1145/2124295.2124359
%F OTHER: Local-ID: C1256DBF005F876D-581C212B51B4810AC1257AE90053EBD7-AwadallahWSDM2012
%D 2012
%B 5th ACM International Conference on Web Search and Data Mining
%Z date of event: 2012-02-08 - 2012-02-12
%C Seattle, Washington, USA
%X The WikiLeaks documents about the death of Osama Bin Laden and the debates
about the economic crisis in Greece and other European countries are some of
the controversial topics being played on the news every day. Each of these
topics has many different aspects, and there is no absolute, simple truth in 
answering questions such as: should the EU guarantee the financial stability of 
each member country, or should the countries themselves be solely responsible? 
To understand the landscape of opinions, it would be helpful to know which 
politician or other stakeholder takes which position - support or opposition - 
on these aspects of controversial topics.
%B Proceedings of the 5th ACM International Conference on Web Search and Data Mining
%E Adar, Eytan; Agichtein, Eugene; Maarek, Yoelle; Teevan, Jaime
%P 523 - 532
%I ACM
%@ 978-1-4503-0747-5

Thesis

D5IMPR-CS

R. Awadallah

“Methods for Constructing an Opinion Network for Politically Controversial Topics,” Universität des Saarlandes, Saarbrücken, 2012.


Abstract

The US presidential race, the re-election of President Hugo Chavez, and the
economic crisis in Greece and other European countries are some of the
controversial topics being played on the news every day. To understand the
landscape of opinions on political controversies, it would be helpful to know
which politician or other stakeholder takes which position - support or
opposition - on specific aspects of these topics. The work described in this
thesis aims to automatically derive a map of the opinions-people network from
news and other Web documents. The focus is on acquiring opinions held by
various stakeholders on politically controversial topics. This opinions-people
network serves as a knowledge-base of opinions in the form of ⟨opinion holder⟩
⟨opinion⟩ ⟨topic⟩ triples.
Our system to build this knowledge-base makes use of online news sources in
order to extract opinions from text snippets. These sources come with a set of
unique challenges. For example, processing text snippets involves not just
identifying the topic and the opinion, but also attributing that opinion to a
specific opinion holder. This requires making use of deep parsing and
analyzing the parse tree. Moreover, in order to ensure uniformity, both the
topic as well as the opinion holder should be mapped to canonical strings, and
the topics should also be organized into a hierarchy. Our system relies on two
main components: i) acquiring opinions, which uses a combination of techniques
to extract opinions from online news sources, and ii) organizing topics, which
crawls and extracts debates from online sources, and organizes these debates
in a hierarchy of politically controversial topics. We present systematic
evaluations of the different components of our system, and show their high
accuracies. We also present some of the different kinds of applications that
require political analysis, such as identifying flip-floppers, political bias,
and dissenters. Such applications can make use of the knowledge-base of
opinions.

BibTeX

@phdthesis{AwadallahPhd2012,
TITLE = {Methods for Constructing an Opinion Network for Politically Controversial Topics},
AUTHOR = {Awadallah, Rawia},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-50372},
DOI = {10.22028/D291-26410},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The US presidential race, the re-election of President Hugo Chavez, and the economic crisis in Greece and other European countries are some of the controversial topics being played on the news every day. To understand the landscape of opinions on political controversies, it would be helpful to know which politician or other stakeholder takes which position -- support or opposition -- on specific aspects of these topics. The work described in this thesis aims to automatically derive a map of the opinions-people network from news and other Web documents. The focus is on acquiring opinions held by various stakeholders on politically controversial topics. This opinions-people network serves as a knowledge-base of opinions in the form of $\langle$opinion holder$\rangle$ $\langle$opinion$\rangle$ $\langle$topic$\rangle$ triples. Our system to build this knowledge-base makes use of online news sources in order to extract opinions from text snippets. These sources come with a set of unique challenges. For example, processing text snippets involves not just identifying the topic and the opinion, but also attributing that opinion to a specific opinion holder. This requires making use of deep parsing and analyzing the parse tree. Moreover, in order to ensure uniformity, both the topic as well as the opinion holder should be mapped to canonical strings, and the topics should also be organized into a hierarchy. Our system relies on two main components: i) acquiring opinions, which uses a combination of techniques to extract opinions from online news sources, and ii) organizing topics, which crawls and extracts debates from online sources, and organizes these debates in a hierarchy of politically controversial topics. We present systematic evaluations of the different components of our system, and show their high accuracies. We also present some of the different kinds of applications that require political analysis, such as identifying flip-floppers, political bias, and dissenters. Such applications can make use of the knowledge-base of opinions.},
}

Endnote

%0 Thesis
%A Awadallah, Rawia
%Y Weikum, Gerhard
%A referee: Rauber, Andreas
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Methods for Constructing an Opinion Network for Politically Controversial Topics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0026-CC92-8
%R 10.22028/D291-26410
%U urn:nbn:de:bsz:291-scidok-50372
%F OTHER: hdl:20.500.11880/26466
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V phd
%9 phd
%X The US presidential race, the re-election of President Hugo Chavez, and the
economic crisis in Greece and other European countries are some of the
controversial topics being played on the news every day. To understand the
landscape of opinions on political controversies, it would be helpful to know
which politician or other stakeholder takes which position - support or
opposition - on specific aspects of these topics. The work described in this
thesis aims to automatically derive a map of the opinions-people network from
news and other Web documents. The focus is on acquiring opinions held by
various stakeholders on politically controversial topics. This opinions-people
network serves as a knowledge-base of opinions in the form of ⟨opinion holder⟩
⟨opinion⟩ ⟨topic⟩ triples.
Our system to build this knowledge-base makes use of online news sources in
order to extract opinions from text snippets. These sources come with a set of
unique challenges. For example, processing text snippets involves not just
identifying the topic and the opinion, but also attributing that opinion to a
specific opinion holder. This requires making use of deep parsing and
analyzing the parse tree. Moreover, in order to ensure uniformity, both the
topic as well as the opinion holder should be mapped to canonical strings, and
the topics should also be organized into a hierarchy. Our system relies on two
main components: i) acquiring opinions, which uses a combination of techniques
to extract opinions from online news sources, and ii) organizing topics, which
crawls and extracts debates from online sources, and organizes these debates
in a hierarchy of politically controversial topics. We present systematic
evaluations of the different components of our system, and show their high
accuracies. We also present some of the different kinds of applications that
require political analysis, such as identifying flip-floppers, political bias,
and dissenters. Such applications can make use of the knowledge-base of
opinions.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5037/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

D5IMPR-CS

R. Awadallah, M. Ramanath, and G. Weikum

“Opinions Network for Politically Controversial Topics,” in PLEAD ’12, ACM Workshop on Politics, Elections and Data, Maui, Hawaii, USA, 2012.


BibTeX

@inproceedings{Awadallah2012,
TITLE = {Opinions Network for Politically Controversial Topics},
AUTHOR = {Awadallah, Rawia and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1713-9},
URL = {http://doi.acm.org/10.1145/2389661.2389668},
DOI = {10.1145/2389661.2389668},
LOCALID = {Local-ID: C1256DBF005F876D-6A6C02CE5A7632E3C1257B280033B845-Awadallah2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {PLEAD '12, ACM Workshop on Politics, Elections and Data},
PAGES = {15--22},
ADDRESS = {Maui, Hawaii, USA},
}

Endnote

%0 Conference Proceedings
%A Awadallah, Rawia
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Opinions Network for Politically Controversial Topics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58FF-1
%F EDOC: 647499
%R 10.1145/2389661.2389668
%U http://doi.acm.org/10.1145/2389661.2389668
%F OTHER: Local-ID: C1256DBF005F876D-6A6C02CE5A7632E3C1257B280033B845-Awadallah2012
%D 2012
%B ACM Workshop on Politics, Elections and Data
%Z date of event: 2012-11-02 - 2012-11-02
%C Maui, Hawaii, USA
%B PLEAD '12
%P 15 - 22
%I ACM
%@ 978-1-4503-1713-9

Conference paper

D5IMPR-CS

R. Awadallah, M. Ramanath, and G. Weikum

“PolariCQ: Polarity Classification of Political Quotations,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, 2012.


BibTeX

@inproceedings{AwadallahCIKM2012,
TITLE = {{PolariCQ}: Polarity Classification of Political Quotations},
AUTHOR = {Awadallah, Rawia and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2398549},
DOI = {10.1145/2396761.2398549},
LOCALID = {Local-ID: C1256DBF005F876D-3A0C0952AA4D44B5C1257AE900509EEC-AwadallahCIKM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {1945--1949},
ADDRESS = {Maui, Hawaii, USA},
}

Endnote

%0 Conference Proceedings
%A Awadallah, Rawia
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T PolariCQ: Polarity Classification of Political Quotations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58BE-4
%F EDOC: 647485
%R 10.1145/2396761.2398549
%U http://doi.acm.org/10.1145/2396761.2398549
%F OTHER: Local-ID: C1256DBF005F876D-3A0C0952AA4D44B5C1257AE900509EEC-AwadallahCIKM2012
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, Hawaii, USA
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 1945 - 1949
%I ACM
%@ 978-1-4503-1156-4

Proceedings

R. Baeza-Yates, J. Masanès, and M. Spaniol

Eds., TempWeb 2012. ACM, 2012.

BibTeX

@proceedings{BMSp12,
TITLE = {TempWeb 2012},
EDITOR = {Baeza-Yates, Ricardo and Masan{\`e}s, Julien and Spaniol, Marc},
LANGUAGE = {eng},
ISBN = {978-1-4503-1188-5},
LOCALID = {Local-ID: C1256DBF005F876D-EAB0DA5225ED613FC1257AD8004DE1A8-BMSp12},
PUBLISHER = {ACM},
YEAR = {2012},
PAGES = {55},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%E Baeza-Yates, Ricardo
%E Masanès, Julien
%E Spaniol, Marc
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T TempWeb 2012 : Proceedings of the 2nd Temporal Web Analytics Workshop
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5761-7
%F EDOC: 647518
%@ 978-1-4503-1188-5
%F OTHER: Local-ID: C1256DBF005F876D-EAB0DA5225ED613FC1257AD8004DE1A8-BMSp12
%I ACM
%D 2012
%B TempWeb 2012
%Z date of event: 2012-04-17 - 2012-04-17
%C Lyon, France
%P 55

Article

K. Balog, D. Carmel, A. P. de Vries, D. M. Herzig, P. Mika, H. Roitman, R. Schenkel, P. Serdyukov, and D. T. Tran

“The First Joint International Workshop on Entity-oriented and Semantic Search (JIWES),” ACM SIGIR Forum, vol. 46, no. 2, 2012.

Abstract

The First Joint International Workshop on Entity-oriented and Semantic Search
(JIWES) workshop was held on Aug 16, 2012 in Portland, Oregon, USA, in
conjunction with the 35th Annual International ACM SIGIR Conference (SIGIR
2012). The objective for the workshop was to bring together academic
researchers and industry practitioners working on entity-oriented search to
discuss tasks and challenges, and to uncover the next frontiers for academic
research on the topic. The workshop program accommodated two invited talks,
eight refereed papers divided into two technical paper sessions, and a group
discussion.

BibTeX

@article{JIWES_SIGIRF2012,
TITLE = {The First Joint International Workshop on Entity-oriented and Semantic Search ({JIWES})},
AUTHOR = {Balog, Krisztian and Carmel, David and de Vries, Arjen P. and Herzig, Daniel M. and Mika, Peter and Roitman, Haggai and Schenkel, Ralf and Serdyukov, Pavel and Tran, Duc Thanh},
LANGUAGE = {eng},
DOI = {10.1145/2422256.2422268},
LOCALID = {Local-ID: C1256DBF005F876D-9FECE961D3BB9689C1257B1100537301-JIWES_SIGIRF2012},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The First Joint International Workshop on Entity-oriented and Semantic Search (JIWES) workshop was held on Aug 16, 2012 in Portland, Oregon, USA, in conjunction with the 35th Annual International ACM SIGIR Conference (SIGIR 2012). The objective for the workshop was to bring together academic researchers and industry practitioners working on entity-oriented search to discuss tasks and challenges, and to uncover the next frontiers for academic research on the topic. The workshop program accommodated two invited talks, eight refereed papers divided into two technical paper sessions, and a group discussion.},
JOURNAL = {ACM SIGIR Forum},
VOLUME = {46},
NUMBER = {2},
PAGES = {87--94},
}

Endnote

%0 Journal Article
%A Balog, Krisztian
%A Carmel, David
%A de Vries, Arjen P.
%A Herzig, Daniel M.
%A Mika, Peter
%A Roitman, Haggai
%A Schenkel, Ralf
%A Serdyukov, Pavel
%A Tran, Duc Thanh
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T The First Joint International Workshop on Entity-oriented and Semantic Search (JIWES)
%! JIWES 2012
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-575F-F
%F EDOC: 647540
%R 10.1145/2422256.2422268
%F OTHER: Local-ID: C1256DBF005F876D-9FECE961D3BB9689C1257B1100537301-JIWES_SIGIRF2012
%D 2012
%* Review method: peer-reviewed
%X The First Joint International Workshop on Entity-oriented and Semantic Search 
(JIWES) workshop was held on Aug 16, 2012 in Portland, Oregon, USA, in 
conjunction with the 35th Annual International ACM SIGIR Conference (SIGIR 
2012). The objective for the workshop was to bring together academic 
researchers and industry practitioners working on entity-oriented search to 
discuss tasks and challenges, and to uncover the next frontiers for academic 
research on the topic. The workshop program accommodated two invited talks, 
eight refereed papers divided into two technical paper sessions, and a group 
discussion.
%J ACM SIGIR Forum
%V 46
%N 2
%& 87
%P 87 - 94
%I ACM
%C New York, NY

Article

P. Bellot, T. Chappell, A. Doucet, S. Geva, J. Kamps, G. Kazai, M. Koolen, M. Landoni, M. Marx, V. Moriceau, J. Mothe, G. Ramirez Camps, M. Sanderson, E. SanJuan, F. Scholer, X. Tannier, M. Theobald, M. Trappett, A. Trotman, and Q. Wang

“Report on INEX 2011,” ACM SIGIR Forum, vol. 46, no. 1, 2012.

BibTeX

@article{INEX-SIGIR-Forum-2012,
TITLE = {Report on {INEX} 2011},
AUTHOR = {Bellot, Patrice and Chappell, Timothy and Doucet, Antoine and Geva, Shlomo and Kamps, Jaap and Kazai, Gabriella and Koolen, Marijn and Landoni, Monica and Marx, Maarten and Moriceau, V{\'e}ronique and Mothe, Josiane and Ramirez Camps, Georgiana and Sanderson, Mark and SanJuan, Eric and Scholer, Falk and Tannier, Xavier and Theobald, Martin and Trappett, Matthew and Trotman, Andrew and Wang, Qiuyue},
LANGUAGE = {eng},
URL = {http://doi.acm.org/10.1145/2215676.2215679},
DOI = {10.1145/2215676.2215679},
LOCALID = {Local-ID: C1256DBF005F876D-AF4EDEC7D1769A24C1257AE800325373-INEX-SIGIR-Forum-2012},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
JOURNAL = {ACM SIGIR Forum},
VOLUME = {46},
NUMBER = {1},
PAGES = {33--42},
}

Endnote

%0 Journal Article
%A Bellot, Patrice
%A Chappell, Timothy
%A Doucet, Antoine
%A Geva, Shlomo
%A Kamps, Jaap
%A Kazai, Gabriella
%A Koolen, Marijn
%A Landoni, Monica
%A Marx, Maarten
%A Moriceau, Véronique
%A Mothe, Josiane
%A Ramirez Camps, Georgiana
%A Sanderson, Mark
%A SanJuan, Eric
%A Scholer, Falk
%A Tannier, Xavier
%A Theobald, Martin
%A Trappett, Matthew
%A Trotman, Andrew
%A Wang, Qiuyue
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
%T Report on INEX 2011
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-579D-3
%F EDOC: 647524
%R 10.1145/2215676.2215679
%U http://doi.acm.org/10.1145/2215676.2215679
%F OTHER: Local-ID: C1256DBF005F876D-AF4EDEC7D1769A24C1257AE800325373-INEX-SIGIR-Forum-2012
%D 2012
%* Review method: peer-reviewed
%J ACM SIGIR Forum
%V 46
%N 1
%& 33
%P 33 - 42
%I ACM
%C New York, NY

Report

K. Berberich and S. Bedathur

“Computing n-Gram Statistics in MapReduce,” Max-Planck-Institut für Informatik, Saa, MPI-I-2012-5-003, 2012.

BibTeX

@techreport{BerberichBedathur2012,
TITLE = {Computing n-Gram Statistics in {MapReduce}},
AUTHOR = {Berberich, Klaus and Bedathur, Srikanta},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2012-5-003},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saa},
YEAR = {2012},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Berberich, Klaus
%A Bedathur, Srikanta
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Computing n-Gram Statistics in MapReduce
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-0416-A
%Y Max-Planck-Institut für Informatik
%C Saa
%D 2012
%P 39 p.
%B Research Report

Conference paper

M. Bienvenu, D. Deutch, D. Martinenghi, P. Senellart, and F. Suchanek

“Dealing with the Deep Web and all its Quirks,” in Very Large Data Search (VLDS 2012), Istanbul, 2012.

BibTeX

@inproceedings{deepwebquirks2012,
TITLE = {Dealing with the {Deep Web} and all its {Quirks}},
AUTHOR = {Bienvenu, Meghyn and Deutch, Daniel and Martinenghi, Davide and Senellart, Pierre and Suchanek, Fabian},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-884-7; http://ceur-ws.org/Vol-884},
PUBLISHER = {CEUR},
YEAR = {2012},
BOOKTITLE = {Very Large Data Search (VLDS 2012)},
EDITOR = {Brambilla, Marco and Ceri, Stefano and Furche, Tim and Gottlob, Georg},
PAGES = {21--24},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {884},
ADDRESS = {Istanbul},
}

Endnote

%0 Conference Proceedings
%A Bienvenu, Meghyn
%A Deutch, Daniel
%A Martinenghi, Davide
%A Senellart, Pierre
%A Suchanek, Fabian
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T Dealing with the Deep Web and all its Quirks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-657C-5
%D 2012
%B Second International Workshop on Searching and Integrating New Web Data Sources
%Z date of event: 2012-08-31 - 2012-08-31
%C Istanbul
%B Very Large Data Search
%E Brambilla, Marco; Ceri, Stefano; Furche, Tim; Gottlob, Georg
%P 21 - 24
%I CEUR
%B CEUR Workshop Proceedings
%N 884
%U http://ceur-ws.org/Vol-884/VLDS2012_p21_Bienvenu.pdf

Conference paper

C. Böhm, G. de Melo, F. Naumann, and G. Weikum

“LINDA: Distributed Web-of-Data-Scale Entity Matching,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, 2012.

Abstract

Linked Data has emerged as a powerful way of interconnecting structured data on
the Web. However, the cross-linkage between Linked Data sources is not as
extensive as one would hope for. In this paper, we formalize the task of
automatically creating "sameAs" links across data sources in a globally
consistent manner. Our algorithm, presented in a multi-core as well as a
distributed version, achieves this link generation by accounting for joint
evidence of a match. Experiments confirm that our system scales beyond 100
million entities and delivers highly accurate results despite the vast
heterogeneity and daunting scale.

BibTeX

@inproceedings{BoehmCIKM2012,
TITLE = {{LINDA}: Distributed Web-of-Data-Scale Entity Matching},
AUTHOR = {B{\"o}hm, Christoph and de Melo, Gerard and Naumann, Felix and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2398582},
DOI = {10.1145/2396761.2398582},
LOCALID = {Local-ID: C1256DBF005F876D-FE889843CCA8614CC1257AE90052FE14-BoehmCIKM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating "sameAs" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {2104--2108},
ADDRESS = {Maui, Hawaii, USA},
}

Endnote

%0 Conference Proceedings
%A Böhm, Christoph
%A de Melo, Gerard
%A Naumann, Felix
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T LINDA: Distributed Web-of-Data-Scale Entity Matching
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5963-4
%F EDOC: 647525
%R 10.1145/2396761.2398582
%U http://doi.acm.org/10.1145/2396761.2398582
%F OTHER: Local-ID: C1256DBF005F876D-FE889843CCA8614CC1257AE90052FE14-BoehmCIKM2012
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, Hawaii, USA
%X Linked Data has emerged as a powerful way of interconnecting structured data on 
the Web. However, the cross-linkage between Linked Data sources is not as 
extensive as one would hope for. In this paper, we formalize the task of 
automatically creating "sameAs" links across data sources in a globally 
consistent manner. Our algorithm, presented in a multi-core as well as a 
distributed version, achieves this link generation by accounting for joint 
evidence of a match. Experiments confirm that our system scales beyond 100 
million entities and delivers highly accurate results despite the vast 
heterogeneity and daunting scale.
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 2104 - 2108
%I ACM
%@ 978-1-4503-1156-4

Thesis

A. Boldyrev

“Towards an Architecture for Open-domain Information Extraction: Integrated Extraction, Clustering, and Reasoning with Patterns,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{BoldyrevBachelorsThesis2012,
TITLE = {Towards an Architecture for Open-domain Information Extraction: {Integrated} Extraction, Clustering, and Reasoning with Patterns},
AUTHOR = {Boldyrev, Artem},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Boldyrev, Artem
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards an Architecture for Open-domain Information Extraction: Integrated Extraction, Clustering, and Reasoning with Patterns
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C48-4
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V bachelor
%9 bachelor

Thesis

D5IMPR-CS

A. Broschart

“Efficient Query Processing and Index Tuning Using Proximity Scores,” Universität des Saarlandes, Saarbrücken, 2012.

Abstract

In the presence of growing data, the need for efficient query processing under
result quality and index size control becomes more and more a challenge to
search engines. We show how to use proximity scores to make query processing
effective and efficient with focus on either of the optimization goals.
More precisely, we make the following contributions:
• We present a comprehensive comparative analysis of proximity score models and
a rigorous analysis of the potential of phrases and adapt a leading proximity
score model for XML data.
• We discuss the feasibility of all presented proximity score models for top-k
query processing and present a novel index combining a content and proximity
score that helps to accelerate top-k query processing and improves result
quality.
• We present a novel, distributed index tuning framework for term and term pair
index lists that optimizes pruning parameters by means of well-defined
optimization criteria under disk space constraints. Indexes can be tuned with
emphasis on efficiency or effectiveness: the resulting indexes yield fast
processing at high result quality.
• We show that pruned index lists processed with a merge join outperform top-k
query processing with unpruned lists at a high result quality.
• Moreover, we present a hybrid index structure for improved cold cache run
times.

BibTeX

@phdthesis{Broschart_PhD2012,
TITLE = {Efficient Query Processing and Index Tuning Using Proximity Scores},
AUTHOR = {Broschart, Andreas},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-49816},
DOI = {10.22028/D291-26400},
LOCALID = {Local-ID: C1256DBF005F876D-DE4B2520B99264A3C1257B1900434A8C-Broschart_PhD2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {In the presence of growing data, the need for efficient query processing under result quality and index size control becomes more and more a challenge to search engines. We show how to use proximity scores to make query processing effective and efficient with focus on either of the optimization goals. More precisely, we make the following contributions: \mbox{$\bullet$} We present a comprehensive comparative analysis of proximity score models and a rigorous analysis of the potential of phrases and adapt a leading proximity score model for XML data. \mbox{$\bullet$} We discuss the feasibility of all presented proximity score models for top-k query processing and present a novel index combining a content and proximity score that helps to accelerate top-k query processing and improves result quality. \mbox{$\bullet$} We present a novel, distributed index tuning framework for term and term pair index lists that optimizes pruning parameters by means of well-defined optimization criteria under disk space constraints. Indexes can be tuned with emphasis on efficiency or effectiveness: the resulting indexes yield fast processing at high result quality. \mbox{$\bullet$} We show that pruned index lists processed with a merge join outperform top-k query processing with unpruned lists at a high result quality. \mbox{$\bullet$} Moreover, we present a hybrid index structure for improved cold cache run times.},
}

Endnote

%0 Thesis
%A Broschart, Andreas
%Y Schenkel, Ralf
%Y Suel, Torsten
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Efficient Query Processing and Index Tuning Using Proximity Scores
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6275-D
%F EDOC: 647546
%F OTHER: Local-ID: C1256DBF005F876D-DE4B2520B99264A3C1257B1900434A8C-Broschart_PhD2012
%R 10.22028/D291-26400
%U urn:nbn:de:bsz:291-scidok-49816
%F OTHER: hdl:20.500.11880/26456
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V phd
%9 phd
%X In the presence of growing data, the need for efficient query processing under
result quality and index size control becomes more and more a challenge to
search engines. We show how to use proximity scores to make query processing
effective and efficient with focus on either of the optimization goals.
More precisely, we make the following contributions:
• We present a comprehensive comparative analysis of proximity score models and
a rigorous analysis of the potential of phrases and adapt a leading proximity
score model for XML data.
• We discuss the feasibility of all presented proximity score models for top-k
query processing and present a novel index combining a content and proximity
score that helps to accelerate top-k query processing and improves result
quality.
• We present a novel, distributed index tuning framework for term and term pair
index lists that optimizes pruning parameters by means of well-defined
optimization criteria under disk space constraints. Indexes can be tuned with
emphasis on efficiency or effectiveness: the resulting indexes yield fast
processing at high result quality.
• We show that pruned index lists processed with a merge join outperform top-k
query processing with unpruned lists at a high result quality.
• Moreover, we present a hybrid index structure for improved cold cache run
times.
%U http://scidok.sulb.uni-saarland.de/volltexte/2012/4981/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Article

D5IMPR-CS

A. Broschart and R. Schenkel

“High-performance Processing of Text Queries with Tunable Pruned Term and Term Pair Indexes,” Transactions on Information Systems, vol. 30, no. 1, 2012.

BibTeX

@article{BroschartS_TOIS2012,
TITLE = {High-performance Processing of Text Queries with Tunable Pruned Term and Term Pair Indexes},
AUTHOR = {Broschart, Andreas and Schenkel, Ralf},
LANGUAGE = {eng},
ISSN = {1046-8188},
URL = {http://doi.acm.org/10.1145/2094072.2094077},
DOI = {10.1145/2094072.2094077},
LOCALID = {Local-ID: C1256DBF005F876D-154E4E282059391BC125793A004D3BE7-BroschartS_TOIS2012},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Transactions on Information Systems},
VOLUME = {30},
NUMBER = {1},
PAGES = {5:1--5:32},
}

Endnote

%0 Journal Article
%A Broschart, Andreas
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T High-performance Processing of Text Queries with Tunable Pruned Term and Term Pair Indexes
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59DE-3
%F EDOC: 647506
%R 10.1145/2094072.2094077
%U http://doi.acm.org/10.1145/2094072.2094077
%F OTHER: Local-ID: C1256DBF005F876D-154E4E282059391BC125793A004D3BE7-BroschartS_TOIS2012
%D 2012
%* Review method: peer-reviewed
%J Transactions on Information Systems
%V 30
%N 1
%& 5:1
%P 5:1 - 5:32
%I ACM
%C New York, NY

Thesis

D5IMPR-CS

E. Cergani

“Relation Extraction Using Matrix Factorization Methods,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Cergani2012,
TITLE = {Relation Extraction Using Matrix Factorization Methods},
AUTHOR = {Cergani, Ervina},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-B2109FA6099C9CC8C1257AC900301865-Cergani2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Cergani, Ervina
%Y Weikum, Gerhard
%A referee: Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Relation Extraction Using Matrix Factorization Methods
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6277-9
%F EDOC: 647514
%F OTHER: Local-ID: C1256DBF005F876D-B2109FA6099C9CC8C1257AC900301865-Cergani2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Conference paper

D5IMPR-CS

T. Crecelius and R. Schenkel

“Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, USA, 2012.

Abstract

An important building block of many graph applications such as searching in
social networks, keyword search in graphs, and retrieval of linked documents is
retrieving the transitive neighbors of a node in ascending order of their
distances. Since large graphs cannot be kept in memory and graph traversals at
query time would be prohibitively expensive, the list of neighbors for each
node is usually precomputed and stored in a compact form. While the problem of
precomputing all-pairs shortest distances has been well studied for decades,
efficiently maintaining this information when the graph changes is not as well
understood. This paper presents an algorithm for maintaining nearest neighbor
lists in weighted graphs under node insertions and decreasing edge weights. It
considers the important case where queries are a lot more frequent than
updates, and presents two approaches for transparently performing necessary
index updates while executing queries. Extensive experiments with large graphs,
including a subset of Twitter’s user graph, demonstrate that the overhead for
this maintenance is small.

BibTeX

@inproceedings{CreceliusS2012,
TITLE = {Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs},
AUTHOR = {Crecelius, Tom and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2396881},
DOI = {10.1145/2396761.2396881},
LOCALID = {Local-ID: C1256DBF005F876D-8D1B2527866B2B3CC1257A3E001B974A-CreceliusS2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {An important building block of many graph applications such as searching in social networks, keyword search in graphs, and retrieval of linked documents is retrieving the transitive neighbors of a node in ascending order of their distances. Since large graphs cannot be kept in memory and graph traversals at query time would be prohibitively expensive, the list of neighbors for each node is usually precomputed and stored in a compact form. While the problem of precomputing all-pairs shortest distances has been well studied for decades, efficiently maintaining this information when the graph changes is not as well understood. This paper presents an algorithm for maintaining nearest neighbor lists in weighted graphs under node insertions and decreasing edge weights. It considers the important case where queries are a lot more frequent than updates, and presents two approaches for transparently performing necessary index updates while executing queries. Extensive experiments with large graphs, including a subset of Twitter{\textquoteright}s user graph, demonstrate that the overhead for this maintenance is small.},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {952--961},
ADDRESS = {Maui, USA},
}

Endnote

%0 Conference Proceedings
%A Crecelius, Tom
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58C0-B
%F EDOC: 647468
%R 10.1145/2396761.2396881
%U http://doi.acm.org/10.1145/2396761.2396881
%F OTHER: Local-ID: C1256DBF005F876D-8D1B2527866B2B3CC1257A3E001B974A-CreceliusS2012
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, USA
%X An important building block of many graph applications such as searching in 
social networks, keyword search in graphs, and retrieval of linked documents is 
retrieving the transitive neighbors of a node in ascending order of their 
distances. Since large graphs cannot be kept in memory and graph traversals at 
query time would be prohibitively expensive, the list of neighbors for each 
node is usually precomputed and stored in a compact form. While the problem of 
precomputing all-pairs shortest distances has been well studied for decades, 
efficiently maintaining this information when the graph changes is not as well 
understood. This paper presents an algorithm for maintaining nearest neighbor 
lists in weighted graphs under node insertions and decreasing edge weights. It 
considers the important case where queries are a lot more frequent than 
updates, and presents two approaches for transparently performing necessary 
index updates while executing queries. Extensive experiments with large graphs, 
including a subset of Twitter’s user graph, demonstrate that the overhead for
this maintenance is small.
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 952 - 961
%I ACM
%@ 978-1-4503-1156-4

Thesis

D5IMPR-CS

T. Crecelius

“Socially Enhanced Search and Exploration in Social Tagging Networks,” Universität des Saarlandes, Saarbrücken, 2012.

Abstract

Social tagging networks have become highly popular for publishing and searching
contents. Users in such networks can review, rate and comment on contents, or
annotate them with keywords (social tags) to give short but exact text
representations of even non-textual contents. In addition, there is an inherent
support for interactions and relationships among users. Thus, users naturally
form groups of friends or of common interests.

We address three research areas in our work utilising these intrinsic features
of social tagging networks.

(1) We investigate new approaches for exploiting the social knowledge of and
the relationships between users for searching and recommending relevant
contents, and integrate them in a comprehensive framework, coined SENSE, for
search in social tagging networks.

(2) To dynamically update precomputed lists of transitive friends in descending
order of their distance in user graphs of social tagging networks, we provide
an algorithm for incrementally solving the all pairs shortest distance problem
in large, disk-resident graphs and formally prove its correctness.

(3) Since users are content providers in social tagging networks, users may
keep their own data at independent, local peers that collaborate in a
distributed P2P network. We provide an algorithm for such systems to counter
cheating of peers in authority computations over social networks.

The viability of each solution is demonstrated by extensive experiments
regarding effectiveness and efficiency.

BibTeX

@phdthesis{Crecelius2012,
TITLE = {Socially Enhanced Search and Exploration in Social Tagging Networks},
AUTHOR = {Crecelius, Tom},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-48548},
DOI = {10.22028/D291-26379},
LOCALID = {Local-ID: C1256DBF005F876D-09A3BA69BFF35ED9C12579FA002F601D-Crecelius2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Social tagging networks have become highly popular for publishing and searching contents. Users in such networks can review, rate and comment on contents, or annotate them with keywords (social tags) to give short but exact text representations of even non-textual contents. In addition, there is an inherent support for interactions and relationships among users. Thus, users naturally form groups of friends or of common interests. We address three research areas in our work utilising these intrinsic features of social tagging networks. (1) We investigate new approaches for exploiting the social knowledge of and the relationships between users for searching and recommending relevant contents, and integrate them in a comprehensive framework, coined SENSE, for search in social tagging networks. (2) To dynamically update precomputed lists of transitive friends in descending order of their distance in user graphs of social tagging networks, we provide an algorithm for incrementally solving the all pairs shortest distance problem in large, disk-resident graphs and formally prove its correctness. (3) Since users are content providers in social tagging networks, users may keep their own data at independent, local peers that collaborate in a distributed P2P network. We provide an algorithm for such systems to counter cheating of peers in authority computations over social networks. The viability of each solution is demonstrated by extensive experiments regarding effectiveness and efficiency.},
}

Endnote

%0 Thesis
%A Crecelius, Tom
%Y Schenkel, Ralf
%A referee: Amer-Yahia, Sihem
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Socially Enhanced Search and Exploration in Social Tagging Networks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-620B-C
%F EDOC: 647462
%F OTHER: Local-ID: C1256DBF005F876D-09A3BA69BFF35ED9C12579FA002F601D-Crecelius2012
%U urn:nbn:de:bsz:291-scidok-48548
%R 10.22028/D291-26379
%F OTHER: hdl:20.500.11880/26435
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%P 238 p.
%V phd
%9 phd
%X Social tagging networks have become highly popular for publishing and searching <br>contents. Users in such networks can review, rate and comment on contents, or <br>annotate them with keywords (social tags) to give short but exact text <br>representations of even non-textual contents. In addition, there is an inherent <br>support for interactions and relationships among users. Thus, users naturally <br>form groups of friends or of common interests.<br><br>We address three research areas in our work utilising these intrinsic features <br>of social tagging networks.<br><br>(1) We investigate new approaches for exploiting the social knowledge of and <br>the relationships between users for searching and recommending relevant <br>contents, and integrate them in a comprehensive framework, coined SENSE, for <br>search in social tagging networks.<br><br>(2) To dynamically update precomputed lists of transitive friends in descending <br>order of their distance in user graphs of social tagging networks, we provide <br>an algorithm for incrementally solving the all pairs shortest distance problem <br>in large, disk-resident graphs and formally prove its correctness.<br><br>(3) Since users are content providers in social tagging networks, users may <br>keep their own data at independent, local peers that collaborate in a <br>distributed P2P network. We provide an algorithm for such systems to counter <br>cheating of peers in authority computations over social networks.<br><br>The viability of each solution is demonstrated by extensive experiments <br>regarding effectiveness and efficiency.
%U http://scidok.sulb.uni-saarland.de/volltexte/2012/4854/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Article

G. de Melo and G. Weikum

“Constructing and Utilizing Wordnets Using Statistical Methods,” Language Resources and Evaluation, vol. 46, no. 2, 2012.

Abstract

Lexical databases following the wordnet paradigm capture information about
words, word senses, and their relationships. A large number of existing tools
and datasets are based on the original WordNet, so extending the landscape of
resources aligned with WordNet leads to great potential for interoperability
and to substantial synergies. Wordnets are being compiled for a considerable
number of languages, however most have yet to reach a comparable level of
coverage. We propose a method for automatically producing such resources for
new languages based on WordNet, and analyse the implications of this approach
both from a linguistic perspective as well as by considering natural language
processing tasks. Our approach takes advantage of the original WordNet in
conjunction with translation dictionaries. A small set of training associations
is used to learn a statistical model for predicting associations between terms
and senses. The associations are represented using a variety of scores that
take into account structural properties as well as semantic relatedness and
corpus frequency information. Although the resulting wordnets are imperfect in
terms of their quality and coverage of language-specific phenomena, we show
that they constitute a cheap and suitable alternative for many applications,
both for monolingual tasks as well as for cross-lingual interoperability. Apart
from analysing the resources directly, we conducted tests on semantic
relatedness assessment and cross-lingual text classification with very
promising results.

BibTeX

@article{deMelo2012,
TITLE = {Constructing and Utilizing Wordnets Using Statistical Methods},
AUTHOR = {de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {1574-020X},
DOI = {10.1007/s10579-012-9183-2},
LOCALID = {Local-ID: C1256DBF005F876D-C7F6932E33392013C1257AE90058744C-deMelo2012},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Lexical databases following the wordnet paradigm capture information about words, word senses, and their relationships. A large number of existing tools and datasets are based on the original WordNet, so extending the landscape of resources aligned with WordNet leads to great potential for interoperability and to substantial synergies. Wordnets are being compiled for a considerable number of languages, however most have yet to reach a comparable level of coverage. We propose a method for automatically producing such resources for new languages based on WordNet, and analyse the implications of this approach both from a linguistic perspective as well as by considering natural language processing tasks. Our approach takes advantage of the original WordNet in conjunction with translation dictionaries. A small set of training associations is used to learn a statistical model for predicting associations between terms and senses. The associations are represented using a variety of scores that take into account structural properties as well as semantic relatedness and corpus frequency information. Although the resulting wordnets are imperfect in terms of their quality and coverage of language-specific phenomena, we show that they constitute a cheap and suitable alternative for many applications, both for monolingual tasks as well as for cross-lingual interoperability. Apart from analysing the resources directly, we conducted tests on semantic relatedness assessment and cross-lingual text classification with very promising results.},
JOURNAL = {Language Resources and Evaluation},
VOLUME = {46},
NUMBER = {2},
PAGES = {287--311},
}

Endnote

%0 Journal Article
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Constructing and Utilizing Wordnets Using Statistical Methods
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5F9C-9
%F EDOC: 647489
%R 10.1007/s10579-012-9183-2
%F OTHER: Local-ID: C1256DBF005F876D-C7F6932E33392013C1257AE90058744C-deMelo2012
%D 2012
%* Review method: peer-reviewed
%X Lexical databases following the wordnet paradigm capture information about 
words, word senses, and their relationships. A large number of existing tools 
and datasets are based on the original WordNet, so extending the landscape of 
resources aligned with WordNet leads to great potential for interoperability 
and to substantial synergies. Wordnets are being compiled for a considerable 
number of languages, however most have yet to reach a comparable level of 
coverage. We propose a method for automatically producing such resources for 
new languages based on WordNet, and analyse the implications of this approach 
both from a linguistic perspective as well as by considering natural language 
processing tasks. Our approach takes advantage of the original WordNet in 
conjunction with translation dictionaries. A small set of training associations 
is used to learn a statistical model for predicting associations between terms 
and senses. The associations are represented using a variety of scores that 
take into account structural properties as well as semantic relatedness and 
corpus frequency information. Although the resulting wordnets are imperfect in 
terms of their quality and coverage of language-specific phenomena, we show 
that they constitute a cheap and suitable alternative for many applications, 
both for monolingual tasks as well as for cross-lingual interoperability. Apart 
from analysing the resources directly, we conducted tests on semantic 
relatedness assessment and cross-lingual text classification with very 
promising results.
%J Language Resources and Evaluation
%V 46
%N 2
%& 287
%P 287 - 311
%I Springer
%C New York, NY
%@ 1574-020X

Conference paper

G. de Melo and G. Weikum

“UWN: A Large Multilingual Lexical Knowledge Base,” in The 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju Island, Korea, 2012.

BibTeX

@inproceedings{deMeloACL2012,
TITLE = {{UWN}: A Large Multilingual Lexical Knowledge Base},
AUTHOR = {de Melo, Gerard and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-27-5},
URL = {http://www.aclweb.org/anthology-new/P/P12/P12-3026.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-90F518E033031A00C1257AE9005687D4-deMeloACL2012},
PUBLISHER = {The Association for Computational Linguistics},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {The 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012)},
PAGES = {151--156},
ADDRESS = {Jeju Island, Korea},
}

Endnote

%0 Conference Proceedings
%A de Melo, Gerard
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T UWN: A Large Multilingual Lexical Knowledge Base
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-56FC-8
%F EDOC: 647487
%U http://www.aclweb.org/anthology-new/P/P12/P12-3026.pdf
%F OTHER: Local-ID: C1256DBF005F876D-90F518E033031A00C1257AE9005687D4-deMeloACL2012
%D 2012
%B 50th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2012-07-10 - 2012-07-10
%C Jeju Island, Korea
%B The 50th Annual Meeting of the Association for Computational Linguistics
%P 151 - 156
%I The Association for Computational Linguistics
%@ 978-1-937284-27-5

Conference paper

G. de Melo, C. F. Baker, N. Ide, R. Passonneau, and C. Fellbaum

“Empirical Comparisons of MASC Word Sense Annotations,” in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, 2012.

BibTeX

@inproceedings{deMelo2012LREC,
TITLE = {Empirical Comparisons of {MASC} Word Sense Annotations},
AUTHOR = {de Melo, Gerard and Baker, Collin F. and Ide, Nancy and Passonneau, Rebecca and Fellbaum, Christiane},
LANGUAGE = {eng},
ISBN = {978-2-9517408-7-7},
LOCALID = {Local-ID: C1256DBF005F876D-35E1601D8492B9EAC1257B11002E286F-deMelo2012LREC},
PUBLISHER = {ELRA},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)},
PAGES = {23--25},
ADDRESS = {Istanbul, Turkey},
}

Endnote

%0 Conference Proceedings
%A de Melo, Gerard
%A Baker, Collin F.
%A Ide, Nancy
%A Passonneau, Rebecca
%A Fellbaum, Christiane
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
%T Empirical Comparisons of MASC Word Sense Annotations
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A38-2
%F EDOC: 647539
%F OTHER: Local-ID: C1256DBF005F876D-35E1601D8492B9EAC1257B11002E286F-deMelo2012LREC
%D 2012
%B Eighth International Conference on Language Resources and Evaluation
%Z date of event: 2012-05-21 - 2012-05-27
%C Istanbul, Turkey
%B Proceedings of the Eighth International Conference on Language Resources and Evaluation
%P 23 - 25
%I ELRA
%@ 978-2-9517408-7-7

Book

G. de Melo

Graph-based Methods for Large-scale Multilingual Knowledge Integration. Saarbrücken: universaar, 2012.

BibTeX

@book{deMelo2012ThesisBook,
TITLE = {Graph-based Methods for Large-scale Multilingual Knowledge Integration},
AUTHOR = {de Melo, Gerard},
LANGUAGE = {eng},
ISBN = {978-3-86223-028-0},
LOCALID = {Local-ID: C1256DBF005F876D-DEADCC4A0B56A277C1257B11002DC62F-deMelo2012ThesisBook},
PUBLISHER = {universaar},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
PAGES = {XIV, 192},
}

Endnote

%0 Book
%A de Melo, Gerard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Graph-based Methods for Large-scale Multilingual Knowledge Integration
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59F6-B
%F EDOC: 647538
%@ 978-3-86223-028-0
%F OTHER: Local-ID: C1256DBF005F876D-DEADCC4A0B56A277C1257B11002DC62F-deMelo2012ThesisBook
%I universaar
%C Saarbrücken
%D 2012
%P XIV, 192

Thesis

D5IMPR-CS

D. Denev

“Methods and Models for Web Archive Crawling,” Universität des Saarlandes, Saarbrücken, 2012.

Abstract

Web archives offer a rich and plentiful
source of information to researchers, analysts, and legal experts.
For this purpose, they gather Web sites as the sites change over time.
In order to keep up to high standards of data quality, Web archives have to
collect all versions of the Web sites.
Due to limited resources and technical constraints this is not possible.
Therefore, Web archives consist of versions archived at various time points
without guarantee for mutual consistency.

This thesis presents a model for assessing the data quality in
Web archives as well as a family of crawling strategies yielding
high-quality captures. We distinguish between single-visit crawling strategies
for exploratory and visit-revisit crawling strategies for evidentiary purposes.
Single-visit strategies download every page
exactly once aiming for an ``undistorted'' capture of the ever-changing Web.
We express the quality of the resulting capture with the ``blur'' quality
measure.
In contrast, visit-revisit strategies download every page twice. The initial
downloads of all pages form the visit phase of the crawling strategy.
The second downloads are grouped together in the revisit phase.
These two phases enable us to check which pages changed during the crawling
process.
Thus, we can identify the pages that are consistent with each other.
The quality of the visit-revisit captures is expressed by the ``coherence''
measure.

Quality-conscious strategies are based on predictions of the change behaviour of
individual pages. We model the Web site dynamics by Poisson processes
with page-specific change rates. Furthermore, we show that these rates can be
statistically predicted. Finally, we propose visualization techniques for
exploring the quality of the resulting Web archives.

A fully functional prototype demonstrates the practical viability of our
approach.

BibTeX

@phdthesis{DenevPhD2012,
TITLE = {Methods and Models for Web Archive Crawling},
AUTHOR = {Denev, Dimitar},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-49374},
DOI = {10.22028/D291-26396},
LOCALID = {Local-ID: C1256DBF005F876D-92B687F6B976DAC4C1257A65004F67A6-DenevPhD2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Web archives offer a rich and plentiful <br>source of information to researchers, analysts, and legal experts. <br>For this purpose, they gather Web sites as the sites change over time.<br>In order to keep up to high standards of data quality, Web archives have to <br>collect all versions of the Web sites. <br>Due to limited resources and technical constraints this is not possible.<br>Therefore, Web archives consist of versions archived at various time points <br>without guarantee for mutual consistency.<br><br>This thesis presents a model for assessing the data quality in<br>Web archives as well as a family of crawling strategies yielding<br>high-quality captures. We distinguish between single-visit crawling strategies<br>for exploratory and visit-revisit crawling strategies for evidentiary purposes.<br>Single-visit strategies download every page<br>exactly once aiming for an ``undistorted'' capture of the ever-changing Web.<br>We express the quality of the resulting capture with the ``blur'' quality <br>measure. <br>In contrast, visit-revisit strategies download every page twice. The initial <br>downloads of all pages form the visit phase of the crawling strategy.<br>The second downloads are grouped together in the revisit phase.<br>These two phases enable us to check which pages changed during the crawling <br>process. <br>Thus, we can identify the pages that are consistent with each other.<br>The quality of the visit-revisit captures is expressed by the ``coherence'' <br>measure.<br><br>Quality-conscious strategies are based on predictions of the change behaviour of<br>individual pages. We model the Web site dynamics by Poisson processes<br>with page-specific change rates. Furthermore, we show that these rates can be<br>statistically predicted. 
Finally, we propose visualization techniques for <br>exploring <br>the quality of the resulting Web archives.<br><br>A fully functional prototype demonstrates the practical viability of our <br>approach.},
}

Endnote

%0 Thesis
%A Denev, Dimitar
%Y Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Methods and Models for Web Archive Crawling
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6217-1
%F EDOC: 647475
%F OTHER: Local-ID: C1256DBF005F876D-92B687F6B976DAC4C1257A65004F67A6-DenevPhD2012
%R 10.22028/D291-26396
%U urn:nbn:de:bsz:291-scidok-49374
%F OTHER: hdl:20.500.11880/26452
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V phd
%9 phd
%X Web archives offer a rich and plentiful <br>source of information to researchers, analysts, and legal experts. <br>For this purpose, they gather Web sites as the sites change over time.<br>In order to keep up to high standards of data quality, Web archives have to <br>collect all versions of the Web sites. <br>Due to limited resources and technical constraints this is not possible.<br>Therefore, Web archives consist of versions archived at various time points <br>without guarantee for mutual consistency.<br><br>This thesis presents a model for assessing the data quality in<br>Web archives as well as a family of crawling strategies yielding<br>high-quality captures. We distinguish between single-visit crawling strategies<br>for exploratory and visit-revisit crawling strategies for evidentiary purposes.<br>Single-visit strategies download every page<br>exactly once aiming for an ``undistorted'' capture of the ever-changing Web.<br>We express the quality of the resulting capture with the ``blur'' quality <br>measure.  <br>In contrast, visit-revisit strategies download every page twice. The initial <br>downloads of all pages form the visit phase of the crawling strategy.<br>The second downloads are grouped together in the revisit phase.<br>These two phases enable us to check which pages changed during the crawling <br>process. <br>Thus, we can identify the pages that are consistent with each other.<br>The quality of the visit-revisit captures is expressed by the ``coherence'' <br>measure.<br><br>Quality-conscious strategies are based on predictions of the change behaviour of<br>individual pages. We model the Web site dynamics by Poisson processes<br>with page-specific change rates. Furthermore, we show that these rates can be<br>statistically predicted. Finally, we propose visualization techniques for <br>exploring <br>the quality of the resulting Web archives.<br><br>A fully functional prototype demonstrates the practical viability of our <br>approach.
%U http://scidok.sulb.uni-saarland.de/volltexte/2012/4937/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

A. K. Dutta

“A Distributed In-Memory SPARQL Query Processor based on Message Passing,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Dutta2012,
TITLE = {A Distributed In-Memory {SPARQL} Query Processor based on Message Passing},
AUTHOR = {Dutta, Arnab Kumar},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-7A497488836CCFC0C1257A5D0042804B-Dutta2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Dutta, Arnab Kumar
%Y Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Distributed In-Memory SPARQL Query Processor based on Message Passing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-B001-4
%F EDOC: 647473
%F OTHER: Local-ID: C1256DBF005F876D-7A497488836CCFC0C1257A5D0042804B-Dutta2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Report

M. Dylla, I. Miliaraki, and M. Theobald

“Top-k Query Processing in Probabilistic Databases with Non-materialized Views,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2012-5-002, 2012.

BibTeX

@techreport{DyllaTopk2012,
TITLE = {Top-k Query Processing in Probabilistic Databases with Non-materialized Views},
AUTHOR = {Dylla, Maximilian and Miliaraki, Iris and Theobald, Martin},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2012-5-002},
LOCALID = {Local-ID: 62EC1C9C96B8EFF4C1257B560029F18C-DyllaTopk2012},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Dylla, Maximilian
%A Miliaraki, Iris
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Top-k Query Processing in Probabilistic Databases with Non-materialized Views
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-B02F-2
%F OTHER: Local-ID: 62EC1C9C96B8EFF4C1257B560029F18C-DyllaTopk2012
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2012
%B Research Report
%@ 0946-011X

Thesis

D5IMPR-CS

S. Elbassuoni

“Effective Searching of RDF Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@phdthesis{Elbassuoni2011,
TITLE = {Effective Searching of {RDF} Knowledge Bases},
AUTHOR = {Elbassuoni, Shady},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-47085},
DOI = {10.22028/D291-26312},
LOCALID = {Local-ID: C1256DBF005F876D-5AC1FB349CA835F1C12579AB002FFB29-Elbassuoni2011},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Elbassuoni, Shady
%Y Weikum, Gerhard
%A referee: Nejdl, Wolfgang
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Effective Searching of RDF Knowledge Bases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FFC-4
%F EDOC: 647461
%F OTHER: Local-ID: C1256DBF005F876D-5AC1FB349CA835F1C12579AB002FFB29-Elbassuoni2011
%R 10.22028/D291-26312
%U urn:nbn:de:bsz:291-scidok-47085
%F OTHER: hdl:20.500.11880/26368
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V phd
%9 phd
%U http://scidok.sulb.uni-saarland.de/volltexte/2012/4708/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Conference paper

D5IMPR-CS

S. Elbassuoni, M. Ramanath, and G. Weikum

“RDF Xpress: A Flexible Expressive RDF Search Engine,” in SIGIR’12, 35th ACM SIGIR Conference on Research & Development in Information Retrieval, Portland, OR, USA, 2012.

Abstract

We demonstrate RDF Xpress, a search engine that enables users to effectively
retrieve information from large RDF knowledge bases or Linked Data Sources. RDF
Xpress provides a search interface where users can combine triple patterns with
keywords to form queries. Moreover, RDF Xpress supports automatic query
relaxation and returns a ranked list of diverse query results.

BibTeX

@inproceedings{ElbassuoniSIGIR2012,
TITLE = {{RDF Xpress}: A Flexible Expressive {RDF} Search Engine},
AUTHOR = {Elbassuoni, Shady and Ramanath, Maya and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1472-5},
URL = {http://doi.acm.org/10.1145/2348283.2348438},
DOI = {10.1145/2348283.2348438},
LOCALID = {Local-ID: C1256DBF005F876D-1D513F1D85123B44C1257AE90055F13A-ElbassuoniSIGIR2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {We demonstrate RDF Xpress, a search engine that enables users to effectively retrieve information from large RDF knowledge bases or Linked Data Sources. RDF Xpress provides a search interface where users can combine triple patterns with keywords to form queries. Moreover, RDF Xpress supports automatic query relaxation and returns a ranked list of diverse query results.},
BOOKTITLE = {SIGIR'12, 35th ACM SIGIR Conference on Research \& Development in Information Retrieval},
EDITOR = {Callan, Jamie and Hersh, William and Maarek, Yoelle and Sanderson, Mark},
PAGES = {1013--1013},
ADDRESS = {Portland, OR, USA},
}

Endnote

%0 Conference Proceedings
%A Elbassuoni, Shady
%A Ramanath, Maya
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T RDF Xpress: A Flexible Expressive RDF Search Engine
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57A8-9
%F EDOC: 647526
%R 10.1145/2348283.2348438
%U http://doi.acm.org/10.1145/2348283.2348438
%F OTHER: Local-ID: C1256DBF005F876D-1D513F1D85123B44C1257AE90055F13A-ElbassuoniSIGIR2012
%D 2012
%B 35th ACM SIGIR Conference on Research & Development in Information Retrieval
%Z date of event: 2012-08-12 - 2012-08-16
%C Portland, OR, USA
%X We demonstrate RDF Xpress, a search engine that enables users to effectively 
retrieve information from large RDF knowledge bases or Linked Data Sources. RDF 
Xpress provides a search interface where users can combine triple patterns with 
keywords to form queries. Moreover, RDF Xpress supports automatic query 
relaxation and returns a ranked list of diverse query results.
%B SIGIR'12
%E Callan, Jamie; Hersh, William; Maarek, Yoelle; Sanderson, Mark
%P 1013 - 1013
%I ACM
%@ 978-1-4503-1472-5

Conference paper

D. Erdös, R. Gemulla, and E. Terzi

“Reconstructing Graphs from Neighborhood Data,” in 12th IEEE International Conference on Data Mining (ICDM 2012), Brussels, Belgium, 2012.

BibTeX

@inproceedings{Erdos2012,
TITLE = {Reconstructing Graphs from Neighborhood Data},
AUTHOR = {Erd{\"o}s, Dora and Gemulla, Rainer and Terzi, Evimaria},
LANGUAGE = {eng},
ISBN = {978-1-4673-4649-8},
DOI = {10.1109/ICDM.2012.154},
LOCALID = {Local-ID: C1256DBF005F876D-0C2323F93ECC9C01C1257AD7003D616C-Erdos2012},
PUBLISHER = {IEEE},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {12th IEEE International Conference on Data Mining (ICDM 2012)},
PAGES = {231--240},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Erdös, Dora
%A Gemulla, Rainer
%A Terzi, Evimaria
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Reconstructing Graphs from Neighborhood Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57A1-8
%F EDOC: 647480
%R 10.1109/ICDM.2012.154
%F OTHER: Local-ID: C1256DBF005F876D-0C2323F93ECC9C01C1257AD7003D616C-Erdos2012
%D 2012
%B 12th IEEE International Conference on Data Mining
%Z date of event: 2012-12-10 - 2012-12-13
%C Brussels, Belgium
%B 12th IEEE International Conference on Data Mining
%P 231 - 240
%I IEEE
%@ 978-1-4673-4649-8

Conference poster

B. Fetahu and R. Schenkel

“Retrieval Evaluation on Focused Tasks,” Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 2012.

BibTeX

@inproceedings{FetahuSchenkel_SIGIR2012,
TITLE = {Retrieval Evaluation on Focused Tasks},
AUTHOR = {Fetahu, Besnik and Schenkel, Ralf},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-2BB1029940C59CC5C1257AED003B7B3F-FetahuSchenkel_SIGIR2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval},
PAGES = {1135--1136},
ADDRESS = {Portland, Oregon},
}

Endnote

%0 Generic
%A Fetahu, Besnik
%A Schenkel, Ralf
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Retrieval Evaluation on Focused Tasks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5799-B
%F EDOC: 647531
%F OTHER: Local-ID: C1256DBF005F876D-2BB1029940C59CC5C1257AED003B7B3F-FetahuSchenkel_SIGIR2012
%D 2012
%Z name of event: SIGIR 2012
%Z date of event: 2012-08-12 - 2012-08-16
%Z place of event: Portland, Oregon
%B Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
%P 1135 - 1136

Article

L. Galárraga, K. Hose, and R. Schenkel

“Partout: A Distributed Engine for Efficient RDF Processing,” arXiv, vol. abs/1212.5636, 2012.

Abstract

The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing.

BibTeX

@article{Partout_CoRR,
TITLE = {Partout: A Distributed Engine for Efficient {RDF} Processing},
AUTHOR = {Gal{\'a}rraga, Luis and Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1212.5636},
EPRINT = {1212.5636},
EPRINTTYPE = {arXiv},
LOCALID = {Local-ID: C1256DBF005F876D-2CE92E2545984CE3C1257B12003A9F18-Partout_CoRR},
PUBLISHER = {Cornell University Library},
ADDRESS = {Ithaca, NY},
YEAR = {2012},
ABSTRACT = {The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications with already more than a trillion triples in some cases. Confronted with such huge amounts of data and the future growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for efficient RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log, allocating the fragments to nodes in a cluster, and finding the optimal configuration. Partout can efficiently handle updates and its query optimizer produces efficient query execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach to state-of-the-art approaches for partitioning and distributed SPARQL query processing.},
JOURNAL = {arXiv},
VOLUME = {abs/1212.5636},
PAGES = {1--12},
}

Endnote

%0 Journal Article
%A Galárraga, Luis
%A Hose, Katja
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Partout: A Distributed Engine for Efficient RDF Processing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58D2-3
%F EDOC: 647541
%U http://arxiv.org/abs/1212.5636
%F OTHER: Local-ID: C1256DBF005F876D-2CE92E2545984CE3C1257B12003A9F18-Partout_CoRR
%7 2012
%D 2012
%X The increasing interest in Semantic Web technologies has led not only to a 
rapid growth of semantic data on the Web but also to an increasing number of 
backend applications with already more than a trillion triples in some cases. 
Confronted with such huge amounts of data and the future growth, existing 
state-of-the-art systems for storing RDF and processing SPARQL queries are no 
longer sufficient. In this paper, we introduce Partout, a distributed engine 
for efficient RDF processing in a cluster of machines. We propose an effective 
approach for fragmenting RDF data sets based on a query log, allocating the 
fragments to nodes in a cluster, and finding the optimal configuration. Partout 
can efficiently handle updates and its query optimizer produces efficient query 
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority 
of our approach to state-of-the-art approaches for partitioning and distributed 
SPARQL query processing.
%J arXiv
%V abs/1212.5636
%& 1
%P 1 - 12
%I Cornell University Library
%C Ithaca, NY

Thesis

L. Galárraga

“Partout: A Distributed Approach Towards Scalable RDF Processing,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Galarraga_MT2012,
TITLE = {Partout: A Distributed Approach Towards Scalable {RDF} Processing},
AUTHOR = {Gal{\'a}rraga, Luis},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-001507D5A03060E2C1257B190040DE84-Galarraga_MT2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Galárraga, Luis
%Y Hose, Katja
%Y Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Partout: A Distributed Approach Towards Scalable RDF Processing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-604E-6
%F EDOC: 647543
%F OTHER: Local-ID: C1256DBF005F876D-001507D5A03060E2C1257B190040DE84-Galarraga_MT2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Conference paper

E. Galbrun and P. Miettinen

“A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining,” in ECML PKDD 2012 Workshop on Instant Interactive Data Mining, Bristol, UK, 2012.

Abstract

We present a method for visual and interactive geospatial redescription mining.

The goal of geospatial redescription mining is to characterize geospatial areas

using two different descriptions, such as their bioclimatic features and fauna.

Indeed, one application of geospatial redescription mining is finding

bioclimatic niches, i.e. explaining the distribution of species using their

bioclimatic envelope.

Allowing users to find the geospatial redescriptions in an interactive way, and

to see the results in clear visualizations, is fundamental for the

applicability of the method. We present several goals we think a good

interactive and visual redescription mining method should fulfil, and we explain

how our proposed method achieves (most of) them. Finally, we also discuss some

open problems in interactive redescription mining.

BibTeX

@inproceedings{galbrun12case,
TITLE = {A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
URL = {http://adrem.ua.ac.be/iid2012/papers/galbrun_miettinen-visual_and_interactive_geospatial_redescription_mining.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-0468E8CE7C5F550BC1257AF3003CFE4C-galbrun12case},
YEAR = {2012},
ABSTRACT = {We present a method for visual and interactive geospatial redescription mining. The goal of geospatial redescription mining is to characterize geospatial areas using two different descriptions, such as their bioclimatic features and fauna. Indeed, one application of geospatial redescription mining is finding bioclimatic niches, i.e. explaining the distribution of species using their bioclimatic envelope. Allowing users to find the geospatial redescriptions in an interactive way, and to see the results in clear visualizations, is fundamental for the applicability of the method. We present several goals we think a good interactive and visual redescription mining method should fulfil, and we explain how our proposed method achieves (most of) them. Finally, we also discuss some open problems in interactive redescription mining.},
BOOKTITLE = {ECML PKDD 2012 Workshop on Instant Interactive Data Mining},
PAGES = {1--12},
ADDRESS = {Bristol, UK},
}

Endnote

%0 Conference Proceedings
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FBA-5
%F EDOC: 647533
%U http://adrem.ua.ac.be/iid2012/papers/galbrun_miettinen-visual_and_interactive_geospatial_redescription_mining.pdf
%F OTHER: Local-ID: C1256DBF005F876D-0468E8CE7C5F550BC1257AF3003CFE4C-galbrun12case
%D 2012
%B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
%Z date of event: 2012-09-24 - 2012-09-24
%C Bristol, UK
%X We present a method for visual and interactive geospatial redescription mining. 
The goal of geospatial redescription mining is to characterize geospatial areas 
using two different descriptions, such as their bioclimatic features and fauna. 
Indeed, one application of geospatial redescription mining is finding 
bioclimatic niches, i.e. explaining the distribution of species using their 
bioclimatic envelope.

Allowing users to find the geospatial redescriptions in an interactive way, and 
to see the results in clear visualizations, is fundamental for the 
applicability of the method. We present several goals we think a good 
interactive and visual redescription mining method should fulfil, and we explain 
how our proposed method achieves (most of) them. Finally, we also discuss some 
open problems in interactive redescription mining.
%B ECML PKDD 2012 Workshop on Instant Interactive Data Mining
%P 1 - 12

Conference paper

E. Galbrun and P. Miettinen

“Siren: An Interactive Tool for Mining and Visualizing Geospatial Redescriptions,” in KDD’12, 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012.

Abstract

We present SIREN, an interactive tool for mining and visualizing geospatial

redescriptions. Redescription mining is a powerful data analysis tool that aims

at finding alternative descriptions of the same entities. For example, in

biology, an important task is to identify the bioclimatic constraints that

allow some species to survive, that is, to describe geographical regions in

terms of both the fauna that inhabits them and their bioclimatic conditions.

Using SIREN, users can explore geospatial data of their interest by visualizing

the redescriptions on a map, interactively edit, extend and filter them.

To demonstrate the use of the tool, we focus on climatic niche-finding over

Europe, as an example task. Yet, SIREN is by no means limited to a particular

dataset or application.

BibTeX

@inproceedings{galbrun12siren,
TITLE = {Siren: An Interactive Tool for Mining and Visualizing Geospatial Redescriptions},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-4503-1462-6},
DOI = {10.1145/2339530.2339776},
LOCALID = {Local-ID: C1256DBF005F876D-D8C1FE9633C1FEEAC1257AF3003C19CA-galbrun12siren},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {We present SIREN, an interactive tool for mining and visualizing geospatial redescriptions. Redescription mining is a powerful data analysis tool that aims at finding alternative descriptions of the same entities. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions in terms of both the fauna that inhabits them and their bioclimatic conditions. Using SIREN, users can explore geospatial data of their interest by visualizing the redescriptions on a map, interactively edit, extend and filter them. To demonstrate the use of the tool, we focus on climatic niche-finding over Europe, as an example task. Yet, SIREN is by no means limited to a particular dataset or application.},
BOOKTITLE = {KDD'12, 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
PAGES = {1544--1547},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Siren: An Interactive Tool for Mining and Visualizing Geospatial Redescriptions : (Demo)
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5768-A
%F EDOC: 647492
%R 10.1145/2339530.2339776
%F OTHER: Local-ID: C1256DBF005F876D-D8C1FE9633C1FEEAC1257AF3003C19CA-galbrun12siren
%D 2012
%B 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
%Z date of event: 2012-08-12 - 2012-08-16
%C Beijing, China
%X We present SIREN, an interactive tool for mining and visualizing geospatial 
redescriptions. Redescription mining is a powerful data analysis tool that aims 
at finding alternative descriptions of the same entities. For example, in 
biology, an important task is to identify the bioclimatic constraints that 
allow some species to survive, that is, to describe geographical regions in 
terms of both the fauna that inhabits them and their bioclimatic conditions.

Using SIREN, users can explore geospatial data of their interest by visualizing 
the redescriptions on a map, interactively edit, extend and filter them.

To demonstrate the use of the tool, we focus on climatic niche-finding over 
Europe, as an example task. Yet, SIREN is by no means limited to a particular 
dataset or application.
%B KDD'12
%P 1544 - 1547
%I ACM
%@ 978-1-4503-1462-6

Article

E. Galbrun and P. Miettinen

“From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World,” Statistical Analysis and Data Mining, vol. 5, no. 4, 2012.

Abstract

Redescription mining is a powerful data analysis tool that is used to find

multiple descriptions of the same entities. Consider geographical regions as an

example. They can be characterized by the fauna that inhabits them on one hand

and by their meteorological conditions on the other hand. Finding such

redescriptors, a task known as niche-finding, is of much importance in biology.

Current redescription mining methods cannot handle other than Boolean data.

This restricts the range of possible applications or makes discretization a

prerequisite, entailing a possibly harmful loss of information. In

niche-finding, while the fauna can be naturally represented using a Boolean

presence/absence data, the weather cannot.

In this paper, we extend redescription mining to categorical and real-valued

data with possibly missing values using a surprisingly simple and efficient

approach. We provide extensive experimental evaluation to study the behaviour

of the proposed algorithm. Furthermore, we show the statistical significance of

our results using recent innovations on randomization methods.

BibTeX

@article{galbrun12black,
TITLE = {From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World},
AUTHOR = {Galbrun, Esther and Miettinen, Pauli},
LANGUAGE = {eng},
ISSN = {1932-1872},
DOI = {10.1002/sam.11145},
LOCALID = {Local-ID: C1256DBF005F876D-BE029674FDF3303BC1257AF300432F52-galbrun12black},
PUBLISHER = {Wiley},
ADDRESS = {Chichester},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptors, a task known as niche-finding, is of much importance in biology. Current redescription mining methods cannot handle other than Boolean data. This restricts the range of possible applications or makes discretization a prerequisite, entailing a possibly harmful loss of information. In niche-finding, while the fauna can be naturally represented using a Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to categorical and real-valued data with possibly missing values using a surprisingly simple and efficient approach. We provide extensive experimental evaluation to study the behaviour of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations on randomization methods.},
JOURNAL = {Statistical Analysis and Data Mining},
VOLUME = {5},
NUMBER = {4},
PAGES = {284--303},
}

Endnote

%0 Journal Article
%A Galbrun, Esther
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A2A-2
%F EDOC: 647493
%R 10.1002/sam.11145
%F OTHER: Local-ID: C1256DBF005F876D-BE029674FDF3303BC1257AF300432F52-galbrun12black
%7 2012
%D 2012
%* Review method: peer-reviewed
%X Redescription mining is a powerful data analysis tool that is used to find
multiple descriptions of the same entities. Consider geographical regions as an
example. They can be characterized by the fauna that inhabits them on one hand
and by their meteorological conditions on the other hand. Finding such
redescriptors, a task known as niche-finding, is of much importance in biology.

Current redescription mining methods cannot handle other than Boolean data.
This restricts the range of possible applications or makes discretization a
prerequisite, entailing a possibly harmful loss of information. In
niche-finding, while the fauna can be naturally represented using a Boolean
presence/absence data, the weather cannot.

In this paper, we extend redescription mining to categorical and real-valued
data with possibly missing values using a surprisingly simple and efficient
approach. We provide extensive experimental evaluation to study the behaviour
of the proposed algorithm. Furthermore, we show the statistical significance of
our results using recent innovations on randomization methods.
%J Statistical Analysis and Data Mining
%V 5
%N 4
%& 284
%P 284 - 303
%I Wiley
%C Chichester
%@ 1932-1872

Conference paper

C. Gao and S. Michel

“Top-k Interesting Phrase Mining in Ad-hoc Collections using Sequence Pattern Indexing,” in Advances in Database Technology - EDBT 2012, Berlin, Germany, 2012.

BibTeX

@inproceedings{GaoM2012,
TITLE = {Top-k Interesting Phrase Mining in Ad-hoc Collections using Sequence Pattern Indexing},
AUTHOR = {Gao, Chuancong and Michel, Sebastian},
LANGUAGE = {eng},
ISBN = {978-1-4503-0790-1},
URL = {http://doi.acm.org/10.1145/2247596.2247628},
DOI = {10.1145/2247596.2247628},
LOCALID = {Local-ID: C1256DBF005F876D-6110B86888907C1BC1257981002853F6-GaoM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Advances in Database Technology -- EDBT 2012},
EDITOR = {Rundensteiner, Elke A. and Markl, Volker and Manolescu, Ioana and Amer-Yahia, Sihem and Naumann, Felix and Ari, Ismail},
PAGES = {264--275},
ADDRESS = {Berlin, Germany},
}

Endnote

%0 Conference Proceedings
%A Gao, Chuancong
%A Michel, Sebastian
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Top-k Interesting Phrase Mining in Ad-hoc Collections using Sequence Pattern Indexing
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-575D-4
%F EDOC: 647459
%R 10.1145/2247596.2247628
%U http://doi.acm.org/10.1145/2247596.2247628
%F OTHER: Local-ID: C1256DBF005F876D-6110B86888907C1BC1257981002853F6-GaoM2012
%D 2012
%B 15th International Conference on Extending Database Technology
%Z date of event: 2012-03-26 - 2012-03-30
%C Berlin, Germany
%B Advances in Database Technology - EDBT 2012
%E Rundensteiner, Elke A.; Markl, Volker; Manolescu, Ioana; Amer-Yahia, Sihem; Naumann, Felix; Ari, Ismail
%P 264 - 275
%I ACM
%@ 978-1-4503-0790-1

Proceedings

S. Geva, J. Kamps, and R. Schenkel

Eds., Focused Retrieval of Content and Structure : 10th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2011. Springer, 2012.

BibTeX

@proceedings{GevaKS_INEX2012,
TITLE = {Focused Retrieval of Content and Structure : 10th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2011},
EDITOR = {Geva, Shlomo and Kamps, Jaap and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-3-642-35733-6},
DOI = {10.1007/978-3-642-35734-3},
LOCALID = {Local-ID: C1256DBF005F876D-31142B6B34E4633DC1257A3E0022DF89-GevaKS_INEX2012},
PUBLISHER = {Springer},
YEAR = {2012},
DATE = {2012},
PAGES = {336},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {7424},
ADDRESS = {Saarbr{\"u}cken},
}

Endnote

%0 Conference Proceedings
%E Geva, Shlomo
%E Kamps, Jaap
%E Schenkel, Ralf
%+ External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Focused Retrieval of Content and Structure : 10th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2011
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A2D-B
%F EDOC: 647467
%@ 978-3-642-35733-6
%R 10.1007/978-3-642-35734-3
%F OTHER: Local-ID: C1256DBF005F876D-31142B6B34E4633DC1257A3E0022DF89-GevaKS_INEX2012
%I Springer
%D 2012
%B INEX 2011
%Z date of event: 2011-12-12 - 2011-12-14
%D 2011
%C Saarbrücken
%P 336
%S Lecture Notes in Computer Science
%V 7424

Conference paper

C. Giatsidis, K. Berberich, D. Thilikos, and M. Vazirgiannis

“Visual Exploration of Collaboration Networks based on Graph Degeneracy,” in KDD 2012, 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012.

BibTeX

@inproceedings{Giatsidis2012,
TITLE = {Visual Exploration of Collaboration Networks based on Graph Degeneracy},
AUTHOR = {Giatsidis, Christos and Berberich, Klaus and Thilikos, Dimitrios and Vazirgiannis, Michalis},
LANGUAGE = {eng},
ISBN = {978-1-4503-1462-6},
URL = {http://dl.acm.org/citation.cfm?doid=2339530.2339768},
DOI = {10.1145/2339530.2339768},
LOCALID = {Local-ID: C1256DBF005F876D-8613ABA2EF772908C1257A630043693E-Giatsidis2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {KDD 2012, 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
PAGES = {1512--1515},
ADDRESS = {Beijing, China},
}

Endnote

%0 Conference Proceedings
%A Giatsidis, Christos
%A Berberich, Klaus
%A Thilikos, Dimitrios
%A Vazirgiannis, Michalis
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Visual Exploration of Collaboration Networks based on Graph Degeneracy
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-56F1-D
%F EDOC: 647474
%R 10.1145/2339530.2339768
%U http://dl.acm.org/citation.cfm?doid=2339530.2339768
%F OTHER: Local-ID: C1256DBF005F876D-8613ABA2EF772908C1257A630043693E-Giatsidis2012
%D 2012
%B 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
%Z date of event: 2012-08-12 - 2012-08-16
%C Beijing, China
%B KDD 2012
%P 1512 - 1515
%I ACM
%@ 978-1-4503-1462-6

Conference paper

IMPR-CS, D5

J. Göbölös-Szabó, N. Prytkova, M. Spaniol, and G. Weikum

“Cross-lingual Data Quality for Knowledge Base Acceleration Across Wikipedia Editions,” in QDB 2012, Istanbul, Turkey, 2012.

BibTeX

@inproceedings{GPSW12,
TITLE = {Cross-lingual Data Quality for Knowledge Base Acceleration Across Wikipedia Editions},
AUTHOR = {G{\"o}b{\"o}l{\"o}s-Szab{\'o}, Julianna and Prytkova, Natalia and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://www.purdue.edu/discoverypark/cyber/qdb2012/papers/6MultiWOC.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-FC49DEF8BC4E6EFCC1257AD8004FFCBE-GPSW12},
PUBLISHER = {Purdue University},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {QDB 2012},
PAGES = {1--7},
ADDRESS = {Istanbul, Turkey},
}

Endnote

%0 Conference Proceedings
%A Göbölös-Szabó, Julianna
%A Prytkova, Natalia
%A Spaniol, Marc
%A Weikum, Gerhard
%+ External Organizations
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Cross-lingual Data Quality for Knowledge Base Acceleration Across Wikipedia Editions
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5F98-2
%F EDOC: 647520
%U http://www.purdue.edu/discoverypark/cyber/qdb2012/papers/6MultiWOC.pdf
%F OTHER: Local-ID: C1256DBF005F876D-FC49DEF8BC4E6EFCC1257AD8004FFCBE-GPSW12
%D 2012
%B 10th International Workshop on Quality in Databases in Conjunction with VLDB
%Z date of event: 2012-08-27 - 2012-08-27
%C Istanbul, Turkey
%B QDB 2012
%P 1 - 7
%I Purdue University

Article

P. Haghani, S. Michel, and K. Aberer

“Efficient Monitoring of Personalized Hot News Over Web 2.0 Streams,” Computer Science - Research and Development, vol. 27, no. 1, 2012.

BibTeX

@article{Haghani2012,
TITLE = {Efficient Monitoring of Personalized Hot News Over {Web 2.0} Streams},
AUTHOR = {Haghani, Parisa and Michel, Sebastian and Aberer, Karl},
LANGUAGE = {eng},
ISSN = {1865-2034},
URL = {http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-011-0178-9},
DOI = {10.1007/s00450-011-0178-9},
LOCALID = {Local-ID: C1256DBF005F876D-C44B7CE5A54233E5C125798F00285C31-Haghani2012},
PUBLISHER = {Springer},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Computer Science -- Research and Development},
VOLUME = {27},
NUMBER = {1},
PAGES = {81--92},
}

Endnote

%0 Journal Article
%A Haghani, Parisa
%A Michel, Sebastian
%A Aberer, Karl
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Efficient Monitoring of Personalized Hot News Over Web 2.0 Streams
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A5D-0
%F EDOC: 647460
%R 10.1007/s00450-011-0178-9
%U http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00450-011-0178-9
%F OTHER: Local-ID: C1256DBF005F876D-C44B7CE5A54233E5C125798F00285C31-Haghani2012
%D 2012
%* Review method: peer-reviewed
%J Computer Science - Research and Development
%V 27
%N 1
%& 81
%P 81 - 92
%I Springer
%C New York, NY
%@ 1865-2034

Conference paper

A. Harth, K. Hose, and R. Schenkel

“Database Techniques for Linked Data Management,” in SIGMOD 2012, Scottsdale, USA, 2012.

Abstract

Linked Data refers to data published in accordance with a number of principles

rooted in web standards. In the past few years we have witnessed a tremendous

growth in Linked Data publishing on the web, leading to tens of billions of

data items published online. Querying the data is a key functionality required

to make use of the wealth of rich interlinked data. The goal of the tutorial is

to introduce, motivate, and detail techniques for querying heterogeneous

structured data from across the web. Our tutorial aims to introduce database

researchers and practitioners to the new publishing paradigm on the web, and

show how the abundance of data published as Linked Data can serve as fertile

ground for database research and experimentation. As such, the tutorial focuses

on applying database techniques to processing Linked Data, such as optimized

indexing and query processing methods in the centralized setting as well as

distributed approaches for querying. At the same time, we make the connection

from Linked Data best practices to established technologies in distributed

databases and the concept of Dataspaces and show differences as well as

commonalities between the fields.

BibTeX

@inproceedings{HarthHS2012,
TITLE = {Database Techniques for Linked Data Management},
AUTHOR = {Harth, Andreas and Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-1247-9},
URL = {http://dx.doi.org/10.1145/2213836.2213909},
DOI = {10.1145/2213836.2213909},
LOCALID = {Local-ID: C1256DBF005F876D-5BAB795A662D21C7C12579BF004276F2-HarthHS2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Linked Data refers to data published in accordance with a number of principles rooted in web standards. In the past few years we have witnessed a tremendous growth in Linked Data publishing on the web, leading to tens of billions of data items published online. Querying the data is a key functionality required to make use of the wealth of rich interlinked data. The goal of the tutorial is to introduce, motivate, and detail techniques for querying heterogeneous structured data from across the web. Our tutorial aims to introduce database researchers and practitioners to the new publishing paradigm on the web, and show how the abundance of data published as Linked Data can serve as fertile ground for database research and experimentation. As such, the tutorial focuses on applying database techniques to processing Linked Data, such as optimized indexing and query processing methods in the centralized setting as well as distributed approaches for querying. At the same time, we make the connection from Linked Data best practices to established technologies in distributed databases and the concept of Dataspaces and show differences as well as commonalities between the fields.},
BOOKTITLE = {SIGMOD 2012},
PAGES = {597--600},
ADDRESS = {Scottsdale, USA},
}

Endnote

%0 Conference Proceedings
%A Harth, Andreas
%A Hose, Katja
%A Schenkel, Ralf
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Database Techniques for Linked Data Management
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5F95-8
%F EDOC: 647505
%R 10.1145/2213836.2213909
%U http://dx.doi.org/10.1145/2213836.2213909
%F OTHER: Local-ID: C1256DBF005F876D-5BAB795A662D21C7C12579BF004276F2-HarthHS2012
%D 2012
%B ACM SIGMOD International Conference on Management of Data
%Z date of event: 2012-05-20 - 2012-05-24
%C Scottsdale, USA
%X Linked Data refers to data published in accordance with a number of principles 
rooted in web standards. In the past few years we have witnessed a tremendous 
growth in Linked Data publishing on the web, leading to tens of billions of 
data items published online. Querying the data is a key functionality required 
to make use of the wealth of rich interlinked data. The goal of the tutorial is 
to introduce, motivate, and detail techniques for querying heterogeneous 
structured data from across the web. Our tutorial aims to introduce database 
researchers and practitioners to the new publishing paradigm on the web, and 
show how the abundance of data published as Linked Data can serve as fertile 
ground for database research and experimentation. As such, the tutorial focuses 
on applying database techniques to processing Linked Data, such as optimized 
indexing and query processing methods in the centralized setting as well as 
distributed approaches for querying. At the same time, we make the connection 
from Linked Data best practices to established technologies in distributed 
databases and the concept of Dataspaces and show differences as well as 
commonalities between the fields.
%B SIGMOD 2012
%P 597 - 600
%I ACM
%@ 978-1-4503-1247-9

Conference paper

D5, IMPR-CS

J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum

“KORE: Keyphrase Overlap Relatedness for Entity Disambiguation,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, 2012.

Abstract

Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based

knowledge extraction. This paper focuses on disambiguating

names in a Web or text document by jointly mapping all names

onto semantically related entities registered in a knowledge base.

To this end, we have developed a novel notion of semantic

relatedness between two entities represented as sets

of weighted (multi-word) keyphrases, with consideration of

partially overlapping phrases. This measure improves

the quality of prior link-based models, and also eliminates the

need for (usually Wikipedia-centric) explicit interlinkage between entities.

Thus, our method is more versatile and can cope with long-tail and

newly emerging entities that have few or no links associated with them.

For efficiency, we have developed approximation techniques

based on min-hash sketches and locality-sensitive hashing.

Our experiments on semantic relatedness and on named entity

disambiguation demonstrate the superiority of our method

compared to state-of-the-art baselines.

BibTeX

@inproceedings{HoffartCIKM2012,
TITLE = {{KORE}: Keyphrase Overlap Relatedness for Entity Disambiguation},
AUTHOR = {Hoffart, Johannes and Seufert, Stephan and Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2396832},
DOI = {10.1145/2396761.2396832},
LOCALID = {Local-ID: C1256DBF005F876D-3DCFB5FA0199B58EC1257ABC0043DD61-HoffartCIKM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {545--554},
ADDRESS = {Maui, Hawaii, USA},
}

Endnote

%0 Conference Proceedings
%A Hoffart, Johannes
%A Seufert, Stephan
%A Nguyen, Dat Ba
%A Theobald, Martin
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T KORE: Keyphrase Overlap Relatedness for Entity Disambiguation
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59A6-0
%F EDOC: 647479
%R 10.1145/2396761.2396832
%U http://doi.acm.org/10.1145/2396761.2396832
%F OTHER: Local-ID: C1256DBF005F876D-3DCFB5FA0199B58EC1257ABC0043DD61-HoffartCIKM2012
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, Hawaii, USA
%X Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based
knowledge extraction. This paper focuses on disambiguating
names in a Web or text document by jointly mapping all names
onto semantically related entities registered in a knowledge base.
To this end, we have developed a novel notion of semantic
relatedness between two entities represented as sets
of weighted (multi-word) keyphrases, with consideration of 
partially overlapping phrases. This measure improves
the quality of prior link-based models, and also eliminates the
need for (usually Wikipedia-centric) explicit interlinkage between entities.
Thus, our method is more versatile and can cope with long-tail and
newly emerging entities that have few or no links associated with them.
For efficiency, we have developed approximation techniques
based on min-hash sketches and locality-sensitive hashing.
Our experiments on semantic relatedness and on named entity
disambiguation demonstrate the superiority of our method
compared to state-of-the-art baselines.
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 545 - 554
%I ACM
%@ 978-1-4503-1156-4

Conference paper

K. Hose and R. Schenkel

“Towards Benefit-Based RDF Source Selection for SPARQL Queries,” in SWIM ’12, 4th International Workshop on Semantic Web Information Management, Scottsdale, AZ, 2012.

Abstract

The Linked Data cloud consists of a great variety of data provided by an
increasing number of sources. Selecting relevant sources is therefore a core
ingredient of efficient query processing. So far, this is either done with
additional indexes or by iteratively performing lookups for relevant URIs. None
of the existing methods takes additional aspects into account such as the
degree of overlap between the sources, resulting in unnecessary requests. In
this paper, we propose a sketch-based query routing strategy that takes source
overlap into account. The proposed strategy uses sketches and can be tuned
towards either retrieving as many results as possible for a given budget or
minimizing the number of requests necessary to retrieve all or a certain
fraction of the results. Our experiments show significant improvements over
state-of-the-art but overlap-ignorant methods for source selection.

BibTeX

@inproceedings{HoseSchenkel_SWIM2012,
TITLE = {Towards Benefit-Based {RDF} Source Selection for {SPARQL} Queries},
AUTHOR = {Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-1446-6},
URL = {http://dx.doi.org/10.1145/2237867.2237869},
DOI = {10.1145/2237867.2237869},
LOCALID = {Local-ID: C1256DBF005F876D-DAAD136B50B0C0ECC12579E6004D6582-HoseSchenkel_SWIM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The Linked Data cloud consists of a great variety of data provided by an increasing number of sources. Selecting relevant sources is therefore a core ingredient of efficient query processing. So far, this is either done with additional indexes or by iteratively performing lookups for relevant URIs. None of the existing methods takes additional aspects into account such as the degree of overlap between the sources, resulting in unnecessary requests. In this paper, we propose a sketch-based query routing strategy that takes source overlap into account. The proposed strategy uses sketches and can be tuned towards either retrieving as many results as possible for a given budget or minimizing the number of requests necessary to retrieve all or a certain fraction of the results. Our experiments show significant improvements over state-of-the-art but overlap-ignorant methods for source selection.},
BOOKTITLE = {SWIM '12, 4th International Workshop on Semantic Web Information Management},
PAGES = {2:1--2:8},
ADDRESS = {Scottsdale, AZ},
}

Endnote

%0 Conference Proceedings
%A Hose, Katja
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Towards Benefit-Based RDF Source Selection for SPARQL Queries
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5754-5
%F EDOC: 647457
%R 10.1145/2237867.2237869
%U http://dx.doi.org/10.1145/2237867.2237869
%F OTHER: Local-ID: C1256DBF005F876D-DAAD136B50B0C0ECC12579E6004D6582-HoseSchenkel_SWIM2012
%D 2012
%B 4th International Workshop on Semantic Web Information Management
%Z date of event: 2012-05-20 - 2012-05-20
%C Scottsdale, AZ
%X The Linked Data cloud consists of a great variety of data provided by an 
increasing number of sources. Selecting relevant sources is therefore a core 
ingredient of efficient query processing. So far, this is either done with 
additional indexes or by iteratively performing lookups for relevant URIs. None 
of the existing methods takes additional aspects into account such as the 
degree of overlap between the sources, resulting in unnecessary requests. In 
this paper, we propose a sketch-based query routing strategy that takes source 
overlap into account. The proposed strategy uses sketches and can be tuned 
towards either retrieving as many results as possible for a given budget or 
minimizing the number of requests necessary to retrieve all or a certain 
fraction of the results. Our experiments show significant improvements over 
state-of-the-art but overlap-ignorant methods for source selection.
%B SWIM '12
%P 2:1 - 2:8
%I ACM
%@ 978-1-4503-1446-6

Article

K. Hose and A. Vlachou

“A Survey of Skyline Processing in Highly Distributed Environments,” The VLDB Journal, vol. 21, no. 3, 2012.

BibTeX

@article{HosVla12,
TITLE = {A Survey of Skyline Processing in Highly Distributed Environments},
AUTHOR = {Hose, Katja and Vlachou, Akrivi},
LANGUAGE = {eng},
ISSN = {1066-8888},
URL = {http://www.springerlink.com/content/117542787l20h533/},
DOI = {10.1007/s00778-011-0246-6},
LOCALID = {Local-ID: C1256DBF005F876D-4FDD37E24C072A17C12578C2004ABB39-HosVla12},
PUBLISHER = {Springer},
ADDRESS = {Berlin},
YEAR = {2012},
DATE = {2012},
JOURNAL = {The VLDB Journal},
VOLUME = {21},
NUMBER = {3},
PAGES = {359--384},
}

Endnote

%0 Journal Article
%A Hose, Katja
%A Vlachou, Akrivi
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T A Survey of Skyline Processing in Highly Distributed Environments
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FB7-B
%F EDOC: 647507
%R 10.1007/s00778-011-0246-6
%U http://www.springerlink.com/content/117542787l20h533/
%F OTHER: Local-ID: C1256DBF005F876D-4FDD37E24C072A17C12578C2004ABB39-HosVla12
%7 2012
%D 2012
%* Review method: peer-reviewed
%J The VLDB Journal
%V 21
%N 3
%& 359
%P 359 - 384
%I Springer
%C Berlin
%@ 1066-8888

Conference paper

K. Hose and A. Vlachou

“Distributed Skyline Processing: a Trend in Database Research Still Going Strong,” in Advances in Database Technology - EDBT 2012, Scottsdale, Arizona, USA, 2012.

BibTeX

@inproceedings{HoseEDBT2012,
TITLE = {Distributed Skyline Processing: a Trend in Database Research Still Going Strong},
AUTHOR = {Hose, Katja and Vlachou, Akrivi},
LANGUAGE = {eng},
ISBN = {978-1-4503-0790-1},
URL = {http://doi.acm.org/10.1145/2247596.2247665},
DOI = {10.1145/2247596.2247665},
LOCALID = {Local-ID: C1256DBF005F876D-4EF755740179C6C8C12579E6005EABDB-HoseEDBT2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Advances in Database Technology -- EDBT 2012},
EDITOR = {Rundensteiner, Elke and Markl, Volker and Manolescu, Ioana and Amer-Yahia, Sihem and Naumann, Felix and Ari, Ismail},
PAGES = {558--561},
ADDRESS = {Scottsdale, Arizona, USA},
}

Endnote

%0 Conference Proceedings
%A Hose, Katja
%A Vlachou, Akrivi
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Distributed Skyline Processing: a Trend in Database Research Still Going Strong
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5DFD-0
%F EDOC: 647456
%R 10.1145/2247596.2247665
%U http://doi.acm.org/10.1145/2247596.2247665
%F OTHER: Local-ID: C1256DBF005F876D-4EF755740179C6C8C12579E6005EABDB-HoseEDBT2012
%D 2012
%B 15th International Conference on Extending Database Technology
%Z date of event: 2012-04-20 - 2012-04-20
%C Scottsdale, Arizona, USA
%B Advances in Database Technology - EDBT 2012
%E Rundensteiner, Elke; Markl, Volker; Manolescu, Ioana; Amer-Yahia, Sihem; Naumann, Felix; Ari, Ismail
%P 558 - 561
%I ACM
%@ 978-1-4503-0790-1

Conference paper

D5, IMPR-CS

R. Jäkel, S. Metzger, J. M. Daivandy, K. Hose, D. Hünich, R. Schenkel, and B. Schuller

“Interactive Information Extraction based on Distributed Data Management for German Grid Projects,” in EGI Community Forum 2012 / EMI Second Technical Conference (EGICF12-EMITC2 2012), Munich, Germany, 2012.

Abstract

The current infrastructure provided and maintained by the German Grid Initiative (D-Grid) primarily covers resource management and exchange at the
data level supporting mainly technical resources such as computational
capacity, data transport networks, storage resources, and management software.
The WisNetGrid project (www.wisnetgrid.org) aims to broaden the focus of
resource sharing towards the actual content, such as research and production
data, to enable interdisciplinary usage. To achieve this goal, resource sharing
is supported on different abstraction layers. First, we create an information
layer by providing a universal interface to access data on the grid independent
of the underlying grid storage system. Second, at the knowledge layer, we offer
interactive knowledge extraction and management tools that can also take
advantage of a community’s grid resources. These tools enable the user to
formulate the domain specific knowledge in different ways to ease the
interaction with the knowledge extraction process and to provide input for
automatic extraction workflow. Within this project, we work together with use
groups from the humanities and from landscaping as disparate use cases to
evaluate which advantages can be gained by using semi-automatic extraction
tools to gather and manage knowledge content.

BibTeX

@inproceedings{EGI2012,
TITLE = {Interactive Information Extraction based on Distributed Data Management for German Grid Projects},
AUTHOR = {J{\"a}kel, Ren{\'e} and Metzger, Steffen and Daivandy, Jason Milad and Hose, Katja and H{\"u}nich, Dennis and Schenkel, Ralf and Schuller, Bernd},
LANGUAGE = {eng},
URL = {http://pos.sissa.it/archive/conferences/162/031/EGICF12-EMITC2_031.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-F5608AD15B5A9CE3C1257AD1003647FA-EGI2012},
PUBLISHER = {Proceedings of Science},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The current infrastructure provided and maintained by the German Grid Initiative (D-Grid) primarily covers resource management and exchange at the data level supporting mainly technical resources such as computational capacity, data transport networks, storage resources, and management software. The WisNetGrid project (www.wisnetgrid.org) aims to broaden the focus of resource sharing towards the actual content, such as research and production data, to enable interdisciplinary usage. To achieve this goal, resource sharing is supported on different abstraction layers. First, we create an information layer by providing a universal interface to access data on the grid independent of the underlying grid storage system. Second, at the knowledge layer, we offer interactive knowledge extraction and management tools that can also take advantage of a community{\textquoteright}s grid resources. These tools enable the user to formulate the domain specific knowledge in different ways to ease the interaction with the knowledge extraction process and to provide input for automatic extraction workflow. Within this project, we work together with use groups from the humanities and from landscaping as disparate use cases to evaluate which advantages can be gained by using semi-automatic extraction tools to gather and manage knowledge content.},
BOOKTITLE = {EGI Community Forum 2012 / EMI Second Technical Conference (EGICF12-EMITC2 2012)},
PAGES = {1--10},
SERIES = {Proceedings of Science},
ADDRESS = {Munich, Germany},
}

Endnote

%0 Conference Proceedings
%A Jäkel, René
%A Metzger, Steffen
%A Daivandy, Jason Milad
%A Hose, Katja
%A Hünich, Dennis
%A Schenkel, Ralf
%A Schuller, Bernd
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Interactive Information Extraction based on Distributed Data Management for German Grid Projects
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59C2-0
%F EDOC: 647515
%U http://pos.sissa.it/archive/conferences/162/031/EGICF12-EMITC2_031.pdf
%F OTHER: Local-ID: C1256DBF005F876D-F5608AD15B5A9CE3C1257AD1003647FA-EGI2012
%D 2012
%B EGI Community Forum 2012 / EMI Second Technical Conference
%Z date of event: 2012-03-26 - 2012-03-30
%C Munich, Germany
%X The current infrastructure provided and maintained by the German Grid Initiative (D-Grid) primarily covers resource management and exchange at the
data level supporting mainly technical resources such as computational 
capacity, data transport networks, storage resources, and management software. 
The WisNetGrid project (www.wisnetgrid.org) aims to broaden the focus of 
resource sharing towards the actual content, such as research and production 
data, to enable interdisciplinary usage. To achieve this goal, resource sharing 
is supported on different abstraction layers. First, we create an information 
layer by providing a universal interface to access data on the grid independent 
of the underlying grid storage system. Second, at the knowledge layer, we offer 
interactive knowledge extraction and management tools that can also take 
advantage of a community’s grid resources. These tools enable the user to
formulate the domain specific knowledge in different ways to ease the 
interaction with the knowledge extraction process and to provide input for 
automatic extraction workflow. Within this project, we work together with use 
groups from the humanities and from landscaping as disparate use cases to 
evaluate which advantages can be gained by using semi-automatic extraction 
tools to gather and manage knowledge content.
%B EGI Community Forum 2012 / EMI Second Technical Conference
%P 1 - 10
%I Proceedings of Science
%B Proceedings of Science

Conference poster

N. Kanhabua, K. Berberich, and K. Norvag

“Learning to Predict a Time-aware Ranking Method,” SIGIR ’12, International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, New York, NY, 2012.

BibTeX

@inproceedings{Berberich2012a,
TITLE = {Learning to Predict a Time-aware Ranking Method},
AUTHOR = {Kanhabua, Nattiya and Berberich, Klaus and Norvag, Kjetil},
LANGUAGE = {eng},
ISBN = {978-1-4503-1658-3},
DOI = {10.1145/2348283.2348488},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {SIGIR '12, International ACM SIGIR Conference on Research \& Development in Information Retrieval},
PAGES = {1099--1100},
ADDRESS = {Portland, OR, USA},
}

Endnote

%0 Generic
%A Kanhabua, Nattiya
%A Berberich, Klaus
%A Norvag, Kjetil
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Learning to Predict a Time-aware Ranking Method
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5965-F
%F EDOC: 647508
%R 10.1145/2348283.2348488
%D 2012
%Z name of event: International ACM SIGIR Conference on Research & Development in Information Retrieval
%Z date of event: 2012-08-12 - 2012-08-16
%Z place of event: Portland, OR, USA
%B SIGIR '12
%P 1099 - 1100
%@ 978-1-4503-1658-3

Conference paper

D4, D5

K. I. Kim, J. Tompkin, M. Theobald, J. Kautz, and C. Theobalt

“Match Graph Construction for Large Image Databases,” in Computer Vision - ECCV 2012, Florence, Italy, 2012.

BibTeX

@inproceedings{KimECCV2011,
TITLE = {Match Graph Construction for Large Image Databases},
AUTHOR = {Kim, Kwang In and Tompkin, James and Theobald, Martin and Kautz, Jan and Theobalt, Christian},
LANGUAGE = {eng},
ISBN = {978-3-642-33717-8},
DOI = {10.1007/978-3-642-33718-5_20},
LOCALID = {Local-ID: C1256DBF005F876D-E76B01A16794BBABC1257AA9006A5F28-KimECCV2011},
PUBLISHER = {Springer},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Computer Vision -- ECCV 2012},
EDITOR = {Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia},
PAGES = {272--285},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {7572},
ADDRESS = {Florence, Italy},
}

Endnote

%0 Conference Proceedings
%A Kim, Kwang In
%A Tompkin, James
%A Theobald, Martin
%A Kautz, Jan
%A Theobalt, Christian
%+ Computer Graphics, MPI for Informatics, Max Planck Society
Computer Graphics, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Computer Graphics, MPI for Informatics, Max Planck Society
Computer Graphics, MPI for Informatics, Max Planck Society
%T Match Graph Construction for Large Image Databases
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FBE-E
%F EDOC: 647376
%R 10.1007/978-3-642-33718-5_20
%F OTHER: Local-ID: C1256DBF005F876D-E76B01A16794BBABC1257AA9006A5F28-KimECCV2011
%D 2012
%B 12th European Conference on Computer Vision
%Z date of event: 2012-10-07 - 2012-10-13
%C Florence, Italy
%B Computer Vision - ECCV 2012
%E Fitzgibbon, Andrew W.; Lazebnik, Svetlana; Perona, Pietro; Sato, Yoichi; Schmid, Cordelia
%P 272 - 285
%I Springer
%@ 978-3-642-33717-8
%B Lecture Notes in Computer Science
%N 7572

Thesis

M. Kumar

“Updates for Top-K Queries for Versioned XML Data,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{ManishKumar2012,
TITLE = {Updates for Top-K Queries for Versioned {XML} Data},
AUTHOR = {Kumar, Manish},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-F112C48699797374C1257AAD003658D1-ManishKumar2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Kumar, Manish
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Updates for Top-K Queries for Versioned XML Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6272-4
%F EDOC: 647477
%F OTHER: Local-ID: C1256DBF005F876D-F112C48699797374C1257AAD003658D1-ManishKumar2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Conference paper

IMPR-CS, D5

E. Kuzey and G. Weikum

“Extraction of Temporal Facts and Events from Wikipedia,” in TempWeb 2012, 2nd Temporal Web Analytics Workshop, Lyon, France, 2012.

Abstract

Recently, large-scale knowledge bases have been constructed
by automatically extracting relational facts from text. Unfortunately,
most of the current knowledge bases focus on
static facts and ignore the temporal dimension. However,
the vast majority of facts are evolving with time or are valid
only during a particular time period. Thus, time is a significant
dimension that should be included in knowledge bases.
In this paper, we introduce a complete information extraction
framework that harvests temporal facts and events from
semi-structured data and free text of Wikipedia articles to
create a temporal ontology. First, we extend a temporal data
representation model by making it aware of events. Second,
we develop an information extraction method which harvests
temporal facts and events from Wikipedia infoboxes,
categories, lists, and article titles in order to build a temporal
knowledge base. Third, we show how the system can use
its extracted knowledge for further growing the knowledge
base.
We demonstrate the effectiveness of our proposed methods
through several experiments. We extracted more than one
million temporal facts with precision over 90% for extraction
from semi-structured data and almost 70% for extraction
from text.

BibTeX

@inproceedings{Kuzey2012,
TITLE = {Extraction of Temporal Facts and Events from Wikipedia},
AUTHOR = {Kuzey, Erdal and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1188-5},
URL = {http://www.mpi-inf.mpg.de/~ekuzey/papers/tempweb_erdal_kuzey.pdf},
DOI = {10.1145/2169095.2169101},
LOCALID = {Local-ID: C1256DBF005F876D-D67E3F65E4EB6F0BC1257AEF00612873-Kuzey2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Recently, large-scale knowledge bases have been constructed by automatically extracting relational facts from text. Unfortunately, most of the current knowledge bases focus on static facts and ignore the temporal dimension. However, the vast majority of facts are evolving with time or are valid only during a particular time period. Thus, time is a significant dimension that should be included in knowledge bases. In this paper, we introduce a complete information extraction framework that harvests temporal facts and events from semi-structured data and free text of Wikipedia articles to create a temporal ontology. First, we extend a temporal data representation model by making it aware of events. Second, we develop an information extraction method which harvests temporal facts and events from Wikipedia infoboxes, categories, lists, and article titles in order to build a temporal knowledge base. Third, we show how the system can use its extracted knowledge for further growing the knowledge base. We demonstrate the effectiveness of our proposed methods through several experiments. We extracted more than one million temporal facts with precision over 90\% for extraction from semi-structured data and almost 70\% for extraction from text.},
BOOKTITLE = {TempWeb 2012, 2nd Temporal Web Analytics Workshop},
EDITOR = {Baeza-Yates, Ricardo and Masan{\`e}s, Julien and Spaniol, Marc},
PAGES = {25--32},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Kuzey, Erdal
%A Weikum, Gerhard
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Extraction of Temporal Facts and Events from Wikipedia
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A30-1
%F EDOC: 647491
%R 10.1145/2169095.2169101
%U http://www.mpi-inf.mpg.de/~ekuzey/papers/tempweb_erdal_kuzey.pdf
%F OTHER: Local-ID: C1256DBF005F876D-D67E3F65E4EB6F0BC1257AEF00612873-Kuzey2012
%D 2012
%B 2nd Temporal Web Analytics Workshop
%Z date of event: 2012-04-17 - 2012-04-17
%C Lyon, France
%X Recently, large-scale knowledge bases have been constructed
by automatically extracting relational facts from text. Unfortunately,
most of the current knowledge bases focus on
static facts and ignore the temporal dimension. However,
the vast majority of facts are evolving with time or are valid
only during a particular time period. Thus, time is a significant
dimension that should be included in knowledge bases.
In this paper, we introduce a complete information extraction
framework that harvests temporal facts and events from
semi-structured data and free text of Wikipedia articles to
create a temporal ontology. First, we extend a temporal data
representation model by making it aware of events. Second,
we develop an information extraction method which harvests
temporal facts and events from Wikipedia infoboxes,
categories, lists, and article titles in order to build a temporal
knowledge base. Third, we show how the system can use
its extracted knowledge for further growing the knowledge
base.
We demonstrate the effectiveness of our proposed methods
through several experiments. We extracted more than one
million temporal facts with precision over 90% for extraction
from semi-structured data and almost 70% for extraction
from text.
%B TempWeb 2012
%E Baeza-Yates, Ricardo; Masanès, Julien; Spaniol, Marc
%P 25 - 32
%I ACM
%@ 978-1-4503-1188-5

Conference paper

S. Metzger, K. Hose, and R. Schenkel

“Colledge - A Vision of Collaborative Knowledge Networks,” in 2nd International Workshop on Semantic Search over the Web (SSW 2012), Istanbul, Turkey, 2012.

Abstract

More and more semantic information has become available as RDF data recently,
with the linked open data cloud as a prominent example. However, participating
in the Semantic Web is cumbersome. Typically several steps are involved in
using semantic knowledge. Information is first acquired, e.g. by information
extraction, crowd sourcing or human experts. Then ontologies are published and
distributed. Users may apply reasoning and otherwise modify their local
ontology instances.
However, currently these steps are treated separately and although each
involves human effort, nearly no synergy effect is used and it is also mostly a
one way process, e.g. user feedback hardly flows back into the main ontology
version. Similarly, user cooperation is low.
While there are approaches alleviating some of these limitations,
e.g. extracting information at query time, personalizing queries, and
integration of user feedback, this work combines all the pieces envisioning a
social knowledge network that enables collaborative knowledge generation and
exchange. Each aforementioned step is seen as a particular implementation
of a network node responding to knowledge queries in its own way, e.g. by
extracting it, applying reasoning or asking users,
and learning from knowledge exchanged with neighbours.
Original knowledge as well as user feedback is distributed over the network
based on similar trust and provenance mechanisms.
The extended query language we call for also allows for
personalization.

BibTeX

@inproceedings{MetzgerHS_SSW2012,
TITLE = {Colledge -- A Vision of Collaborative Knowledge Networks},
AUTHOR = {Metzger, Steffen and Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-2301-7},
DOI = {10.1145/2494068.2494069},
LOCALID = {Local-ID: B6F7A7E019D22A82C1257A1C00124400-MetzgerHS_SSW2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {More and more semantic information has become available as RDF data recently, with the linked open data cloud as a prominent example. However, participating in the Semantic Web is cumbersome. Typically several steps are involved in using semantic knowledge. Information is first acquired, e.g. by information extraction, crowd sourcing or human experts. Then ontologies are published and distributed. Users may apply reasoning and otherwise modify their local ontology instances. However, currently these steps are treated separately and although each involves human effort, nearly no synergy effect is used and it is also mostly a one way process, e.g. user feedback hardly flows back into the main ontology version. Similarly, user cooperation is low. While there are approaches alleviating some of these limitations, e.g. extracting information at query time, personalizing queries, and integration of user feedback, this work combines all the pieces envisioning a social knowledge network that enables collaborative knowledge generation and exchange. Each aforementioned step is seen as a particular implementation of a network node responding to knowledge queries in its own way, e.g. by extracting it, applying reasoning or asking users, and learning from knowledge exchanged with neighbours. Original knowledge as well as user feedback is distributed over the network based on similar trust and provenance mechanisms. The extended query language we call for also allows for personalization.},
BOOKTITLE = {2nd International Workshop on Semantic Search over the Web (SSW 2012)},
ADDRESS = {Istanbul, Turkey},
}

Endnote

%0 Conference Proceedings
%A Metzger, Steffen
%A Hose, Katja
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Colledge - A Vision of Collaborative Knowledge Networks
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0015-1CD7-A
%R 10.1145/2494068.2494069
%F OTHER: Local-ID: B6F7A7E019D22A82C1257A1C00124400-MetzgerHS_SSW2012
%D 2012
%B 2nd International Workshop on Semantic Search over the Web
%Z date of event: 2012-08-27 - 2012-08-27
%C Istanbul, Turkey
%X More and more semantic information has become available as RDF data recently, 
with the linked open data cloud as a prominent example. However, participating 
in the Semantic Web is cumbersome. Typically several steps are involved in 
using semantic knowledge. Information is first acquired, e.g. by information 
extraction, crowd sourcing or human experts. Then ontologies are published and 
distributed. Users may apply reasoning and otherwise modify their local 
ontology instances.
However, currently these steps are treated separately and although each 
involves human effort, nearly no synergy effect is used and it is also mostly a 
one way process, e.g. user feedback hardly flows back into the main ontology 
version. Similarly, user cooperation is low.

While there are approaches alleviating some of these limitations,
e.g. extracting information at query time, personalizing queries, and 
integration of user feedback, this work combines all the pieces envisioning a 
social knowledge network that enables collaborative knowledge generation and 
exchange. Each aforementioned step is seen as a particular implementation 
of a network node responding to knowledge queries in its own way, e.g. by 
extracting it, applying reasoning or asking users, 
and learning from knowledge exchanged with neighbours.
Original knowledge as well as user feedback is distributed over the network 
based on similar trust and provenance mechanisms.
The extended query language we call for also allows for
personalization.
%B 2nd International Workshop on Semantic Search over the Web
%I ACM
%@ 978-1-4503-2301-7

Conference paper

D5, IMPR-CS

S. Metzger, M. Stoll, K. Hose, and R. Schenkel

“LUKe and MIKe: Learning from User Knowledge and Managing Interactive Knowledge Extraction,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, USA, 2012.

Abstract

Semantic recognition and annotation of unique entities and their relations is a
key in understanding the essence contained in large text corpora. It typically
requires a combination of efficient automatic methods and manual verification.
Usually, both parts are seen as consecutive steps. In this demo we present
MIKE, a user interface enabling the integration of user feedback into an
iterative extraction process. We show how an extraction system can directly
learn from such integrated user supervision. In general, this setup allows for
stepwise training of the extraction system to a particular domain, while using
user feedback early in the iterative extraction process improves extraction
quality and reduces the overall human effort needed.

BibTeX

@inproceedings{MetzgerSHS_CIKM2012,
TITLE = {{LUKe} and {MIKe}: Learning from User Knowledge and Managing Interactive Knowledge Extraction},
AUTHOR = {Metzger, Steffen and Stoll, Michael and Hose, Katja and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2398721},
DOI = {10.1145/2396761.2398721},
LOCALID = {Local-ID: C1256DBF005F876D-B1BE320040B32699C1257A5200325F1E-MetzgerSHS_CIKM2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Semantic recognition and annotation of unique entities and their relations is key to understanding the essence contained in large text corpora. It typically requires a combination of efficient automatic methods and manual verification. Usually, both parts are seen as consecutive steps. In this demo we present MIKE, a user interface enabling the integration of user feedback into an iterative extraction process. We show how an extraction system can directly learn from such integrated user supervision. In general, this setup allows for stepwise training of the extraction system to a particular domain, while using user feedback early in the iterative extraction process improves extraction quality and reduces the overall human effort needed.},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {2671--2673},
ADDRESS = {Maui, USA},
}

Endnote

%0 Conference Proceedings
%A Metzger, Steffen
%A Stoll, Michael
%A Hose, Katja
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T LUKe and MIKe: Learning from User Knowledge and Managing Interactive Knowledge Extraction : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5960-A
%F EDOC: 647511
%R 10.1145/2396761.2398721
%U http://doi.acm.org/10.1145/2396761.2398721
%F OTHER: Local-ID: C1256DBF005F876D-B1BE320040B32699C1257A5200325F1E-MetzgerSHS_CIKM2012
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, USA
%X Semantic recognition and annotation of unique entities and their relations is 
key to understanding the essence contained in large text corpora. It typically 
requires a combination of efficient automatic methods and manual verification. 
Usually, both parts are seen as consecutive steps. In this demo we present 
MIKE, a user interface enabling the integration of user feedback into an 
iterative extraction process. We show how an extraction system can directly 
learn from such integrated user supervision. In general, this setup allows for 
stepwise training of the extraction system to a particular domain, while using 
user feedback early in the iterative extraction process improves extraction 
quality and reduces the overall human effort needed.
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 2671 - 2673
%I ACM
%@ 978-1-4503-1156-4

Report

P. Miettinen and J. Vreeken

“MDL4BMF: Minimum Description Length for Boolean Matrix Factorization,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2012-5-001, 2012.

Abstract

Matrix factorizations, where a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the ‘model order selection problem’ of determining where fine-grained structure stops, and noise starts, i.e., what is the proper size of the factor matrices.

Boolean matrix factorization (BMF)—where data, factors, and matrix product are Boolean—has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity. However, so far no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits, e.g., it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate.

We formulate the description length function for BMF in general, making it applicable for any BMF algorithm. We discuss how to construct an appropriate encoding: starting from a simple and intuitive approach, we arrive at a highly efficient data-to-model based encoding for BMF. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
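
To make the Boolean setting described in this abstract concrete, here is a minimal sketch (not the authors' code) of a Boolean matrix product and the reconstruction error of a candidate factorization. The function names `boolean_product` and `reconstruction_error` and the toy matrices are assumptions made for illustration only; the MDL encoding proposed in the report is not reproduced here.

```python
import numpy as np


def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is True if B[i, l] and C[l, j] are both 1 for some l."""
    # The integer product counts matching indices l; thresholding at > 0 turns the sum into a logical OR.
    return (B.astype(int) @ C.astype(int)) > 0


def reconstruction_error(A, B, C):
    """Number of cells where A and the Boolean product of B and C disagree."""
    return int(np.sum(A.astype(bool) != boolean_product(B, C)))


# Toy data (hypothetical): a 3x3 binary matrix and a Boolean factorization with two factors.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])
C = np.array([[1, 1, 0],
              [0, 1, 1]])

error_two_factors = reconstruction_error(A, B, C)
error_one_factor = reconstruction_error(A, B[:, :1], C[:1, :])
```

Choosing the number of columns of `B` is the model order selection problem named in the abstract: more factors reduce the reconstruction error but enlarge the model, and a description length score is one way to balance the two.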

BibTeX

@techreport{MiettinenVreeken,
TITLE = {{MDL4BMF}: Minimum Description Length for Boolean Matrix Factorization},
AUTHOR = {Miettinen, Pauli and Vreeken, Jilles},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2012-5-001},
INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
ABSTRACT = {Matrix factorizations---where a given data matrix is approximated by a product of two or more factor matrices---are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the {\textquoteleft}model order selection problem{\textquoteright} of determining where fine-grained structure stops, and noise starts, i.e., what is the proper size of the factor matrices. Boolean matrix factorization (BMF)---where data, factors, and matrix product are Boolean---has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity. However, so far no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits, e.g., it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general---making it applicable for any BMF algorithm. We discuss how to construct an appropriate encoding: starting from a simple and intuitive approach, we arrive at a highly efficient data-to-model based encoding for BMF. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Miettinen, Pauli
%A Vreeken, Jilles
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T MDL4BMF: Minimum Description Length for Boolean Matrix Factorization :
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-0422-E
%Y Max-Planck-Institut für Informatik
%C Saarbrücken
%D 2012
%P 48 p.
%X Matrix factorizations, where a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the ‘model order selection problem’ of determining where fine-grained structure stops, and noise starts, i.e., what is the proper size of the factor matrices.

Boolean matrix factorization (BMF), where data, factors, and matrix product are Boolean, has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity. However, so far no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits, e.g., it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate.

We formulate the description length function for BMF in general, making it applicable for any BMF algorithm. We discuss how to construct an appropriate encoding: starting from a simple and intuitive approach, we arrive at a highly efficient data-to-model based encoding for BMF. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
%B Research Report
%@ false

Conference paper

P. Miettinen

“Dynamic Boolean Matrix Factorizations,” in Proceedings of the 12th IEEE International Conference on Data Mining (ICDM 2012), Brussels, Belgium, 2012.

Abstract

Boolean matrix factorization is a method to decompose a binary matrix into
two binary factor matrices. Akin to other matrix factorizations, the factor
matrices can be used for various data analysis tasks. Many (if not most)
real-world data sets are dynamic, though, meaning that new information is
recorded over time. Incorporating this new information into the factorization
can require a re-computation of the factorization – something we cannot do
if we want to keep our factorization up-to-date after each update.

This paper proposes a method to dynamically update the Boolean matrix
factorization when new data is added to the data base. This method is extended
with a mechanism to improve the factorization with a trade-off in speed of
computation. The method is tested with a number of real-world and synthetic
data sets including studying its efficiency against off-line methods. The
results show that with good initialization the proposed online and dynamic
methods can beat the state-of-the-art offline Boolean matrix factorization
algorithms.

BibTeX

@inproceedings{miettinen12dynamic,
TITLE = {Dynamic {Boolean} Matrix Factorizations},
AUTHOR = {Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-0-7695-4905-7},
DOI = {10.1109/ICDM.2012.118},
LOCALID = {Local-ID: C1256DBF005F876D-1E98F38844178811C1257AF3004454A6-miettinen12dynamic},
PUBLISHER = {IEEE Computer Society},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Boolean matrix factorization is a method to decompose a binary matrix into two binary factor matrices. Akin to other matrix factorizations, the factor matrices can be used for various data analysis tasks. Many (if not most) real-world data sets are dynamic, though, meaning that new information is recorded over time. Incorporating this new information into the factorization can require a re-computation of the factorization -- something we cannot do if we want to keep our factorization up-to-date after each update. This paper proposes a method to dynamically update the Boolean matrix factorization when new data is added to the data base. This method is extended with a mechanism to improve the factorization with a trade-off in speed of computation. The method is tested with a number of real-world and synthetic data sets including studying its efficiency against off-line methods. The results show that with good initialization the proposed online and dynamic methods can beat the state-of-the-art offline Boolean matrix factorization algorithms.},
BOOKTITLE = {Proceedings of the 12th IEEE International Conference on Data Mining (ICDM 2012)},
EDITOR = {Zaki, Mohammed J. and Siebes, Arno and Yu, Jeffrey Xu and Goethals, Bart and Webb, Geoff and Wu, Xindong},
PAGES = {519--528},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Dynamic Boolean Matrix Factorizations : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5DFA-5
%F EDOC: 647535
%R 10.1109/ICDM.2012.118
%F OTHER: Local-ID: C1256DBF005F876D-1E98F38844178811C1257AF3004454A6-miettinen12dynamic
%D 2012
%B 12th IEEE International Conference on Data Mining
%Z date of event: 2012-12-10 - 2012-12-13
%C Brussels, Belgium
%X Boolean matrix factorization is a method to decompose a binary matrix into 
two binary factor matrices. Akin to other matrix factorizations, the factor 
matrices can be used for various data analysis tasks. Many (if not most) 
real-world data sets are dynamic, though, meaning that new information is 
recorded over time. Incorporating this new information into the factorization 
can require a re-computation of the factorization
– something we cannot do if we want to keep our factorization up-to-date after 
each update.

This paper proposes a method to dynamically update the Boolean matrix 
factorization when new data is added to the data base. This method is extended 
with a mechanism to improve the factorization with a trade-off in speed of 
computation. The method is tested with a number of real-world and synthetic 
data sets including studying its efficiency against off-line methods. The 
results show that with good initialization the proposed online and dynamic 
methods can beat the state-of-the-art offline Boolean matrix factorization 
algorithms.
%B Proceedings of the 12th IEEE International Conference on Data Mining
%E Zaki, Mohammed J.; Siebes, Arno; Yu, Jeffrey Xu; Goethals, Bart; Webb, Geoff; Wu, Xindong
%P 519 - 528
%I IEEE Computer Society
%@ 978-0-7695-4905-7

Conference paper

P. Miettinen

“On Finding Joint Subspace Boolean Matrix Factorizations,” in Proceedings of the Twelfth SIAM International Conference on Data Mining (SDM 2012), Anaheim, CA, USA, 2012.

Abstract

Finding latent factors of the data using matrix factorizations is a
tried-and-tested approach in data mining. But finding shared factors over
multiple matrices is a more novel problem. Specifically, given two matrices, we
want to find a set of factors shared by these two matrices and sets of factors
specific for the matrices. Not only does such decomposition reveal what is
common between the two matrices, it also eliminates the need of explaining that
common part twice, thus concentrating the non-shared factors to uniquely
specific parts of the data. This paper studies a problem called Joint Subspace
Boolean Matrix Factorization asking exactly that: a set of shared factors and
sets of specific factors. Furthermore, the matrix factorization is based on the
Boolean arithmetic. This restricts the presented approach to binary matrices
only. The benefits, however, include much sparser factor matrices and greater
interpretability of the results. The paper presents three algorithms for
finding the Joint Subspace Boolean Matrix Factorization, an MDL-based method
for selecting the subspaces’ dimensionality, and a thorough experimental
evaluation of the proposed algorithms.

BibTeX

@inproceedings{miettinen12finding,
TITLE = {On Finding Joint Subspace Boolean Matrix Factorizations},
AUTHOR = {Miettinen, Pauli},
LANGUAGE = {eng},
ISBN = {978-1-61197-232-0},
URL = {http://siam.omnibooksonline.com/2012datamining/data/papers/205.pdf#page=1},
LOCALID = {Local-ID: C1256DBF005F876D-2B29D65C44BFCD9BC1257AF3003D99D9-miettinen12finding},
PUBLISHER = {SIAM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Finding latent factors of the data using matrix factorizations is a tried-and-tested approach in data mining. But finding shared factors over multiple matrices is a more novel problem. Specifically, given two matrices, we want to find a set of factors shared by these two matrices and sets of factors specific for the matrices. Not only does such decomposition reveal what is common between the two matrices, it also eliminates the need of explaining that common part twice, thus concentrating the non-shared factors to uniquely specific parts of the data. This paper studies a problem called Joint Subspace Boolean Matrix Factorization asking exactly that: a set of shared factors and sets of specific factors. Furthermore, the matrix factorization is based on the Boolean arithmetic. This restricts the presented approach to binary matrices only. The benefits, however, include much sparser factor matrices and greater interpretability of the results. The paper presents three algorithms for finding the Joint Subspace Boolean Matrix Factorization, an MDL-based method for selecting the subspaces{\textquoteright} dimensionality, and a thorough experimental evaluation of the proposed algorithms.},
BOOKTITLE = {Proceedings of the Twelfth SIAM International Conference on Data Mining (SDM 2012)},
PAGES = {954--965},
ADDRESS = {Anaheim, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T On Finding Joint Subspace Boolean Matrix Factorizations : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5901-2
%F EDOC: 647534
%U http://siam.omnibooksonline.com/2012datamining/data/papers/205.pdf#page=1
%F OTHER: Local-ID: C1256DBF005F876D-2B29D65C44BFCD9BC1257AF3003D99D9-miettinen12finding
%D 2012
%B Twelfth SIAM International Conference on Data Mining
%Z date of event: 2012-04-26 - 2012-04-28
%C Anaheim, CA, USA
%X Finding latent factors of the data using matrix factorizations is a 
tried-and-tested approach in data mining. But finding shared factors over 
multiple matrices is a more novel problem. Specifically, given two matrices, we 
want to find a set of factors shared by these two matrices and sets of factors 
specific for the matrices. Not only does such decomposition reveal what is 
common between the two matrices, it also eliminates the need of explaining that 
common part twice, thus concentrating the non-shared factors to uniquely 
specific parts of the data. This paper studies a problem called Joint Subspace 
Boolean Matrix Factorization asking exactly that: a set of shared factors and 
sets of specific factors. Furthermore, the matrix factorization is based on the 
Boolean arithmetic. This restricts the presented approach to binary matrices 
only. The benefits, however, include much sparser factor matrices and greater 
interpretability of the results. The paper presents three algorithms for 
finding the Joint Subspace Boolean Matrix Factorization, an MDL-based method 
for selecting the subspaces’ dimensionality, and a thorough experimental 
evaluation of the proposed algorithms.
%B Proceedings of the Twelfth SIAM International Conference on Data Mining
%P 954 - 965
%I SIAM
%@ 978-1-61197-232-0

Conference paper

A. Mishra, S. Gurajada, and M. Theobald

“Design and Evaluation of an IR-Benchmark for SPARQL Queries with Fulltext Conditions,” in ESAIR’12, Fifth ACM Workshop on Exploiting Semantic Annotations in Information Retrieval, Maui, Hawaii, 2012.

BibTeX

@inproceedings{Mishra2012,
TITLE = {Design and Evaluation of an {IR}-Benchmark for {SPARQL} Queries with Fulltext Conditions},
AUTHOR = {Mishra, Arunav and Gurajada, Sairam and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-1-4503-1717-7},
URL = {http://doi.acm.org/10.1145/2390148.2390154},
DOI = {10.1145/2390148.2390154},
LOCALID = {Local-ID: C1256DBF005F876D-C5AA8D665F77F115C1257B1D00478C11-Mishra2012},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {ESAIR'12, Fifth ACM Workshop on Exploiting Semantic Annotations in Information Retrieval},
EDITOR = {Kamps, Jaap and Karlgren, Jussi and Mika, Peter and Murdock, Vanessa},
PAGES = {9--10},
ADDRESS = {Maui, Hawaii},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Gurajada, Sairam
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Design and Evaluation of an IR-Benchmark for SPARQL Queries with Fulltext Conditions : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5E04-7
%F EDOC: 647498
%R 10.1145/2390148.2390154
%U http://doi.acm.org/10.1145/2390148.2390154
%F OTHER: Local-ID: C1256DBF005F876D-C5AA8D665F77F115C1257B1D00478C11-Mishra2012
%D 2012
%B Fifth ACM Workshop on Exploiting Semantic Annotations in Information Retrieval
%Z date of event: 2012-11-02 - 2012-11-02
%C Maui, Hawaii
%B ESAIR'12
%E Kamps, Jaap; Karlgren, Jussi; Mika, Peter; Murdock, Vanessa
%P 9 - 10
%I ACM
%@ 978-1-4503-1717-7

Conference paper

A. Mishra, S. Gurajada, and M. Theobald

“Running SPARQL-fulltext Queries Inside a Relational DBMS,” in CLEF 2012 Evaluation Labs and Workshop, Rome, Italy, 2012.

BibTeX

@inproceedings{MishraCLEF2012,
TITLE = {Running {SPARQL}-fulltext Queries Inside a Relational {DBMS}},
AUTHOR = {Mishra, Arunav and Gurajada, Sairam and Theobald, Martin},
LANGUAGE = {eng},
ISBN = {978-88-904810-3-1},
URL = {http://www.clef-initiative.eu/documents/71612/4f3e7bbd-41be-449e-9a8f-923632344d0a},
LOCALID = {Local-ID: C1256DBF005F876D-BF654F138A6BF6CDC1257AE80030C21D-MishraCLEF2012},
YEAR = {2012},
BOOKTITLE = {CLEF 2012 Evaluation Labs and Workshop},
EDITOR = {Forner, Pamela and Karlgren, Jussi and Womser-Hacker, Christa},
PAGES = {1--13},
ADDRESS = {Rome, Italy},
}

Endnote

%0 Conference Proceedings
%A Mishra, Arunav
%A Gurajada, Sairam
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Running SPARQL-fulltext Queries Inside a Relational DBMS : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-577F-7
%F EDOC: 647522
%U http://www.clef-initiative.eu/documents/71612/4f3e7bbd-41be-449e-9a8f-923632344d0a
%F OTHER: Local-ID: C1256DBF005F876D-BF654F138A6BF6CDC1257AE80030C21D-MishraCLEF2012
%D 2012
%B CLEF 2012 Evaluation Labs and Workshop
%Z date of event: 2012-09-17 - 2012-09-20
%C Rome, Italy
%B CLEF 2012 Evaluation Labs and Workshop
%E Forner, Pamela; Karlgren, Jussi; Womser-Hacker, Christa
%P 1 - 13
%@ 978-88-904810-3-1

Thesis

O. Mykytiuk

“Fast Search for Large Entries in a Matrix Product,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Mykytiuk2012,
TITLE = {Fast Search for Large Entries in a Matrix Product},
AUTHOR = {Mykytiuk, Olga},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-BB2B4ABFDA62286FC1257A5C00395001-Mykytiuk2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Mykytiuk, Olga
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Fast Search for Large Entries in a Matrix Product : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-B007-7
%F EDOC: 647471
%F OTHER: Local-ID: C1256DBF005F876D-BB2B4ABFDA62286FC1257A5C00395001-Mykytiuk2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Conference paper

D5IMPR-CS

N. Nakashole, G. Weikum, and F. Suchanek

“PATTY: A Taxonomy of Relational Patterns with Semantic Types,” in 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), Jeju, South Korea, 2012.

BibTeX

@inproceedings{patty-emnlp12,
TITLE = {{PATTY}: A Taxonomy of Relational Patterns with Semantic Types},
AUTHOR = {Nakashole, Ndapandula and Weikum, Gerhard and Suchanek, Fabian},
LANGUAGE = {eng},
ISBN = {978-1-937284-43-5},
URL = {http://aclweb.org/anthology-new/D/D12/D12-1104.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-53ECC499AE08EA71C1257AED0037D385-patty-emnlp12},
PUBLISHER = {ACL},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012)},
PAGES = {1135--1145},
ADDRESS = {Jeju, South Korea},
}

Endnote

%0 Conference Proceedings
%A Nakashole, Ndapandula
%A Weikum, Gerhard
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T PATTY: A Taxonomy of Relational Patterns with Semantic Types : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58C5-1
%F EDOC: 647530
%U http://aclweb.org/anthology-new/D/D12/D12-1104.pdf
%F OTHER: Local-ID: C1256DBF005F876D-53ECC499AE08EA71C1257AED0037D385-patty-emnlp12
%D 2012
%B Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%Z date of event: 2012-07-12 - 2012-07-14
%C Jeju, South Korea
%B 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%P 1135 - 1145
%I ACL
%@ 978-1-937284-43-5

Conference paper

N. Nakashole, M. Sozio, F. Suchanek, and M. Theobald

“Query-time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules,” in Very Large Data Search (VLDS 2012), Istanbul, 2012.

BibTeX

@inproceedings{URDF-VLDS-2012,
TITLE = {Query-time Reasoning in Uncertain {RDF} Knowledge Bases with Soft and Hard Rules},
AUTHOR = {Nakashole, Ndapandula and Sozio, Mauro and Suchanek, Fabian and Theobald, Martin},
LANGUAGE = {eng},
ISSN = {1613-0073},
URL = {urn:nbn:de:0074-884-7; http://ceur-ws.org/Vol-884},
PUBLISHER = {CEUR},
YEAR = {2012},
BOOKTITLE = {Very Large Data Search (VLDS 2012)},
EDITOR = {Brambilla, Marco and Ceri, Stefano and Furche, Tim and Gottlob, Georg},
PAGES = {15--20},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {884},
ADDRESS = {Istanbul},
}

Endnote

%0 Conference Proceedings
%A Nakashole, Ndapandula
%A Sozio, Mauro
%A Suchanek, Fabian
%A Theobald, Martin
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Query-time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-652D-A
%D 2012
%B Second International Workshop on Searching and Integrating New Web Data Sources
%Z date of event: 2012-08-31 - 2012-08-31
%C Istanbul
%B Very Large Data Search
%E Brambilla, Marco; Ceri, Stefano; Furche, Tim; Gottlob, Georg
%P 15 - 20
%I CEUR
%B CEUR Workshop Proceedings
%N 884
%@ false
%U http://ceur-ws.org/Vol-884/VLDS2012_p15_Nakashole.pdf

Conference paper

D5IMPR-CS

N. Nakashole and G. Weikum

“Real-time Population of Knowledge Bases: Opportunities and Challenges,” in Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX 2012), Montreal, Canada, 2012.

BibTeX

@inproceedings{pearl-akbc2012,
TITLE = {Real-time Population of Knowledge Bases: Opportunities and Challenges},
AUTHOR = {Nakashole, Ndapandula and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-20-6},
URL = {http://aclweb.org/anthology-new/W/W12/W12-3008.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-C699A47739C91FDDC1257AED00372F07-pearl-akbc2012},
PUBLISHER = {ACL},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX 2012)},
PAGES = {41--45},
ADDRESS = {Montreal, Canada},
}

Endnote

%0 Conference Proceedings
%A Nakashole, Ndapandula
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Real-time Population of Knowledge Bases: Opportunities and Challenges : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57A3-4
%F EDOC: 647529
%U http://aclweb.org/anthology-new/W/W12/W12-3008.pdf
%F OTHER: Local-ID: C1256DBF005F876D-C699A47739C91FDDC1257AED00372F07-pearl-akbc2012
%D 2012
%B Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
%Z date of event: 2012-06-07 - 2012-06-08
%C Montreal, Canada
%B Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
%P 41 - 45
%I ACL
%@ 978-1-937284-20-6

Article

D5IMPR-CS

N. Nakashole, G. Weikum, and F. Suchanek

“Discovering and Exploring Relations on the Web,” Proceedings of the VLDB Endowment, vol. 5, no. 12, 2012.

BibTeX

@article{pattydemo-vldb12,
TITLE = {Discovering and Exploring Relations on the Web},
AUTHOR = {Nakashole, Ndapandula and Weikum, Gerhard and Suchanek, Fabian},
LANGUAGE = {eng},
URL = {http://dl.acm.org/citation.cfm?id=2367502.2367553},
LOCALID = {Local-ID: C1256DBF005F876D-F60E6D3BEB7392FEC1257AED0034AD08-pattydemo-vldb12},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Proceedings of the VLDB Endowment},
VOLUME = {5},
NUMBER = {12},
BOOKTITLE = {Proceedings of the 38th International Conference on Very Large Data Bases},
}

Endnote

%0 Journal Article
%A Nakashole, Ndapandula
%A Weikum, Gerhard
%A Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
%T Discovering and Exploring Relations on the Web : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5E01-D
%F EDOC: 647528
%U http://dl.acm.org/citation.cfm?id=2367502.2367553
%F OTHER: Local-ID: C1256DBF005F876D-F60E6D3BEB7392FEC1257AED0034AD08-pattydemo-vldb12
%D 2012
%J Proceedings of the VLDB Endowment
%O PVLDB
%V 5
%N 12
%I ACM
%C New York, NY
%B Proceedings of the 38th International Conference on Very Large Data Bases

Thesis

D5IMPR-CS

N. Nakashole

“Automatic Extraction of Facts, Relations, and Entities for Web-scale Knowledge Base Population,” Universität des Saarlandes, Saarbrücken, 2012.

Abstract

Equipping machines with knowledge, through the construction of machine-readable
knowledge bases, presents a key asset for semantic search, machine
translation, question answering, and other formidable challenges in
artificial intelligence. However, human knowledge predominantly resides
in books and other natural language text forms. This means that knowledge
bases must be extracted and synthesized from natural language text.
When the source of text is the Web, extraction methods must cope with
ambiguity, noise, scale, and updates.

The goal of this dissertation is to develop knowledge base population
methods that address the aforementioned characteristics of Web text. The
dissertation makes three contributions. The first contribution is a method
for mining high-quality facts at scale, through distributed constraint reasoning
and a pattern representation model that is robust against noisy
patterns. The second contribution is a method for mining a large comprehensive
collection of relation types beyond those commonly found in
existing knowledge bases. The third contribution is a method for extracting
facts from dynamic Web sources such as news articles and social media
where one of the key challenges is the constant emergence of new entities.
All methods have been evaluated through experiments involving Web-scale
text collections.

BibTeX

@phdthesis{phdthesis-nakashole,
TITLE = {Automatic Extraction of Facts, Relations, and Entities for Web-scale Knowledge Base Population},
AUTHOR = {Nakashole, Ndapandula},
LANGUAGE = {eng},
URL = {urn:nbn:de:bsz:291-scidok-50545},
DOI = {10.22028/D291-26412},
LOCALID = {Local-ID: C1256DBF005F876D-312844A683E3D3CFC1257AED006307CA-phdthesis-nakashole},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Equipping machines with knowledge, through the construction of machine-readable knowledge bases, presents a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other natural language text forms. This means that knowledge bases must be extracted and synthesized from natural language text. When the source of text is the Web, extraction methods must cope with ambiguity, noise, scale, and updates. The goal of this dissertation is to develop knowledge base population methods that address the aforementioned characteristics of Web text. The dissertation makes three contributions. The first contribution is a method for mining high-quality facts at scale, through distributed constraint reasoning and a pattern representation model that is robust against noisy patterns. The second contribution is a method for mining a large comprehensive collection of relation types beyond those commonly found in existing knowledge bases. The third contribution is a method for extracting facts from dynamic Web sources such as news articles and social media where one of the key challenges is the constant emergence of new entities. All methods have been evaluated through experiments involving Web-scale text collections.},
}

Endnote

%0 Thesis
%A Nakashole, Ndapandula
%Y Weikum, Gerhard
%A referee: Suchanek, Fabian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Automatic Extraction of Facts, Relations, and Entities for Web-scale Knowledge Base Population : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-627F-A
%F EDOC: 647490
%F OTHER: Local-ID: C1256DBF005F876D-312844A683E3D3CFC1257AED006307CA-phdthesis-nakashole
%R 10.22028/D291-26412
%U urn:nbn:de:bsz:291-scidok-50545
%F OTHER: hdl:20.500.11880/26468
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V phd
%9 phd
%X Equipping machines with knowledge, through the construction of machine-readable knowledge bases, presents a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other natural language text forms. This means that knowledge bases must be extracted and synthesized from natural language text. When the source of text is the Web, extraction methods must cope with ambiguity, noise, scale, and updates. The goal of this dissertation is to develop knowledge base population methods that address the aforementioned characteristics of Web text. The dissertation makes three contributions. The first contribution is a method for mining high-quality facts at scale, through distributed constraint reasoning and a pattern representation model that is robust against noisy patterns. The second contribution is a method for mining a large comprehensive collection of relation types beyond those commonly found in existing knowledge bases. The third contribution is a method for extracting facts from dynamic Web sources such as news articles and social media where one of the key challenges is the constant emergence of new entities. All methods have been evaluated through experiments involving Web-scale text collections.
%U http://scidok.sulb.uni-saarland.de/volltexte/2013/5054/
%U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de

Thesis

D5, IMPR-CS

D. B. Nguyen

“Efficient Entity Disambiguation via Similarity Hashing,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Nguyen2012,
TITLE = {Efficient Entity Disambiguation via Similarity Hashing},
AUTHOR = {Nguyen, Dat Ba},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-86BEFB1566020C4AC1257A6400543D05-Nguyen2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Nguyen, Dat Ba
%Y Theobald, Martin
%A referee: Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Efficient Entity Disambiguation via Similarity Hashing : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-626B-5
%F EDOC: 647513
%F OTHER: Local-ID: C1256DBF005F876D-86BEFB1566020C4AC1257A6400543D05-Nguyen2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Thesis

B. Paudel

“Redundancy Control in Web Archives,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Paudel2012,
TITLE = {Redundancy Control in Web Archives},
AUTHOR = {Paudel, Bibek},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-DDB4B1E6A265141CC1257A130040A75B-Paudel2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Paudel, Bibek
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Redundancy Control in Web Archives : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-620E-6
%F EDOC: 647463
%F OTHER: Local-ID: C1256DBF005F876D-DDB4B1E6A265141CC1257A130040A75B-Paudel2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Conference paper

C. Pölitz and R. Schenkel

“Robust Ranking Models Using Noisy Feedback,” in Workshop “Information Retrieval Over Query Sessions” (SIR 2012) at ECIR 2012, Barcelona, Spain, 2012.

Abstract

Direct feedback of users of search engines by click information is naturally
noisy. Ranking models that integrate such feedback in their training process
must cope with this noise. In worst case such noise can lead to large variance
among the results for different queries in the resulting rankings. We propose
to integrate model averaging like bagging and random forest methods to reduce
the variance in the ranking models. We perform an experimental study on
different noise levels using a state of the art ranking model.

BibTeX

@inproceedings{PoelitzSchenkel_SIR2012,
TITLE = {Robust Ranking Models Using Noisy Feedback},
AUTHOR = {P{\"o}litz, Christian and Schenkel, Ralf},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-63A3D0866B5A0B5DC12579E6004DCDAA-PoelitzSchenkel_SIR2012},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Direct feedback of users of search engines by click information is naturally noisy. Ranking models that integrate such feedback in their training process must cope with this noise. In worst case such noise can lead to large variance among the results for different queries in the resulting rankings. We propose to integrate model averaging like bagging and random forest methods to reduce the variance in the ranking models. We perform an experimental study on different noise levels using a state of the art ranking model.},
BOOKTITLE = {Workshop "Information Retrieval Over Query Sessions" (SIR 2012) at ECIR 2012},
PAGES = {1--6},
ADDRESS = {Barcelona, Spain},
}

Endnote

%0 Conference Proceedings
%A Pölitz, Christian
%A Schenkel, Ralf
%+ Cluster of Excellence "Multimodal Computing and Interaction" 
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Robust Ranking Models Using Noisy Feedback : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5783-C
%F EDOC: 647504
%F OTHER: Local-ID: C1256DBF005F876D-63A3D0866B5A0B5DC12579E6004DCDAA-PoelitzSchenkel_SIR2012
%D 2012
%B 34th European Conference on IR Research
%Z date of event: 2012-04-01 - 2012-04-01
%C Barcelona, Spain
%X Direct feedback of users of search engines by click information is naturally 
noisy. Ranking models that integrate such feedback in their training process 
must cope with this noise. In worst case such noise can lead to large variance 
among the results for different queries in the resulting rankings. We propose 
to integrate model averaging like bagging and random forest methods to reduce 
the variance in the ranking models. We perform an experimental study on 
different noise levels using a state of the art ranking model.
%B Workshop "Information Retrieval Over Query Sessions" (SIR 2012) at ECIR 2012
%P 1 - 6

Conference paper

C. Pölitz and R. Schenkel

“Ranking under Tight Budgets,” in Proceedings Twenty-Third International Workshop on Database and Expert Systems Applications (DEXA 2012), Vienna, Austria, 2012.

BibTeX

@inproceedings{PoelitzSchenkel_TIR2012,
TITLE = {Ranking under Tight Budgets},
AUTHOR = {P{\"o}litz, Christian and Schenkel, Ralf},
LANGUAGE = {eng},
ISBN = {978-0-7695-4801-2},
DOI = {10.1109/DEXA.2012.21},
LOCALID = {Local-ID: C1256DBF005F876D-DAE6A93802084DE0C12579F20041B6D7-PoelitzSchenkel_TIR2012},
PUBLISHER = {IEEE},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings Twenty-Third International Workshop on Database and Expert Systems Applications (DEXA 2012)},
EDITOR = {Hameurlain, Abdelkader and Tjoa, A Min and Wagner, Roland R.},
PAGES = {161--165},
ADDRESS = {Vienna, Austria},
}

Endnote

%0 Conference Proceedings
%A Pölitz, Christian
%A Schenkel, Ralf
%+ Cluster of Excellence "Multimodal Computing and Interaction"
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Ranking under Tight Budgets : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57B2-2
%F EDOC: 647501
%R 10.1109/DEXA.2012.21
%F OTHER: Local-ID: C1256DBF005F876D-DAE6A93802084DE0C12579F20041B6D7-PoelitzSchenkel_TIR2012
%D 2012
%B Twenty-Third International Workshop on Database and Expert Systems Applications
%Z date of event: 2012-09-03 - 2012-09-07
%C Vienna, Austria
%B Proceedings Twenty-Third International Workshop on Database and Expert Systems Applications
%E Hameurlain, Abdelkader; Tjoa, A Min; Wagner, Roland R.
%P 161 - 165
%I IEEE
%@ 978-0-7695-4801-2

Conference paper

IMPR-CS, D5

N. Prytkova, M. Spaniol, and G. Weikum

“Predicting the Evolution of Taxonomy Restructuring in Collective Web Catalogues,” in 15th International Workshop on the Web and Databases (WebDB 2012), Scottsdale, AZ, USA, 2012.

BibTeX

@inproceedings{PSWe12,
TITLE = {Predicting the Evolution of Taxonomy Restructuring in Collective Web Catalogues},
AUTHOR = {Prytkova, Natalia and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://db.disi.unitn.eu/pages/WebDB2012/papers/p8.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-7703DFF5F7852510C1257AD8004F5200-PSWe12},
PUBLISHER = {WebDB},
YEAR = {2012},
BOOKTITLE = {15th International Workshop on the Web and Databases (WebDB 2012)},
EDITOR = {Ives, Zachary G. and Velegrakis, Yannis},
PAGES = {1--6},
ADDRESS = {Scottsdale, AZ, USA},
}

Endnote

%0 Conference Proceedings
%A Prytkova, Natalia
%A Spaniol, Marc
%A Weikum, Gerhard
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Predicting the Evolution of Taxonomy Restructuring in Collective Web Catalogues : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57F0-7
%F EDOC: 647519
%U http://db.disi.unitn.eu/pages/WebDB2012/papers/p8.pdf
%F OTHER: Local-ID: C1256DBF005F876D-7703DFF5F7852510C1257AD8004F5200-PSWe12
%D 2012
%B 15th International Workshop on the Web and Databases
%Z date of event: 2012-05-20 - 2012-05-20
%C Scottsdale, AZ, USA
%B 15th International Workshop on the Web and Databases
%E Ives, Zachary G.; Velegrakis, Yannis
%P 1 - 6
%I WebDB

Conference paper

D5, IMPR-CS

L. Qu, R. Gemulla, and G. Weikum

“A Weakly Supervised Model for Sentence-level Semantic Orientation Analysis with Multiple Experts,” in 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), Jeju Island, Korea, 2012.

Abstract

We propose the weakly supervised Multi-Experts Model (MEM) for
analyzing the semantic orientation of opinions expressed in natural language
reviews. In contrast to most prior work, MEM predicts both opinion polarity and
opinion strength at the level of individual sentences; such fine-grained
analysis helps to understand better why users like or dislike the entity under
review. A key challenge in this setting is that it is hard to obtain
sentence-level training data for both polarity and strength. For this reason,
MEM is weakly supervised: It starts with potentially noisy indicators obtained
from coarse-grained training data (i.e., document-level ratings), a small set
of diverse base predictors, and, if available, small amounts of fine-grained
training data. We integrate these noisy indicators into a unified probabilistic
framework using ideas from ensemble learning and graph-based semi-supervised
learning. Our experiments indicate that MEM outperforms state-of-the-art
methods by a significant margin.

BibTeX

@inproceedings{Qu2012a,
TITLE = {A Weakly Supervised Model for Sentence-level Semantic Orientation Analysis with Multiple Experts},
AUTHOR = {Qu, Lizhen and Gemulla, Rainer and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-43-5},
URL = {http://aclweb.org/anthology-new/D/D12/D12-1014.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-75AE874C4A8E5F21C1257B0800733FFD-Qu2012a},
PUBLISHER = {ACL},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {We propose the weakly supervised \emph{Multi-Experts Model} (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.},
BOOKTITLE = {2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012)},
PAGES = {149--159},
ADDRESS = {Jeju Island, Korea},
}

Endnote

%0 Conference Proceedings
%A Qu, Lizhen
%A Gemulla, Rainer
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T A Weakly Supervised Model for Sentence-level Semantic Orientation Analysis with Multiple Experts : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FAF-0
%F EDOC: 647494
%U http://aclweb.org/anthology-new/D/D12/D12-1014.pdf
%F OTHER: Local-ID: C1256DBF005F876D-75AE874C4A8E5F21C1257B0800733FFD-Qu2012a
%D 2012
%B Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%Z date of event: 2012-07-12 - 2012-07-14
%C Jeju Island, Korea
%X We propose the weakly supervised \emph{Multi-Experts Model} (MEM) for 
analyzing the semantic orientation of opinions expressed in natural language 
reviews. In contrast to most prior work, MEM predicts both opinion polarity and 
opinion strength at the level of individual sentences; such fine-grained 
analysis helps to understand better why users like or dislike the entity under 
review. A key challenge in this setting is that it is hard to obtain 
sentence-level training data for both polarity and strength. For this reason, 
MEM is weakly supervised: It starts with potentially noisy indicators obtained 
from coarse-grained training data (i.e., document-level ratings), a small set 
of diverse base predictors, and, if available, small amounts of fine-grained 
training data. We integrate these noisy indicators into a unified probabilistic 
framework using ideas from ensemble learning and graph-based semi-supervised 
learning. Our experiments indicate that MEM outperforms state-of-the-art 
methods by a significant margin.
%B 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%P 149 - 159
%I ACL
%@ 978-1-937284-43-5

Proceedings

G. Raschia, M. Theobald, and I. Manolescu

Eds., Proceedings of the First International Workshop On Open Data. ACM, 2012.

BibTeX

@proceedings{Theobald-WOD2011,
TITLE = {Proceedings of the First International Workshop On Open Data (WOD 2012)},
EDITOR = {Raschia, Guillaume and Theobald, Martin and Manolescu, Ioana},
LANGUAGE = {eng},
ISBN = {978-1-4503-1404-6},
URL = {http://arxiv.org/abs/1204.3726},
LOCALID = {Local-ID: C1256DBF005F876D-23B11FE659FF1D1DC1257A2300245E40-Theobald-WOD2011},
PUBLISHER = {ACM},
YEAR = {2012},
PAGES = {77},
ADDRESS = {Nantes, France},
}

Endnote

%0 Conference Proceedings
%E Raschia, Guillaume
%E Theobald, Martin
%E Manolescu, Ioana
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Proceedings of the First International Workshop On Open Data : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-57E5-1
%F EDOC: 647509
%@ 978-1-4503-1404-6
%U http://arxiv.org/abs/1204.3726
%F OTHER: Local-ID: C1256DBF005F876D-23B11FE659FF1D1DC1257A2300245E40-Theobald-WOD2011
%I ACM
%D 2012
%B WOD 2012
%Z date of event: 2012-05-25 - 2012-05-25
%C Nantes, France
%P 77

Article

D5, IMPR-CS

S. Seufert, A. Anand, S. Bedathur, and G. Weikum

“High-performance Reachability Query Processing under Index Size Restrictions,” arXiv, vol. abs/1211.3375, 2012.

Abstract

In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that web-scale graphs comprising hundreds of millions
of nodes and billions of edges can be efficiently processed such that the
resulting size-constrained index allows answering reachability queries in the
order of a few microseconds, using an off-the-shelf computer. Our claims are
supported by an extensive set of experimental results using a multitude of
benchmark and real-world web-scale graph datasets.

BibTeX

@article{Seufert2012,
TITLE = {High-performance Reachability Query Processing under Index Size Restrictions},
AUTHOR = {Seufert, Stephan and Anand, Avishek and Bedathur, Srikanta and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://arxiv.org/abs/1211.3375},
LOCALID = {Local-ID: C1256DBF005F876D-3C35E5D5EF717B47C1257AB800623532-Seufert2012},
PUBLISHER = {Cornell University Library},
ADDRESS = {Ithaca, NY},
YEAR = {2012},
ABSTRACT = {In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing a direct control over the trade off between index size and the query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that web-scale graphs comprising hundreds of millions of nodes and billions of edges can be efficiently processed such that the resulting size-constrained index allows answering reachability queries in the order of a few microseconds, using an off-the-shelf computer. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.},
JOURNAL = {arXiv},
VOLUME = {abs/1211.3375},
PAGES = {1--30},
}

Endnote

%0 Journal Article
%A Seufert, Stephan
%A Anand, Avishek
%A Bedathur, Srikanta
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T High-performance Reachability Query Processing under Index Size Restrictions : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59DC-7
%F EDOC: 647478
%U http://arxiv.org/abs/1211.3375
%F OTHER: Local-ID: C1256DBF005F876D-3C35E5D5EF717B47C1257AB800623532-Seufert2012
%7 2012
%D 2012
%X In this paper, we propose a scalable and highly efficient index structure for 
the reachability problem over graphs. We build on the well-known node interval 
labeling scheme where the set of vertices reachable from a particular node is 
compactly encoded as a collection of node identifier ranges. We impose an 
explicit bound on the size of the index and flexibly assign approximate 
reachability ranges to nodes of the graph such that the number of index probes 
to answer a query is minimized. The resulting tunable index structure generates 
a better range labeling if the space budget is increased, thus providing a 
direct control over the trade off between index size and the query processing 
performance. By using a fast recursive querying method in conjunction with our 
index structure, we show that web-scale graphs comprising hundreds of millions 
of nodes and billions of edges can be efficiently processed such that the 
resulting size-constrained index allows answering reachability queries in the 
order of a few microseconds, using an off-the-shelf computer. Our claims are 
supported by an extensive set of experimental results using a multitude of 
benchmark and real-world web-scale graph datasets.
%J arXiv
%V abs/1211.3375
%& 1
%P 1 - 30
%I Cornell University Library
%C Ithaca, NY

Thesis

W. Shao

“Tensor Completion,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Shao2012,
TITLE = {Tensor Completion},
AUTHOR = {Shao, Weijia},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-D14917A89D962157C1257A5C003993F3-Shao2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
}

Endnote

%0 Thesis
%A Shao, Weijia
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Tensor Completion : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-6214-7
%F EDOC: 647472
%F OTHER: Local-ID: C1256DBF005F876D-D14917A89D962157C1257A5C003993F3-Shao2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master

Article

M. Spaniol, A. Benczúr, Z. Viharos, and G. Weikum

“Big Web Analytics: Toward a Virtual Web Observatory,” Ercim News, vol. 89, 2012.

BibTeX

@article{SBVW12,
TITLE = {Big Web Analytics: Toward a Virtual Web Observatory},
AUTHOR = {Spaniol, Marc and Bencz{\'u}r, Andr{\'a}s and Viharos, Zsolt and Weikum, Gerhard},
LANGUAGE = {eng},
ISSN = {0926-4981},
URL = {http://ercim-news.ercim.eu/images/stories/EN89/EN89-web.pdf#page=23},
LOCALID = {Local-ID: C1256DBF005F876D-D24A45C2003EA929C1257AD8004D6540-SBVW12},
PUBLISHER = {ERCIM EEIG},
ADDRESS = {Sophia-Antipolis},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Ercim News},
VOLUME = {89},
PAGES = {23--24},
}

Endnote

%0 Journal Article
%A Spaniol, Marc
%A Benczúr, András
%A Viharos, Zsolt
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Big Web Analytics: Toward a Virtual Web Observatory : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FA4-6
%F EDOC: 647517
%U http://ercim-news.ercim.eu/images/stories/EN89/EN89-web.pdf#page=23
%F OTHER: Local-ID: C1256DBF005F876D-D24A45C2003EA929C1257AD8004D6540-SBVW12
%D 2012
%* Review method: peer-reviewed
%J Ercim News
%V 89
%& 23
%P 23 - 24
%I ERCIM EEIG
%C Sophia-Antipolis
%@ 0926-4981

Conference paper

M. Spaniol and G. Weikum

“Tracking Entities in Web Archives: The LAWA Project,” in WWW’12, 21st Annual Conference on World Wide Web Companion, Lyon, France, 2012.

BibTeX

@inproceedings{SpWe12,
TITLE = {Tracking Entities in Web Archives: The {LAWA} Project},
AUTHOR = {Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1230-1},
URL = {http://doi.acm.org/10.1145/2187980.2188030},
DOI = {10.1145/2187980.2188030},
LOCALID = {Local-ID: C1256DBF005F876D-E32BCE2C1A25AD0CC1257AD8004ECBD9-SpWe12},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {WWW'12, 21st Annual Conference on World Wide Web Companion},
EDITOR = {Mille, Alain and Gandon, Fabien and Misselis, Jacques and Rabinovich, Michael and Staab, Steffen},
PAGES = {287--290},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Tracking Entities in Web Archives: The LAWA Project : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-573E-C
%F EDOC: 647481
%R 10.1145/2187980.2188030
%U http://doi.acm.org/10.1145/2187980.2188030
%F OTHER: Local-ID: C1256DBF005F876D-E32BCE2C1A25AD0CC1257AD8004ECBD9-SpWe12
%D 2012
%B 21st Annual Conference on World Wide Web Companion
%Z date of event: 2012-04-16 - 2012-04-20
%C Lyon, France
%B WWW'12
%E Mille, Alain; Gandon, Fabien; Misselis, Jacques; Rabinovich, Michael; Staab, Steffen
%P 287 - 290
%I ACM
%@ 978-1-4503-1230-1

Conference paper

A. Stupar and S. Michel

“Being Picky - Processing Top-K Queries with Set-Defined Selections,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 2012.

BibTeX

@inproceedings{Stupar2012b,
TITLE = {Being Picky -- Processing Top-{K} Queries with Set-Defined Selections},
AUTHOR = {Stupar, Aleksandar and Michel, Sebastian},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
DOI = {10.1145/2396761.2396877},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {912--921},
ADDRESS = {Maui, HI, USA},
}

Endnote

%0 Conference Proceedings
%A Stupar, Aleksandar
%A Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Being Picky - Processing Top-K Queries with Set-Defined Selections : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FAC-5
%F EDOC: 647510
%R 10.1145/2396761.2396877
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, HI, USA
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 912 - 921
%I ACM
%@ 978-1-4503-1156-4

Conference paper

A. Stupar and S. Michel

“Enhancing Locality Sensitive Hashing with Peek-Probing and Nearest Neighbor Links,” in 15th International Workshop on the Web and Databases (WebDB 2012), Scottsdale, AZ, USA, 2012.

BibTeX

@inproceedings{Stupar2012a,
TITLE = {Enhancing Locality Sensitive Hashing with Peek-Probing and Nearest Neighbor Links},
AUTHOR = {Stupar, Aleksandar and Michel, Sebastian},
LANGUAGE = {eng},
URL = {http://db.disi.unitn.eu/pages/WebDB2012/papers/p6.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-6839B88F561A515EC12579EC003D39FC-Stupar2012a},
PUBLISHER = {WebDB},
YEAR = {2012},
BOOKTITLE = {15th International Workshop on the Web and Databases (WebDB 2012)},
EDITOR = {Ives, Zachary G. and Velegrakis, Yannis},
PAGES = {37--42},
ADDRESS = {Scottsdale, AZ, USA},
}

Endnote

%0 Conference Proceedings
%A Stupar, Aleksandar
%A Michel, Sebastian
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Enhancing Locality Sensitive Hashing with Peek-Probing and Nearest Neighbor Links : 
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5A32-E
%F EDOC: 647500
%U http://db.disi.unitn.eu/pages/WebDB2012/papers/p6.pdf
%F OTHER: Local-ID: C1256DBF005F876D-6839B88F561A515EC12579EC003D39FC-Stupar2012a
%D 2012
%B 15th International Workshop on the Web and Databases
%Z date of event: 2012-05-20 - 2012-05-20
%C Scottsdale, AZ, USA
%B 15th International Workshop on the Web and Databases
%E Ives, Zachary G.; Velegrakis, Yannis
%P 37 - 42
%I WebDB

Thesis

A. Talaika

“Two-phase Information Extraction using Statistical Pattern Mining and Conditional Random Fields,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{TalaikaBachelorsThesis2012,
TITLE = {Two-phase Information Extraction using Statistical Pattern Mining and Conditional Random Fields},
AUTHOR = {Talaika, Aliaksandr},
LANGUAGE = {eng},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Talaika, Aliaksandr
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Two-phase Information Extraction using Statistical Pattern Mining and Conditional Random Fields
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0024-5C4A-F
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V bachelor
%9 bachelor

Conference poster

IMPR-CS, D5

N. Tandon and A. Jain

“Citation Context Sentiment Analysis for Structured Summarization of Research Papers,” Proceedings of the 35th German Conference on Artificial Intelligence (KI 2012). Springer, Berlin, 2012.

BibTeX

@inproceedings{TandonKI2012,
TITLE = {Citation Context Sentiment Analysis for Structured Summarization of Research Papers},
AUTHOR = {Tandon, Niket and Jain, Ashish},
LANGUAGE = {eng},
URL = {http://www.mpi-inf.mpg.de/~ntandon/papers/ki2012-tandon.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-B4397D968E0E592FC1257AF8005AF791-TandonKI2012},
PUBLISHER = {Springer},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings of the 35th German Conference on Artificial Intelligence (KI 2012)},
PAGES = {98--102},
ADDRESS = {Saarbr{\"u}cken, Germany},
}

Endnote

%0 Generic
%A Tandon, Niket
%A Jain, Ashish
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Citation Context Sentiment Analysis for Structured Summarization of Research Papers
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5FA0-E
%F EDOC: 647537
%U http://www.mpi-inf.mpg.de/~ntandon/papers/ki2012-tandon.pdf
%F OTHER: Local-ID: C1256DBF005F876D-B4397D968E0E592FC1257AF8005AF791-TandonKI2012
%D 2012
%Z name of event: KI 2012
%Z date of event: 2012-09-24 - 2012-09-27
%Z place of event: Saarbrücken, Germany
%B Proceedings of the 35th German Conference on Artificial Intelligence
%P 98 - 102

Conference poster

IMPR-CS, D5

N. Tandon, D. Rajagopal, and G. de Melo

“Markov Chains for Robust Graph-based Commonsense Information Extraction,” Proceedings of COLING 2012. ACL, Stroudsburg PA, 2012.

BibTeX

@inproceedings{TandonColing2012,
TITLE = {Markov Chains for Robust Graph-based Commonsense Information Extraction},
AUTHOR = {Tandon, Niket and Rajagopal, Dheeraj and de Melo, Gerard},
LANGUAGE = {eng},
URL = {http://aclweb.org/anthology//C/C12/C12-3055.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-E77653A07F1E4B86C1257AF8005AA36E-TandonColing2012},
PUBLISHER = {ACL},
YEAR = {2012},
BOOKTITLE = {Proceedings of COLING 2012},
PAGES = {439--446},
ADDRESS = {Mumbai, India},
}

Endnote

%0 Generic
%A Tandon, Niket
%A Rajagopal, Dheeraj
%A de Melo, Gerard
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
%T Markov Chains for Robust Graph-based Commonsense Information Extraction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5953-8
%F EDOC: 647536
%U http://aclweb.org/anthology//C/C12/C12-3055.pdf
%F OTHER: Local-ID: C1256DBF005F876D-E77653A07F1E4B86C1257AF8005AA36E-TandonColing2012
%D 2012
%Z name of event: COLING 2012
%Z date of event: 2012-12-08 - 2012-12-15
%Z place of event: Mumbai, India
%B Proceedings of COLING 2012
%P 439 - 446

Conference paper

IMPR-CS, D5

C. Teflioudi, F. Makari, and R. Gemulla

“Distributed Matrix Completion,” in 12th IEEE International Conference on Data Mining (ICDM 2012), Brussels, Belgium, 2012.

BibTeX

@inproceedings{Teflioudi2012,
TITLE = {Distributed Matrix Completion},
AUTHOR = {Teflioudi, Christina and Makari, Faraz and Gemulla, Rainer},
LANGUAGE = {eng},
ISBN = {978-0-7695-4905-7},
DOI = {10.1109/ICDM.2012.120},
LOCALID = {Local-ID: C1256DBF005F876D-B409E03D0ADD4EF5C1257AD7003D10C7-Teflioudi2012},
PUBLISHER = {IEEE},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {12th IEEE International Conference on Data Mining (ICDM 2012)},
PAGES = {655--664},
ADDRESS = {Brussels, Belgium},
}

Endnote

%0 Conference Proceedings
%A Teflioudi, Christina
%A Makari, Faraz
%A Gemulla, Rainer
%+ International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Distributed Matrix Completion
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5DFF-C
%F EDOC: 647516
%R 10.1109/ICDM.2012.120
%F OTHER: Local-ID: C1256DBF005F876D-B409E03D0ADD4EF5C1257AD7003D10C7-Teflioudi2012
%D 2012
%B 12th IEEE International Conference on Data Mining
%Z date of event: 2012-12-10 - 2012-12-13
%C Brussels, Belgium
%P 655 - 664
%I IEEE
%@ 978-0-7695-4905-7

Conference paper

Q. Wang, J. Kamps, G. Ramirez Camps, M. Marx, A. Schuth, M. Theobald, S. Gurajada, and A. Mishra

“Overview of the INEX 2012 Linked Data Track,” in CLEF 2012 Evaluation Labs and Workshop, Rome, Italy, 2012.

BibTeX

@inproceedings{CLEF2012,
TITLE = {Overview of the {INEX} 2012 Linked Data Track},
AUTHOR = {Wang, Qiuyue and Kamps, Jaap and Ramirez Camps, Georgina and Marx, Maarten and Schuth, Anne and Theobald, Martin and Gurajada, Sairam and Mishra, Arunav},
LANGUAGE = {eng},
ISBN = {978-88-904810-3-1},
URL = {http://www.clef-initiative.eu/documents/71612/ebba5434-ee01-4bf6-bc12-fe8661a496cb},
LOCALID = {Local-ID: C1256DBF005F876D-77D8148A6067C6A2C1257AE800314D36-CLEF2012},
YEAR = {2012},
BOOKTITLE = {CLEF 2012 Evaluation Labs and Workshop},
EDITOR = {Forner, Pamela and Karlgren, Jussi and Womser-Hacker, Christa},
PAGES = {1--13},
ADDRESS = {Rome, Italy},
}

Endnote

%0 Conference Proceedings
%A Wang, Qiuyue
%A Kamps, Jaap
%A Ramirez Camps, Georgina
%A Marx, Maarten
%A Schuth, Anne
%A Theobald, Martin
%A Gurajada, Sairam
%A Mishra, Arunav
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Overview of the INEX 2012 Linked Data Track
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58DB-2
%F EDOC: 647523
%U http://www.clef-initiative.eu/documents/71612/ebba5434-ee01-4bf6-bc12-fe8661a496cb
%F OTHER: Local-ID: C1256DBF005F876D-77D8148A6067C6A2C1257AE800314D36-CLEF2012
%D 2012
%B CLEF 2012 Evaluation Labs and Workshop
%Z date of event: 2012-09-17 - 2012-09-20
%C Rome, Italy
%E Forner, Pamela; Karlgren, Jussi; Womser-Hacker, Christa
%P 1 - 13
%@ 978-88-904810-3-1

Conference paper

Q. Wang, G. Ramírez, M. Marx, M. Theobald, and J. Kamps

“Overview of the INEX 2011 Data-Centric Track,” in Focused Retrieval of Content and Structure (INEX 2011), Saarbrücken, Germany, 2012.

BibTeX

@inproceedings{Theobald-INEX2011,
TITLE = {Overview of the {INEX} 2011 Data-Centric Track},
AUTHOR = {Wang, Qiuyue and Ram{\'i}rez, Georgina and Marx, Maarten and Theobald, Martin and Kamps, Jaap},
LANGUAGE = {eng},
ISBN = {978-3-642-35734-3},
DOI = {10.1007/978-3-642-35734-3_10},
LOCALID = {Local-ID: C1256DBF005F876D-77A15C80CEBACB50C1257A230023B002-Theobald-INEX2011},
PUBLISHER = {Springer},
YEAR = {2012},
BOOKTITLE = {Focused Retrieval of Content and Structure (INEX 2011)},
EDITOR = {Geva, Shlomo and Kamps, Jaap and Schenkel, Ralf},
PAGES = {118--137},
SERIES = {Lecture Notes in Computer Science},
VOLUME = {7424},
ADDRESS = {Saarbr{\"u}cken, Germany},
}

Endnote

%0 Conference Proceedings
%A Wang, Qiuyue
%A Ramírez, Georgina
%A Marx, Maarten
%A Theobald, Martin
%A Kamps, Jaap
%+ External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
%T Overview of the INEX 2011 Data-Centric Track
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58E0-3
%F EDOC: 647465
%R 10.1007/978-3-642-35734-3_10
%F OTHER: Local-ID: C1256DBF005F876D-77A15C80CEBACB50C1257A230023B002-Theobald-INEX2011
%D 2012
%B 10th International Workshop of the Initiative for the Evaluation of XML Retrieval
%Z date of event: 2011-12-12 - 2011-12-14
%C Saarbrücken, Germany
%B Focused Retrieval of Content and Structure
%E Geva, Shlomo; Kamps, Jaap; Schenkel, Ralf
%P 118 - 137
%I Springer
%@ 978-3-642-35734-3
%B Lecture Notes in Computer Science
%N 7424

Conference paper

D5, IMPR-CS

Y. Wang, M. Dylla, Z. Ren, M. Spaniol, and G. Weikum

“PRAVDA-live: Interactive Knowledge Harvesting,” in CIKM’12, 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 2012.

BibTeX

@inproceedings{WDR12,
TITLE = {{PRAVDA}-live: Interactive Knowledge Harvesting},
AUTHOR = {Wang, Yafang and Dylla, Maximilian and Ren, Zhaochun and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1156-4},
URL = {http://doi.acm.org/10.1145/2396761.2398722},
DOI = {10.1145/2396761.2398722},
LOCALID = {Local-ID: C1256DBF005F876D-8694BA7042FBF01DC1257AD80050E30E-WDR*12},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {CIKM'12, 21st ACM International Conference on Information and Knowledge Management},
EDITOR = {Chen, Xue-Wen and Lebanon, Guy and Wang, Haixun and Zaki, Mohammed J.},
PAGES = {2674--2676},
ADDRESS = {Maui, HI, USA},
}

Endnote

%0 Conference Proceedings
%A Wang, Yafang
%A Dylla, Maximilian
%A Ren, Zhaochun
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T PRAVDA-live: Interactive Knowledge Harvesting
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-58BC-8
%F EDOC: 647484
%R 10.1145/2396761.2398722
%U http://doi.acm.org/10.1145/2396761.2398722
%F OTHER: Local-ID: C1256DBF005F876D-8694BA7042FBF01DC1257AD80050E30E-WDR*12
%D 2012
%B 21st ACM International Conference on Information and Knowledge Management
%Z date of event: 2012-10-29 - 2012-11-02
%C Maui, HI, USA
%B CIKM'12
%E Chen, Xue-Wen; Lebanon, Guy; Wang, Haixun; Zaki, Mohammed J.
%P 2674 - 2676
%I ACM
%@ 978-1-4503-1156-4

Conference paper

D5, IMPR-CS

Y. Wang, M. Dylla, M. Spaniol, and G. Weikum

“Coupling Label Propagation and Constraints for Temporal Fact Extraction,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju, Republic of Korea, 2012, vol. 2.

BibTeX

@inproceedings{WDSW12,
TITLE = {Coupling Label Propagation and Constraints for Temporal Fact Extraction},
AUTHOR = {Wang, Yafang and Dylla, Maximilian and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-25-1},
URL = {http://www.aclweb.org/anthology-new/P/P12/P12-2046.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-FE16166213EB336CC1257AD8004FBA59-WDSW12},
PUBLISHER = {ACL},
YEAR = {2012},
DATE = {2012},
BOOKTITLE = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012)},
VOLUME = {2},
PAGES = {233--237},
ADDRESS = {Jeju, Republic of Korea},
}

Endnote

%0 Conference Proceedings
%A Wang, Yafang
%A Dylla, Maximilian
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Coupling Label Propagation and Constraints for Temporal Fact Extraction
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5F9A-D
%F EDOC: 647482
%U http://www.aclweb.org/anthology-new/P/P12/P12-2046.pdf
%F OTHER: Local-ID: C1256DBF005F876D-FE16166213EB336CC1257AD8004FBA59-WDSW12
%D 2012
%B 50th Annual Meeting of the Association for Computational Linguistics
%Z date of event: 2012-07-08 - 2012-07-14
%C Jeju, Republic of Korea
%B Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics
%V 2
%P 233 - 237
%I ACL
%@ 978-1-937284-25-1

Conference paper

G. Weikum

“Semantic Search: from Names and Phrases to Entities and Relations,” in Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources (VLDS 2012), Istanbul, Turkey, 2012.

BibTeX

@inproceedings{WeikumVLDS2012,
TITLE = {Semantic Search: from Names and Phrases to Entities and Relations},
AUTHOR = {Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://ceur-ws.org/Vol-884/VLDS2012_p03_invited_Weikum.pdf},
PUBLISHER = {CEUR-WS.org},
YEAR = {2012},
BOOKTITLE = {Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources (VLDS 2012)},
EDITOR = {Brambilla, Marco and Ceri, Stefano and Furche, Tim and Gottlob, Georg},
PAGES = {3--3},
SERIES = {CEUR Workshop Proceedings},
VOLUME = {884},
ADDRESS = {Istanbul, Turkey},
}

Endnote

%0 Conference Proceedings
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Semantic Search: from Names and Phrases to Entities and Relations : Invited speech
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0019-B238-8
%U http://ceur-ws.org/Vol-884/VLDS2012_p03_invited_Weikum.pdf
%D 2012
%B Second International Workshop on Searching and Integrating New Web Data Sources
%Z date of event: 2012-08-31 - 2012-08-31
%C Istanbul, Turkey
%B Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources
%E Brambilla, Marco; Ceri, Stefano; Furche, Tim; Gottlob, Georg
%P 3 - 3
%I CEUR-WS.org
%B CEUR Workshop Proceedings
%N 884

Article

D5, IMPR-CS

G. Weikum, J. Hoffart, N. Nakashole, M. Spaniol, F. Suchanek, and M. A. Yosef

“Big Data Methods for Computational Linguistics,” Bulletin of the Technical Committee on Data Engineering, vol. 35, no. 3, 2012.

BibTeX

@article{bigdataieee2013,
TITLE = {Big Data Methods for Computational Linguistics},
AUTHOR = {Weikum, Gerhard and Hoffart, Johannes and Nakashole, Ndapandula and Spaniol, Marc and Suchanek, Fabian and Yosef, Mohamed Amir},
LANGUAGE = {eng},
LOCALID = {Local-ID:C1257ACD0050F94E-0BB003A44F32793EC1257B1600612E6E-bigdata@ieee2013; Local-ID: F18D22FA4E810E52C1257AD8005050D3-WHN*12},
PUBLISHER = {IEEE Computer Society},
ADDRESS = {Washington, DC},
YEAR = {2012},
DATE = {2012},
JOURNAL = {Bulletin of the Technical Committee on Data Engineering},
VOLUME = {35},
NUMBER = {3},
PAGES = {46--55},
}

Endnote

%0 Journal Article
%A Weikum, Gerhard
%A Hoffart, Johannes
%A Nakashole, Ndapandula
%A Spaniol, Marc
%A Suchanek, Fabian
%A Yosef, Mohamed Amir
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Ontologies, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
%T Big Data Methods for Computational Linguistics
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5446-0
%F OTHER: Local-ID:C1257ACD0050F94E-0BB003A44F32793EC1257B1600612E6E-bigdata@ieee2013
%F OTHER: Local-ID: F18D22FA4E810E52C1257AD8005050D3-WHN*12
%7 2012
%D 2012
%J Bulletin of the Technical Committee on Data Engineering 
%V 35
%N 3
%& 46
%P 46 - 55
%I IEEE Computer Society
%C Washington, DC
%U http://sites.computer.org/debull/A12sept/linguist.pdf

Thesis

P. Yadava

“Boolean Matrix Factorization with Missing Values,” Universität des Saarlandes, Saarbrücken, 2012.

Abstract

Is it possible to meaningfully analyze the structure of a Boolean matrix for
which 99% of the data is missing?
Real-life data sets usually contain a high percentage of missing values, which
hamper structure estimation from the data, and the difficulty only increases
when the missing values dominate the known elements in the data set. There are
good real-valued factorization methods for such scenarios, but there exists
another class of data, "Boolean data", which demands a different handling
strategy than its real-valued counterpart.
There are many applications which find logical representation only via Boolean
matrices, where real-valued factorization methods do not provide correct and
intuitive solutions.
Currently, there exists no method which can factorize a Boolean matrix
containing a percentage of missing values usually associated with non-trivial
real-world data sets. In this thesis, we introduce a method to fill this gap.
Our method is based on the correlation among the data records and is not
restricted by the percentage of unknowns in the matrix. It performs greedy
selection of the basis vectors, which represent the underlying
structure in the data.
This thesis also presents several experiments on a variety of synthetic and
real-world data, and discusses the performance of the algorithm for a range of
data properties.
However, it was not easy to obtain comparison statistics with existing methods,
for the reason that none exist. Hence we present indirect comparisons with
existing matrix completion methods which work with real-valued data sets.

BibTeX

@mastersthesis{Yadava2012,
TITLE = {Boolean Matrix Factorization with Missing Values},
AUTHOR = {Yadava, Prashant},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-394341F10E7CB40AC1257AAD00334BB1-Yadava2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {Is it possible to meaningfully analyze the structure of a Boolean matrix for which 99\% of the data is missing? Real-life data sets usually contain a high percentage of missing values, which hamper structure estimation from the data, and the difficulty only increases when the missing values dominate the known elements in the data set. There are good real-valued factorization methods for such scenarios, but there exists another class of data, "Boolean data", which demands a different handling strategy than its real-valued counterpart. There are many applications which find logical representation only via Boolean matrices, where real-valued factorization methods do not provide correct and intuitive solutions. Currently, there exists no method which can factorize a Boolean matrix containing a percentage of missing values usually associated with non-trivial real-world data sets. In this thesis, we introduce a method to fill this gap. Our method is based on the correlation among the data records and is not restricted by the percentage of unknowns in the matrix. It performs greedy selection of the basis vectors, which represent the underlying structure in the data. This thesis also presents several experiments on a variety of synthetic and real-world data, and discusses the performance of the algorithm for a range of data properties. However, it was not easy to obtain comparison statistics with existing methods, for the reason that none exist. Hence we present indirect comparisons with existing matrix completion methods which work with real-valued data sets.},
}

Endnote

%0 Thesis
%A Yadava, Prashant
%Y Weikum, Gerhard
%Y Miettinen, Pauli
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Boolean Matrix Factorization with Missing Values
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-627A-3
%F EDOC: 647476
%F OTHER: Local-ID: C1256DBF005F876D-394341F10E7CB40AC1257AAD00334BB1-Yadava2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V master
%9 master
%X Is it possible to meaningfully analyze the structure of a Boolean matrix for
which 99% of the data is missing?

Real-life data sets usually contain a high percentage of missing values, which
hamper structure estimation from the data, and the difficulty only increases
when the missing values dominate the known elements in the data set. There are
good real-valued factorization methods for such scenarios, but there exists
another class of data, "Boolean data", which demands a different handling
strategy than its real-valued counterpart.
There are many applications which find logical representation only via Boolean
matrices, where real-valued factorization methods do not provide correct and
intuitive solutions.
Currently, there exists no method which can factorize a Boolean matrix
containing a percentage of missing values usually associated with non-trivial
real-world data sets. In this thesis, we introduce a method to fill this gap.
Our method is based on the correlation among the data records and is not
restricted by the percentage of unknowns in the matrix. It performs greedy
selection of the basis vectors, which represent the underlying
structure in the data.
This thesis also presents several experiments on a variety of synthetic and
real-world data, and discusses the performance of the algorithm for a range of
data properties.
However, it was not easy to obtain comparison statistics with existing methods,
for the reason that none exist. Hence we present indirect comparisons with
existing matrix completion methods which work with real-valued data sets.

Conference paper

D5, IMPR-CS

M. Yahya, K. Berberich, S. Elbassuoni, M. Ramanath, V. Tresp, and G. Weikum

“Natural Language Questions for the Web of Data,” in 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), Jeju, South Korea, 2012.

Abstract

The Linked Data initiative comprises structured databases in the Semantic-Web
data model RDF. Exploring this heterogeneous data by structured query languages
is tedious and error-prone even for skilled users. To ease the task, this paper
presents a methodology for translating natural language questions into
structured SPARQL queries over linked-data sources.
Our method is based on an integer linear program to solve several
disambiguation tasks jointly: the segmentation of questions into phrases; the
mapping of phrases to semantic entities, classes, and relations; and the
construction of SPARQL triple patterns. Our solution harnesses the rich type
system provided by knowledge bases to constrain our semantic-coherence
objective function. We present experiments on both the question translation and
the resulting query answering.

BibTeX

@inproceedings{YahyaBERTW12a,
TITLE = {Natural Language Questions for the Web of Data},
AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Elbassuoni, Shady and Ramanath, Maya and Tresp, Volker and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-937284-43-5},
URL = {http://aclweb.org/anthology-new/D/D12/D12-1035.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-43793842E42D08A9C1257A3A0077D0E2-YahyaBERTW12a},
PUBLISHER = {The Association for Computational Linguistics},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases to constrain our semantic-coherence objective function. We present experiments on both the question translation and the resulting query answering.},
BOOKTITLE = {2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012)},
PAGES = {379--390},
EID = {D12-1035},
ADDRESS = {Jeju, South Korea},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Berberich, Klaus
%A Elbassuoni, Shady
%A Ramanath, Maya
%A Tresp, Volker
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Natural Language Questions for the Web of Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5914-5
%F EDOC: 647466
%U http://aclweb.org/anthology-new/D/D12/D12-1035.pdf
%F OTHER: Local-ID: C1256DBF005F876D-43793842E42D08A9C1257A3A0077D0E2-YahyaBERTW12a
%D 2012
%B Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%Z date of event: 2012-07-12 - 2012-07-14
%C Jeju, South Korea
%X The Linked Data initiative comprises structured databases in the Semantic-Web 
data model RDF. Exploring this heterogeneous data by structured query languages 
is tedious and error-prone even for skilled users. To ease the task, this paper 
presents a methodology for translating natural language questions into 
structured SPARQL queries over linked-data sources.

Our method is based on an integer linear program to solve several 
disambiguation tasks jointly: the segmentation of questions into phrases; the 
mapping of phrases to semantic entities, classes, and relations; and the 
construction of SPARQL triple patterns. Our solution harnesses the rich type 
system provided by knowledge bases to constrain our semantic-coherence 
objective function. We present experiments on both the question translation and 
the resulting query answering.
%B 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
%P 379 - 390
%Z sequence number: D12-1035
%I The Association for Computational Linguistics
%@ 978-1-937284-43-5

Conference paper

D5, IMPR-CS

M. Yahya, K. Berberich, S. Elbassuoni, M. Ramanath, V. Tresp, and G. Weikum

“Deep Answers for Naturally Asked Questions on the Web of Data,” in WWW’12, 21st Annual Conference on World Wide Web Companion, Lyon, France, 2012.

Abstract

We present DEANNA, a framework for natural language question answering over
structured knowledge bases. Given a natural language question, DEANNA
translates questions into a structured SPARQL query that can be evaluated over
knowledge bases such as Yago, Dbpedia, Freebase, or other Linked Data sources.
DEANNA analyzes questions and maps verbal phrases to relations and noun phrases
to either individual entities or semantic classes. Importantly, it judiciously
generates variables for target entities or classes to express joins between
multiple triple patterns.
We leverage the semantic type system for entities and use constraints in
jointly mapping the constituents of the question to relations, classes, and
entities. We demonstrate the capabilities and interface of DEANNA, which allows
advanced users to influence the translation process and to see how the
different components interact to produce the final result.

BibTeX

@inproceedings{YahyaBERTW12,
TITLE = {Deep Answers for Naturally Asked Questions on the Web of Data},
AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Elbassuoni, Shady and Ramanath, Maya and Tresp, Volker and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-1230-1},
URL = {http://www2012.wwwconference.org/proceedings/companion/p445.pdf},
DOI = {10.1145/2187980.2188070},
LOCALID = {Local-ID: C1256DBF005F876D-5DDF0848518D57E8C12579EA00510E26-YahyaBERTW12},
PUBLISHER = {ACM},
YEAR = {2012},
DATE = {2012},
ABSTRACT = {We present DEANNA, a framework for natural language question answering over structured knowledge bases. Given a natural language question, DEANNA translates questions into a structured SPARQL query that can be evaluated over knowledge bases such as Yago, Dbpedia, Freebase, or other Linked Data sources. DEANNA analyzes questions and maps verbal phrases to relations and noun phrases to either individual entities or semantic classes. Importantly, it judiciously generates variables for target entities or classes to express joins between multiple triple patterns. We leverage the semantic type system for entities and use constraints in jointly mapping the constituents of the question to relations, classes, and entities. We demonstrate the capabilities and interface of DEANNA, which allows advanced users to influence the translation process and to see how the different components interact to produce the final result.},
BOOKTITLE = {WWW'12, 21st Annual Conference on World Wide Web Companion},
EDITOR = {Mille, Alain and Gandon, Fabien and Misselis, Jacques and Rabinovich, Michael and Staab, Steffen},
PAGES = {445--449},
ADDRESS = {Lyon, France},
}

Endnote

%0 Conference Proceedings
%A Yahya, Mohamed
%A Berberich, Klaus
%A Elbassuoni, Shady
%A Ramanath, Maya
%A Tresp, Volker
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Deep Answers for Naturally Asked Questions on the Web of Data
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-5F6F-F
%F EDOC: 647503
%R 10.1145/2187980.2188070
%U http://www2012.wwwconference.org/proceedings/companion/p445.pdf
%F OTHER: Local-ID: C1256DBF005F876D-5DDF0848518D57E8C12579EA00510E26-YahyaBERTW12
%D 2012
%B 21st Annual Conference on World Wide Web Companion
%Z date of event: 2012-04-16 - 2012-04-20
%C Lyon, France
%X We present DEANNA, a framework for natural language question answering over 
structured knowledge bases. Given a natural language question, DEANNA 
translates questions into a structured SPARQL query that can be evaluated over 
knowledge bases such as Yago, Dbpedia, Freebase, or other Linked Data sources. 
DEANNA analyzes questions and maps verbal phrases to relations and noun phrases 
to either individual entities or semantic classes. Importantly, it judiciously 
generates variables for target entities or classes to express joins between 
multiple triple patterns. 
We leverage the semantic type system for entities and use constraints in 
jointly mapping the constituents of the question to relations, classes, and 
entities. We demonstrate the capabilities and interface of DEANNA, which allows 
advanced users to influence the translation process and to see how the 
different components interact to produce the final result.
%B WWW'12
%E Mille, Alain; Gandon, Fabien; Misselis, Jacques; Rabinovich, Michael; Staab, Steffen
%P 445 - 449
%I ACM
%@ 978-1-4503-1230-1

Conference paper

D5, IMPR-CS

M. A. Yosef, S. Bauer, J. Hoffart, M. Spaniol, and G. Weikum

“HYENA: Hierarchical Type Classification for Entity Names,” in Proceedings of the 24th Intl. Conference on Computational Linguistics (COLING 2012), Mumbai, India, 2012.

BibTeX

@inproceedings{YBH12,
TITLE = {{HYENA}: Hierarchical Type Classification for Entity Names},
AUTHOR = {Yosef, Mohamed Amir and Bauer, Sandro and Hoffart, Johannes and Spaniol, Marc and Weikum, Gerhard},
LANGUAGE = {eng},
URL = {http://aclweb.org/anthology//C/C12/C12-2133.pdf},
LOCALID = {Local-ID: C1256DBF005F876D-279253CD89D04D55C1257AD8005134CF-YBH*12},
PUBLISHER = {ACL},
YEAR = {2012},
BOOKTITLE = {Proceedings of the 24th Intl. Conference on Computational Linguistics (COLING 2012)},
PAGES = {1361--1370},
ADDRESS = {Mumbai, India},
}

Endnote

%0 Conference Proceedings
%A Yosef, Mohamed Amir
%A Bauer, Sandro
%A Hoffart, Johannes
%A Spaniol, Marc
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
International Max Planck Research School, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T HYENA: Hierarchical Type Classification for Entity Names
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-59DA-B
%F EDOC: 647521
%F OTHER: Local-ID: C1256DBF005F876D-279253CD89D04D55C1257AD8005134CF-YBH*12
%U http://aclweb.org/anthology//C/C12/C12-2133.pdf
%D 2012
%B 24th International Conference on Computational Linguistics
%Z date of event: 2012-12-08 - 2012-12-15
%C Mumbai, India
%B Proceedings of the 24th Intl. Conference on Computational Linguistics
%P 1361 - 1370
%I ACL

Thesis

B. Zeini Jahromi

“Design and Implementation of a Multidimensional Query Interface for Large-Scale Data stored in HBase,” Universität des Saarlandes, Saarbrücken, 2012.

BibTeX

@mastersthesis{Zeini2012,
TITLE = {Design and Implementation of a Multidimensional Query Interface for Large-Scale Data stored in {HBase}},
AUTHOR = {Zeini Jahromi, Behrang},
LANGUAGE = {eng},
LOCALID = {Local-ID: C1256DBF005F876D-679D3C1E13106DADC1257A5600361043-Zeini2012},
SCHOOL = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2012},
DATE = {2012},
TYPE = {Bachelor's thesis},
}

Endnote

%0 Thesis
%A Zeini Jahromi, Behrang
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Design and Implementation of a Multidimensional Query Interface for Large-Scale Data stored in HBase
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0014-AFFE-8
%F EDOC: 647470
%F OTHER: Local-ID: C1256DBF005F876D-679D3C1E13106DADC1257A5600361043-Zeini2012
%I Universität des Saarlandes
%C Saarbrücken
%D 2012
%V bachelor
%9 bachelor

2011

Conference paper

P. Adolphs, M. Theobald, U. Schäfer, H. Uszkoreit, and G. Weikum

“YAGO-QA: Answering Questions by Structured Knowledge Queries,” in Fifth IEEE International Conference on Semantic Computing (ICSC 2011), Stanford University, Palo Alto, CA, USA, 2011.

BibTeX

@inproceedings{ICSC2011,
TITLE = {{YAGO-QA}: Answering Questions by Structured Knowledge Queries},
AUTHOR = {Adolphs, Peter and Theobald, Martin and Sch{\"a}fer, Ulrich and Uszkoreit, Hans and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-0-7695-4492-2},
URL = {http://dx.doi.org/10.1109/ICSC.2011.30},
DOI = {10.1109/ICSC.2011.30},
LOCALID = {Local-ID: C1256DBF005F876D-A524BF4C689CB769C12578F0003F9E27-ICSC2011},
PUBLISHER = {IEEE},
YEAR = {2011},
DATE = {2011},
BOOKTITLE = {Fifth IEEE International Conference on Semantic Computing (ICSC 2011)},
PAGES = {158--161},
ADDRESS = {Stanford University, Palo Alto, CA, USA},
}

Endnote

%0 Conference Proceedings
%A Adolphs, Peter
%A Theobald, Martin
%A Schäfer, Ulrich
%A Uszkoreit, Hans
%A Weikum, Gerhard
%+ External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T YAGO-QA: Answering Questions by Structured Knowledge Queries
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0010-14DE-5
%F EDOC: 618977
%R 10.1109/ICSC.2011.30
%U http://dx.doi.org/10.1109/ICSC.2011.30
%F OTHER: Local-ID: C1256DBF005F876D-A524BF4C689CB769C12578F0003F9E27-ICSC2011
%D 2011
%B Fifth IEEE International Conference on Semantic Computing
%Z date of event: 2011-09-18 - 2011-09-21
%C Stanford University, Palo Alto, CA, USA
%B Fifth IEEE International Conference on Semantic Computing
%P 158 - 161
%I IEEE
%@ 978-0-7695-4492-2

Article

D5, D4

D. Alexander, P. Arvola, T. Beckers, P. Bellot, T. Chappell, C. M. de Vries, A. Doucet, N. Fuhr, S. Geva, J. Kamps, G. Kazai, M. Koolen, S. Kutty, M. Landoni, V. Moriceau, R. Nayak, R. Nordlie, N. Pharo, E. SanJuan, R. Schenkel, A. Tagarelli, X. Tannier, J. A. Thom, A. Trotman, J. Vainio, Q. Wang, and C. Wu

“Report on INEX 2010,” SIGIR Forum, vol. 45, no. 1, 2011.

Abstract

INEX investigates focused retrieval from structured documents by providing
large test collections of structured documents, uniform evaluation measures,
and a forum for organizations to compare their results. This paper reports on
the INEX 2010 evaluation campaign, which consisted of a wide range of tracks:
Ad Hoc, Book, Data Centric, Interactive, QA, Link the Wiki, Relevance Feedback,
Web Service Discovery and XML Mining.

BibTeX

@article{INEX_SIGIRForum2011,
TITLE = {Report on {INEX} 2010},
AUTHOR = {Alexander, D. and Arvola, Paavo and Beckers, Thomas and Bellot, P. and Chappell, Timothy and de Vries, Christopher M. and Doucet, Antoine and Fuhr, Norbert and Geva, Shlomo and Kamps, Jaap and Kazai, Gabriella and Koolen, Marijn and Kutty, Sangeetha and Landoni, Monica and Moriceau, Veronique and Nayak, Richi and Nordlie, Ragnar and Pharo, Nils and SanJuan, Eric and Schenkel, Ralf and Tagarelli, A. and Tannier, Xavier and Thom, James A. and Trotman, Andrew and Vainio, J. and Wang, Q. and Wu, C.},
LANGUAGE = {eng},
ISSN = {0163-5840},
URL = {http://doi.acm.org/10.1145/1988852.1988854},
DOI = {10.1145/1988852.1988854},
LOCALID = {Local-ID: C1256DBF005F876D-1E211EE89DC2E288C125788C0046BC15-INEX_SIGIRForum2011},
PUBLISHER = {ACM},
ADDRESS = {New York, NY},
YEAR = {2011},
DATE = {2011},
ABSTRACT = {INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2010 evaluation campaign, which consisted of a wide range of tracks: Ad Hoc, Book, Data Centric, Interactive, QA, Link the Wiki, Relevance Feedback, Web Service Discovery and XML Mining.},
JOURNAL = {SIGIR Forum},
VOLUME = {45},
NUMBER = {1},
PAGES = {2--17},
}

Endnote

%0 Journal Article
%A Alexander, D.
%A Arvola, Paavo
%A Beckers, Thomas
%A Bellot, P.
%A Chappell, Timothy
%A de Vries, Christopher M.
%A Doucet, Antoine
%A Fuhr, Norbert
%A Geva, Shlomo
%A Kamps, Jaap
%A Kazai, Gabriella
%A Koolen, Marijn
%A Kutty, Sangeetha
%A Landoni, Monica
%A Moriceau, Veronique
%A Nayak, Richi
%A Nordlie, Ragnar
%A Pharo, Nils
%A SanJuan, Eric
%A Schenkel, Ralf
%A Tagarelli, A.
%A Tannier, Xavier
%A Thom, James A.
%A Trotman, Andrew
%A Vainio, J.
%A Wang, Q.
%A Wu, C.
%+ External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Databases and Information Systems, MPI for Informatics, Max Planck Society
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
External Organizations
Computer Graphics, MPI for Informatics, Max Planck Society
%T Report on INEX 2010
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0010-14A9-C
%F EDOC: 618961
%R 10.1145/1988852.1988854
%U http://doi.acm.org/10.1145/1988852.1988854
%F OTHER: Local-ID: C1256DBF005F876D-1E211EE89DC2E288C125788C0046BC15-INEX_SIGIRForum2011
%7 2011
%D 2011
%* Review method: peer-reviewed
%X INEX investigates focused retrieval from structured documents by providing 
large test collections of structured documents, uniform evaluation measures, 
and a forum for organizations to compare their results. This paper reports on 
the INEX 2010 evaluation campaign, which consisted of a wide range of tracks: 
Ad Hoc, Book, Data Centric, Interactive, QA, Link the Wiki, Relevance Feedback, 
Web Service Discovery and XML Mining.
%J SIGIR Forum
%O ACM SIGIR Forum
%V 45
%N 1
%& 2
%P 2 - 17
%I ACM
%C New York, NY
%@ 0163-5840

Conference paper

F. Alvanaki, S. Michel, K. Ramamritham, and G. Weikum

“EnBlogue - Emergent Topic Detection in Web 2.0 Streams,” in Proceedings of the 2011 International Conference on Management of Data (SIGMOD 2011), Athens, Greece, 2011.

BibTeX

@inproceedings{Alvanaki2011,
TITLE = {{EnBlogue} -- Emergent Topic Detection in Web 2.0 Streams},
AUTHOR = {Alvanaki, Foteini and Michel, Sebastian and Ramamritham, Krithi and Weikum, Gerhard},
LANGUAGE = {eng},
ISBN = {978-1-4503-0661-4},
URL = {http://doi.acm.org/10.1145/1989323.1989473},
DOI = {10.1145/1989323.1989473},
LOCALID = {Local-ID: C1256DBF005F876D-63070A8B70A42DDAC125784D004215BC-Alvanaki2011},
PUBLISHER = {ACM},
YEAR = {2011},
DATE = {2011},
BOOKTITLE = {Proceedings of the 2011 International Conference on Management of Data (SIGMOD 2011)},
PAGES = {1271--1274},
ADDRESS = {Athens, Greece},
}

Endnote

%0 Conference Proceedings
%A Alvanaki, Foteini
%A Michel, Sebastian
%A Ramamritham, Krithi
%A Weikum, Gerhard
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T EnBlogue - Emergent Topic Detection in Web 2.0 Streams
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0010-144F-9
%F EDOC: 618948
%R 10.1145/1989323.1989473
%U http://doi.acm.org/10.1145/1989323.1989473
%F OTHER: Local-ID: C1256DBF005F876D-63070A8B70A42DDAC125784D004215BC-Alvanaki2011
%D 2011
%B 2011 International Conference on Management of Data
%Z date of event: 2011-06-12 - 2011-06-16
%C Athens, Greece
%B Proceedings of the 2011 International Conference on Management of Data
%P 1271 - 1274
%I ACM
%@ 978-1-4503-0661-4

Report

A. Anand, S. Bedathur, K. Berberich, and R. Schenkel

“Temporal Index Sharding for Space-time Efficiency in Archive Search,” Universität des Saarlandes, Saarbrücken, MPI-I-2011-5-001, 2011.

Abstract

Time-travel queries that couple temporal constraints with keyword
queries are useful in searching large-scale archives of time-evolving
content such as the Web, document collections, wikis, and so
on. Typical approaches for efficient evaluation of these queries
involve \emph{slicing} along the time-axis either the entire
collection~\cite{253349}, or individual index
lists~\cite{kberberi:sigir2007}. Both these methods are not
satisfactory since they sacrifice compactness of index for processing
efficiency making them either too big or, otherwise, too slow.

We present a novel index organization scheme that \emph{shards} the
index with \emph{zero increase in index size}, still minimizing the
cost of reading index entries during query processing. Based on
the optimal sharding thus obtained, we develop practically efficient
sharding that takes into account the different costs of random and
sequential accesses. Our algorithm merges shards from the optimal
solution carefully to allow for few extra sequential accesses while
gaining significantly by reducing the random accesses. Finally, we
empirically establish the effectiveness of our novel sharding scheme
via detailed experiments over the edit history of the English version
of Wikipedia between 2001 and 2005 ($\approx$ 700 GB) and an archive of
the UK governmental web sites ($\approx$ 400 GB). Our results
demonstrate the feasibility of faster time-travel query processing
with no space overhead.

BibTeX

@techreport{Bedathur2011,
TITLE = {Temporal Index Sharding for Space-time Efficiency in Archive Search},
AUTHOR = {Anand, Avishek and Bedathur, Srikanta and Berberich, Klaus and Schenkel, Ralf},
LANGUAGE = {eng},
ISSN = {0946-011X},
NUMBER = {MPI-I-2011-5-001},
INSTITUTION = {Universit{\"a}t des Saarlandes},
ADDRESS = {Saarbr{\"u}cken},
YEAR = {2011},
DATE = {2011},
ABSTRACT = {Time-travel queries that couple temporal constraints with keyword queries are useful in searching large-scale archives of time-evolving content such as the Web, document collections, wikis, and so on. Typical approaches for efficient evaluation of these queries involve \emph{slicing} along the time-axis either the entire collection~\cite{253349}, or individual index lists~\cite{kberberi:sigir2007}. Both these methods are not satisfactory since they sacrifice compactness of index for processing efficiency making them either too big or, otherwise, too slow. We present a novel index organization scheme that \emph{shards} the index with \emph{zero increase in index size}, still minimizing the cost of reading index entries during query processing. Based on the optimal sharding thus obtained, we develop practically efficient sharding that takes into account the different costs of random and sequential accesses. Our algorithm merges shards from the optimal solution carefully to allow for few extra sequential accesses while gaining significantly by reducing the random accesses. Finally, we empirically establish the effectiveness of our novel sharding scheme via detailed experiments over the edit history of the English version of Wikipedia between 2001 and 2005 ($\approx$ 700 GB) and an archive of the UK governmental web sites ($\approx$ 400 GB). Our results demonstrate the feasibility of faster time-travel query processing with no space overhead.},
TYPE = {Research Report},
}

Endnote

%0 Report
%A Anand, Avishek
%A Bedathur, Srikanta
%A Berberich, Klaus
%A Schenkel, Ralf
%+ Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
Databases and Information Systems, MPI for Informatics, Max Planck Society
%T Temporal Index Sharding for Space-time Efficiency in Archive Search
%G eng
%U http://hdl.handle.net/11858/00-001M-0000-0025-7311-D
%Y Universität des Saarlandes
%C Saarbrücken
%D 2011
%X Time-travel queries that couple temporal constraints with keyword
queries are useful in searching large-scale archives of time-evolving
content such as the Web, document collections, wikis, and so
on. Typical approaches for efficient evaluation of these queries
involve \emph{slicing} along the time-axis either the entire
collection~\cite{253349}, or individual index
lists~\cite{kberberi:sigir2007}. Both these methods are not
satisfactory since they sacrifice compactness of index for processing
efficiency making them either too big or, otherwise, too slow.

We present a novel index organization scheme that \emph{shards} the
index with \emph{zero increase in index size}, still minimizing the
cost of reading index entries during query processing. Based on
the optimal sharding thus obtained, we develop practically efficient
sharding that takes into account the different costs of random and
sequential accesses. Our algorithm merges shards from the optimal
solution carefully to allow for few extra sequential accesses while
gaining significantly by reducing the random accesses. Finally, we
empirically establish the effectiveness of our novel sharding scheme
via detailed experiments over the edit history of the English version
of Wikipedia between 2001 and 2005 ($\approx$ 700 GB) and an archive of
the UK governmental web sites ($\approx$ 400 GB). Our results
demonstrate the feasibility of faster time-travel query processing
with no space overhead.
%B Research Report
%@ 0946-011X

Conference paper

A. Anand, S. Bedathur, K. Berberich, and R. Schenkel

“Temporal Index Sharding for Space-time Efficiency in Archive Search,” in SIGIR’11, 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 2011.