# Publications

## 2019
[1]
M. Abouhamra, “AligNarr: Aligning Narratives of Different Length for Movie Summarization,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be, since important information will be sparser and identifying it can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences, which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the difference in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.
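The scoring scheme the abstract describes can be pictured with a small sketch. This is an illustrative reconstruction, not the thesis code: the helper names, the tokenized inputs, and the fixed weights are all assumptions.

```python
# Hypothetical sketch of combining the three similarity signals named in the
# abstract (embedding similarity, word inclusion, entity inclusion) into one
# scene-sentence alignment score. Weights are illustrative, not from the thesis.

def word_inclusion(sentence_tokens, scene_tokens):
    """Fraction of summary-sentence tokens that also occur in the scene."""
    if not sentence_tokens:
        return 0.0
    scene = set(scene_tokens)
    return sum(t in scene for t in sentence_tokens) / len(sentence_tokens)

def combined_score(emb_sim, sentence_tokens, scene_tokens,
                   sentence_entities, scene_entities,
                   weights=(0.5, 0.3, 0.2)):
    """Weighted mix of embedding similarity, word inclusion, entity inclusion."""
    w_emb, w_word, w_ent = weights
    word_inc = word_inclusion(sentence_tokens, scene_tokens)
    # Entity inclusion: same coverage idea, applied to named entities only.
    ent_inc = (word_inclusion(sentence_entities, scene_entities)
               if sentence_entities else 0.0)
    return w_emb * emb_sim + w_word * word_inc + w_ent * ent_inc
```

In this kind of setup, each summary sentence would be compared against every scene and the highest-scoring pairs kept as alignment candidates.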
Export
BibTeX
@mastersthesis{AbouhamraMSc2019, TITLE = {{AligNarr}: Aligning Narratives of Different Length for Movie Summarization}, AUTHOR = {Abouhamra, Mostafa}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be, since important information will be sparser and identifying it can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences, which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the difference in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.}, }
Endnote
%0 Thesis %A Abouhamra, Mostafa %Y Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T AligNarr: Aligning Narratives of Different Length for Movie Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0004-5836-D %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 54 p. %V master %9 master %X Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be, since important information will be sparser and identifying it can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences, which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the difference in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.
[2]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2019.
Export
BibTeX
@inproceedings{abujabal19comqa, TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-950737-13-0}, URL = {https://www.aclweb.org/anthology/N19-1027}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019)}, EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar}, PAGES = {307--317}, ADDRESS = {Minneapolis, MN, USA}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters : %G eng %U http://hdl.handle.net/21.11116/0000-0003-11A7-D %U https://www.aclweb.org/anthology/N19-1027 %D 2019 %B Annual Conference of the North American Chapter of the Association for Computational Linguistics %Z date of event: 2019-06-02 - 2019-06-07 %C Minneapolis, MN, USA %B The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies %E Burstein, Jill; Doran, Christy; Solorio, Thamar %P 307 - 317 %I ACL %@ 978-1-950737-13-0 %U https://www.aclweb.org/anthology/N19-1027
[3]
A. Abujabal, “Question Answering over Knowledge Bases with Continuous Learning,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
Export
BibTeX
@phdthesis{Abujabalphd2013, TITLE = {Question Answering over Knowledge Bases with Continuous Learning}, AUTHOR = {Abujabal, Abdalghani}, LANGUAGE = {eng}, DOI = {10.22028/D291-27968}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.}, }
[4]
M. Alikhani, S. Nag Chowdhury, G. de Melo, and M. Stone, “CITE: A Corpus Of Text-Image Discourse Relations,” in Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Minneapolis, MN, USA, 2019.
Export
BibTeX
@inproceedings{AlikhaniEtAl2019CITETextImageDiscourse, TITLE = {{CITE}: {A} Corpus Of Text-Image Discourse Relations}, AUTHOR = {Alikhani, Malihe and Nag Chowdhury, Sreyasi and de Melo, Gerard and Stone, Matthew}, LANGUAGE = {eng}, ISBN = {978-1-950737-13-0}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019)}, EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar}, PAGES = {570--575}, ADDRESS = {Minneapolis, MN, USA}, }
Endnote
%0 Conference Proceedings %A Alikhani, Malihe %A Nag Chowdhury, Sreyasi %A de Melo, Gerard %A Stone, Matthew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CITE: A Corpus Of Text-Image Discourse Relations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-78D8-3 %D 2019 %B Annual Conference of the North American Chapter of the Association for Computational Linguistics %Z date of event: 2019-06-02 - 2019-06-07 %C Minneapolis, MN, USA %B Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics %E Burstein, Jill; Doran, Christy; Solorio, Thamar %P 570 - 575 %I ACL %@ 978-1-950737-13-0 %U https://aclweb.org/anthology/papers/N/N19/N19-1056/
[5]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” in Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (ECIR 2019) (AMIR 2019), Cologne, Germany, 2019.
Export
BibTeX
@inproceedings{Arora_AMIR2019, TITLE = {Investigating Retrieval Method Selection with Axiomatic Features}, AUTHOR = {Arora, Siddhant and Yates, Andrew}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-2360-3}, PUBLISHER = {CEUR-WS}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (ECIR 2019) (AMIR 2019)}, EDITOR = {Beel, Joeran and Kolthoff, Lars}, PAGES = {18--31}, EID = {4}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2360}, ADDRESS = {Cologne, Germany}, }
Endnote
%0 Conference Proceedings %A Arora, Siddhant %A Yates, Andrew %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Investigating Retrieval Method Selection with Axiomatic Features : %G eng %U http://hdl.handle.net/21.11116/0000-0004-028E-A %D 2019 %B The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval %Z date of event: 2019-04-14 - 2019-04-14 %C Cologne, Germany %B Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (ECIR 2019) %E Beel, Joeran; Kolthoff, Lars %P 18 - 31 %Z sequence number: 4 %I CEUR-WS %B CEUR Workshop Proceedings %N 2360 %@ false %U http://ceur-ws.org/Vol-2360/paper4Axiomatic.pdf
[6]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” 2019. [Online]. Available: http://arxiv.org/abs/1904.05737. (arXiv: 1904.05737)
Abstract
We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
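The meta-learner idea in the abstract can be sketched minimally. This is a hedged illustration under assumptions, not the paper's model: the linear form, the clamping, and the feature and coefficient names are invented for exposition.

```python
# Toy sketch of a meta-learner for retrieval method selection: predict an
# interpolation weight alpha from axiom-related query/document features,
# then combine the two methods' relevance scores. Illustrative only.

def predict_alpha(features, coefficients, bias=0.0):
    """Hypothetical linear meta-learner mapping features to alpha in [0, 1]."""
    z = bias + sum(f * c for f, c in zip(features, coefficients))
    return max(0.0, min(1.0, z))  # clamp so alpha is a valid mixing weight

def combine_scores(score_a, score_b, alpha):
    """Overall relevance score as an alpha-weighted mix of the two methods."""
    return alpha * score_a + (1.0 - alpha) * score_b
```

With per-query features, alpha can differ query by query, which is what lets such a combiner beat either retrieval method used alone.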
Export
BibTeX
@online{Arora_arXiv1904.05737, TITLE = {Investigating Retrieval Method Selection with Axiomatic Features}, AUTHOR = {Arora, Siddhant and Yates, Andrew}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.05737}, EPRINT = {1904.05737}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.}, }
Endnote
%0 Report %A Arora, Siddhant %A Yates, Andrew %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Investigating Retrieval Method Selection with Axiomatic Features : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02BF-3 %U http://arxiv.org/abs/1904.05737 %D 2019 %X We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior. %K Computer Science, Information Retrieval, cs.IR
[7]
A. Chakraborty, N. Mota, A. J. Biega, K. P. Gummadi, and H. Heidari, “On the Impact of Choice Architectures on Inequality in Online Donation Platforms,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Chakraborty_WWW2019, TITLE = {On the Impact of Choice Architectures on Inequality in Online Donation Platforms}, AUTHOR = {Chakraborty, Abhijnan and Mota, Nuno and Biega, Asia J. and Gummadi, Krishna P. and Heidari, Hoda}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313663}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, PAGES = {2623--2629}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Chakraborty, Abhijnan %A Mota, Nuno %A Biega, Asia J. %A Gummadi, Krishna P. %A Heidari, Hoda %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T On the Impact of Choice Architectures on Inequality in Online Donation Platforms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-FC88-9 %R 10.1145/3308558.3313663 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %P 2623 - 2629 %I ACM %@ 978-1-4503-6674-8
[8]
F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli, “On the Distortion of Locality Sensitive Hashing,” SIAM Journal on Computing, vol. 48, no. 2, 2019.
Export
BibTeX
@article{Chierichetti2019, TITLE = {On the Distortion of Locality Sensitive Hashing}, AUTHOR = {Chierichetti, Flavio and Kumar, Ravi and Panconesi, Alessandro and Terolli, Erisa}, LANGUAGE = {eng}, ISSN = {0097-5397}, DOI = {10.1137/17M1127752}, PUBLISHER = {SIAM}, ADDRESS = {Philadelphia, PA}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {SIAM Journal on Computing}, VOLUME = {48}, NUMBER = {2}, PAGES = {350--372}, }
Endnote
%0 Journal Article %A Chierichetti, Flavio %A Kumar, Ravi %A Panconesi, Alessandro %A Terolli, Erisa %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T On the Distortion of Locality Sensitive Hashing : %G eng %U http://hdl.handle.net/21.11116/0000-0003-A7E7-C %R 10.1137/17M1127752 %7 2019 %D 2019 %J SIAM Journal on Computing %V 48 %N 2 %& 350 %P 350 - 372 %I SIAM %C Philadelphia, PA %@ false
[9]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Chu_WWW2019, TITLE = {{TiFi}: {T}axonomy Induction for Fictional Domains}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313519}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {2673--2679}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TiFi: Taxonomy Induction for Fictional Domains : %G eng %U http://hdl.handle.net/21.11116/0000-0003-6558-9 %R 10.1145/3308558.3313519 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 2673 - 2679 %I ACM %@ 978-1-4503-6674-8
[10]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains [Extended version],” 2019. [Online]. Available: http://arxiv.org/abs/1901.10263. (arXiv: 1901.10263)
Abstract
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
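The three phases the abstract lists can be sketched as one pipeline. A hedged illustration only: the predicates `is_class`, `is_subsumption`, and `wordnet_root` are hypothetical stand-ins for the paper's actual cleaning and mapping components.

```python
# Sketch of the three-phase structure described in the abstract:
# (i) category cleaning, (ii) edge cleaning, (iii) top-level construction.
# The three predicate arguments are assumed placeholders, not TiFi's methods.

def taxonomy_pipeline(categories, edges, is_class, is_subsumption, wordnet_root):
    # (i) category cleaning: keep candidate categories that denote true classes
    classes = {c for c in categories if is_class(c)}
    # (ii) edge cleaning: keep subcategory edges expressing class subsumption
    clean_edges = {(sub, sup) for (sub, sup) in edges
                   if sub in classes and sup in classes
                   and is_subsumption(sub, sup)}
    # (iii) top-level construction: map the remaining root classes onto
    # a subset of high-level WordNet categories
    subs = {sub for sub, _ in clean_edges}
    roots = {sup for _, sup in clean_edges} - subs
    top_level = {r: wordnet_root(r) for r in roots}
    return classes, clean_edges, top_level
```

On a noisy fan-wiki category graph, phase (i) would drop administrative categories like "Featured articles" before any edges are considered.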
Export
BibTeX
@online{Chu_arXIv1901.10263, TITLE = {{TiFi}: Taxonomy Induction for Fictional Domains [Extended version]}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1901.10263}, EPRINT = {1901.10263}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.}, }
Endnote
%0 Report %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TiFi: Taxonomy Induction for Fictional Domains [Extended version] : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FE67-C %U http://arxiv.org/abs/1901.10263 %D 2019 %X Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR
[11]
I. Dikeoulias, J. Strötgen, and S. Razniewski, “Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties,” in Companion of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Dikeoulias_WWW2019, TITLE = {Epitaph or Breaking News? {A}nalyzing and Predicting the Stability of Knowledge Base Properties}, AUTHOR = {Dikeoulias, Ioannis and Str{\"o}tgen, Jannik and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-4503-6675-5}, DOI = {10.1145/3308560.3314998}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {1155--1158}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Dikeoulias, Ioannis %A Strötgen, Jannik %A Razniewski, Simon %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties : %G eng %U http://hdl.handle.net/21.11116/0000-0004-0281-7 %R 10.1145/3308560.3314998 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Companion of The World Wide Web Conference %E McAuley, Julian %P 1155 - 1158 %I ACM %@ 978-1-4503-6675-5
[12]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Gad-Elrab_WSDM2019, TITLE = {{ExFaKT}: {A} Framework for Explaining Facts over Knowledge Graphs and Text}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290996}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM'19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {87--95}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9C44-2 %R 10.1145/3289600.3290996 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM'19 %P 87 - 95 %I ACM %@ 978-1-4503-5940-5
[13]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “Tracy: Tracing Facts over Knowledge Graphs and Text,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Gad-Elrab_WWW2019, TITLE = {Tracy: {T}racing Facts over Knowledge Graphs and Text}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3314126}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {3516--3520}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Tracy: Tracing Facts over Knowledge Graphs and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-08AA-5 %R 10.1145/3308558.3314126 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 3516 - 3520 %I ACM %@ 978-1-4503-6674-8
[14]
A. Ghazimatin, R. Saha Roy, and G. Weikum, “FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Ghazimatin_WSDM2019, TITLE = {{FAIRY}: {A} Framework for Understanding Relationships between Users' Actions and their Social Feeds}, AUTHOR = {Ghazimatin, Azin and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290990}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM'19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {240--248}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Ghazimatin, Azin %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9BD9-B %R 10.1145/3289600.3290990 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM'19 %P 240 - 248 %I ACM %@ 978-1-4503-5940-5
[15]
A. Guimarães, O. Balalau, E. Terolli, and G. Weikum, “Analyzing the Traits and Anomalies of Political Discussions on Reddit,” in Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 2019.
Export
BibTeX
@inproceedings{Guimaraes_ICWSM2019, TITLE = {Analyzing the Traits and Anomalies of Political Discussions on {R}eddit}, AUTHOR = {Guimar{\~a}es, Anna and Balalau, Oana and Terolli, Erisa and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2334-0770}, PUBLISHER = {AAAI}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019)}, PAGES = {205--213}, ADDRESS = {Munich, Germany}, }
Endnote
%0 Conference Proceedings %A Guimarães, Anna %A Balalau, Oana %A Terolli, Erisa %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Analyzing the Traits and Anomalies of Political Discussions on Reddit : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3649-F %D 2019 %B 13th International Conference on Web and Social Media %Z date of event: 2019-06-11 - 2019-06-14 %C Munich, Germany %B Proceedings of the Thirteenth International Conference on Web and Social Media %P 205 - 213 %I AAAI %@ false
[16]
D. Gupta and K. Berberich, “Structured Search in Annotated Document Collections,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Gupta_WSDM2019Demo, TITLE = {Structured Search in Annotated Document Collections}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290618}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM'19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {794--797}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Structured Search in Annotated Document Collections : Demo paper %G eng %U http://hdl.handle.net/21.11116/0000-0002-A8D6-F %R 10.1145/3289600.3290618 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM'19 %P 794 - 797 %I ACM %@ 978-1-4503-5940-5
[17]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” in The Semantic Web (ESWC 2019), Portorož, Slovenia, 2019.
Export
BibTeX
@inproceedings{GuptaESWC2019, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-3-030-21347-3}, DOI = {10.1007/978-3-030-21348-0_11}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {The Semantic Web (ESWC 2019)}, EDITOR = {Hitzler, Pascal and Fern{\'a}ndez, Miriam and Janowicz, Krzysztof and Zaveri, Amrapali and Gray, Alasdair J. G. and Lopez, Vanessa and Haller, Armin and Hammar, Karl}, PAGES = {162--178}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11503}, ADDRESS = {Portoro{\v z}, Slovenia}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %A Strötgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FF5F-5 %R 10.1007/978-3-030-21348-0_11 %D 2019 %B 16th Extended Semantic Web Conference %Z date of event: 2019-06-02 - 2019-06-06 %C Portorož, Slovenia %B The Semantic Web %E Hitzler, Pascal; Fernández, Miriam; Janowicz, Krzysztof; Zaveri, Amrapali; Gray, Alasdair J. G.; Lopez, Vanessa; Haller, Armin; Hammar, Karl %P 162 - 178 %I Springer %@ 978-3-030-21347-3 %B Lecture Notes in Computer Science %N 11503
[18]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” 2019. [Online]. Available: http://arxiv.org/abs/1904.01451. (arXiv: 1904.01451)
Abstract
Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
Export
BibTeX
@online{Hedderich_arXiv1904.01451, TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries}, AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.01451}, EPRINT = {1904.01451}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.}, }
Endnote
%0 Report %A Hedderich, Michael A. %A Yates, Andrew %A Klakow, Dietrich %A de Melo, Gerard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Using Multi-Sense Vector Embeddings for Reverse Dictionaries : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02B4-E %U http://arxiv.org/abs/1904.01451 %D 2019 %X Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well. %K Computer Science, Computation and Language, cs.CL,Computer Science, Learning, cs.LG
[19]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” in Proceedings of the 13th International Conference on Computational Semantics - Long Papers (IWCS 2019), Gothenburg, Sweden, 2019.
Export
BibTeX
@inproceedings{Hedderich_IWCS2019, TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries}, AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-950737-19-2}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 13th International Conference on Computational Semantics -- Long Papers (IWCS 2019)}, EDITOR = {Dobnik, Simon and Chatzikyriakidis, Stergios and Demberg, Vera}, PAGES = {247--258}, ADDRESS = {Gothenburg, Sweden}, }
Endnote
%0 Conference Proceedings %A Hedderich, Michael A. %A Yates, Andrew %A Klakow, Dietrich %A de Melo, Gerard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Using Multi-Sense Vector Embeddings for Reverse Dictionaries : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02A4-0 %D 2019 %B 13th International Conference on Computational Semantics %Z date of event: 2019-05-23 - 2019-05-27 %C Gothenburg, Sweden %B Proceedings of the 13th International Conference on Computational Semantics - Long Papers %E Dobnik, Simon; Chatzikyriakidis, Stergios; Demberg, Vera %P 247 - 258 %I ACL %@ 978-1-950737-19-2 %U https://www.aclweb.org/anthology/W19-0421
[20]
Y. Ibrahim, M. Riedewald, G. Weikum, and D. Zeinalipour-Yazti, “Bridging Quantities in Tables and Text,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
Export
BibTeX
@inproceedings{Ibrahim_ICDE2019, TITLE = {Bridging Quantities in Tables and Text}, AUTHOR = {Ibrahim, Yusra and Riedewald, Mirek and Weikum, Gerhard and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-7474-1}, DOI = {10.1109/ICDE.2019.00094}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering}, PAGES = {1010--1021}, ADDRESS = {Macau, China}, }
Endnote
%0 Conference Proceedings %A Ibrahim, Yusra %A Riedewald, Mirek %A Weikum, Gerhard %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Bridging Quantities in Tables and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-01AB-B %R 10.1109/ICDE.2019.00094 %D 2019 %B 35th IEEE International Conference on Data Engineering %Z date of event: 2019-04-08 - 2019-04-12 %C Macau, China %B ICDE 2019 %P 1010 - 1021 %I IEEE %@ 978-1-5386-7474-1
[21]
Y. Ibrahim and G. Weikum, “ExQuisiTe: Explaining Quantities in Text,” in Proceedings of the World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Ibrahim_WWW2019, TITLE = {{ExQuisiTe}: {E}xplaining Quantities in Text}, AUTHOR = {Ibrahim, Yusra and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3314134}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {3541--3544}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ibrahim, Yusra %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T ExQuisiTe: Explaining Quantities in Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-01B3-1 %R 10.1145/3308558.3314134 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of the World Wide Web Conference %E McAuley, Julian %P 3541 - 3544 %I ACM %@ 978-1-4503-6674-8
[22]
Y. Ibrahim, “Understanding Quantities in Web Tables and Text,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only by the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges; First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measures. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing Quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. - We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. 
BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
Export
BibTeX
@phdthesis{yusraphd2019, TITLE = {Understanding Quantities in Web Tables and Text}, AUTHOR = {Ibrahim, Yusra}, LANGUAGE = {eng}, DOI = {10.22028/D291-29657}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only by the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges; First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measures. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: -- We present the Quantity Knowledge Base (QKB), a knowledge base for representing Quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. -- We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. 
Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. -- We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. -- We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.}, }
Endnote
%0 Thesis %A Ibrahim, Yusra %Y Weikum, Gerhard %A referee: Riedewald, Mirek %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Understanding Quantities in Web Tables and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0005-4384-A %R 10.22028/D291-29657 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 116 p. %V phd %9 phd %X There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only by the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges; First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad-hoc without a standard schema and with ambiguous header names; also table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measures. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing Quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. 
- We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28300
[23]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded by MDL,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
Export
BibTeX
@inproceedings{Kaltenpoth_SDM2019, TITLE = {We Are Not Your Real Parents: {T}elling Causal from Confounded by {MDL}}, AUTHOR = {Kaltenpoth, David and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-567-3}, DOI = {10.1137/1.9781611975673.23}, PUBLISHER = {SIAM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)}, EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh}, PAGES = {199--207}, ADDRESS = {Calgary, Canada}, }
Endnote
%0 Conference Proceedings %A Kaltenpoth, David %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T We Are Not Your Real Parents: Telling Causal from Confounded by MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D37-2 %R 10.1137/1.9781611975673.23 %D 2019 %B SIAM International Conference on Data Mining %Z date of event: 2019-05-02 - 2019-05-04 %C Calgary, Canada %B Proceedings of the 2019 SIAM International Conference on Data Mining %E Berger-Wolf, Tanya; Chawla, Nitesh %P 199 - 207 %I SIAM %@ 978-1-61197-567-3
[24]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded using MDL,” 2019. [Online]. Available: http://arxiv.org/abs/1901.06950. (arXiv: 1901.06950)
Abstract
Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
Export
BibTeX
@online{Kaltenpoth_arXiv1901.06950, TITLE = {We Are Not Your Real Parents: Telling Causal from Confounded using {MDL}}, AUTHOR = {Kaltenpoth, David and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1901.06950}, EPRINT = {1901.06950}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.}, }
Endnote
%0 Report %A Kaltenpoth, David %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T We Are Not Your Real Parents: Telling Causal from Confounded using MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FFEE-3 %U http://arxiv.org/abs/1901.06950 %D 2019 %X Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence. %K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML
[25]
S. Karaev and P. Miettinen, “Algorithms for Approximate Subtropical Matrix Factorization,” Data Mining and Knowledge Discovery, vol. 33, no. 2, 2019.
Export
BibTeX
@article{Karaev_DMKD2018, TITLE = {Algorithms for Approximate Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1007/s10618-018-0599-1}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Data Mining and Knowledge Discovery}, VOLUME = {33}, NUMBER = {2}, PAGES = {526--576}, }
Endnote
%0 Journal Article %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Approximate Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9FD5-B %R 10.1007/s10618-018-0599-1 %7 2018 %D 2019 %J Data Mining and Knowledge Discovery %O DMKD %V 33 %N 2 %& 526 %P 526 - 576 %I Springer %C New York, NY
[26]
S. Karaev, “Matrix factorization over dioids and its applications in data mining,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used – the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.
Export
BibTeX
@phdthesis{Karaevphd2019, TITLE = {Matrix factorization over dioids and its applications in data mining}, AUTHOR = {Karaev, Sanjar}, LANGUAGE = {eng}, DOI = {10.22028/D291-28661}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse ({\textquotedblleft}negative numbers{\textquotedblright}) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used -- the description length of the data, which enables us to obtain compact and highly interpretable results. 
The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.}, }
Endnote
%0 Thesis %A Karaev, Sanjar %Y Miettinen, Pauli %A referee: Weikum, Gerhard %A referee: van Leeuwen, Matthijs %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Matrix factorization over dioids and its applications in data mining : %G eng %U http://hdl.handle.net/21.11116/0000-0005-4369-A %R 10.22028/D291-28661 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2019 %P 113 p. %V phd %9 phd %X Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse (&#8220;negative numbers&#8221;) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used &#8211; the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27903
[27]
A. Konstantinidis, P. Irakleous, Z. Georgiou, D. Zeinalipour-Yazti, and P. K. Chrysanthis, “IoT Data Prefetching in Indoor Navigation SOAs,” ACM Transactions on Internet Technology, vol. 19, no. 1, 2019.
Export
BibTeX
@article{Konstantinidis:2018:IDP:3283809.3177777, TITLE = {{IoT} Data Prefetching in Indoor Navigation {SOAs}}, AUTHOR = {Konstantinidis, Andreas and Irakleous, Panagiotis and Georgiou, Zacharias and Zeinalipour-Yazti, Demetrios and Chrysanthis, Panos K.}, LANGUAGE = {eng}, ISSN = {1533-5399}, DOI = {10.1145/3177777}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {ACM Transactions on Internet Technology}, VOLUME = {19}, NUMBER = {1}, EID = {10}, }
Endnote
%0 Journal Article %A Konstantinidis, Andreas %A Irakleous, Panagiotis %A Georgiou, Zacharias %A Zeinalipour-Yazti, Demetrios %A Chrysanthis, Panos K. %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T IoT Data Prefetching in Indoor Navigation SOAs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-CA09-1 %R 10.1145/3177777 %7 2019 %D 2019 %J ACM Transactions on Internet Technology %O TOIT %V 19 %N 1 %Z sequence number: 10 %I ACM %C New York, NY %@ false
[28]
P. Lahoti, K. Gummadi, and G. Weikum, “iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
Export
BibTeX
@inproceedings{Lahoti_ICDE2019, TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-7474-1}, DOI = {10.1109/ICDE.2019.00121}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering}, PAGES = {1334--1345}, ADDRESS = {Macau, China}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Gummadi, Krishna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : %G eng %U http://hdl.handle.net/21.11116/0000-0003-F395-2 %R 10.1109/ICDE.2019.00121 %D 2019 %B 35th IEEE International Conference on Data Engineering %Z date of event: 2019-04-08 - 2019-04-12 %C Macau, China %B ICDE 2019 %P 1334 - 1345 %I IEEE %@ 978-1-5386-7474-1
[29]
P. Lahoti, K. P. Gummadi, and G. Weikum, “Operationalizing Individual Fairness with Pairwise Fair Representations,” 2019. [Online]. Available: http://arxiv.org/abs/1907.01439. (arXiv: 1907.01439)
Abstract
We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty of eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments, for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.
Export
BibTeX
@online{Lahoti_arXiv1907.01439, TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1907.01439}, EPRINT = {1907.01439}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation(PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including humans judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.}, }
Endnote
%0 Report %A Lahoti, Preethi %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Operationalizing Individual Fairness with Pairwise Fair Representations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FF17-5 %U http://arxiv.org/abs/1907.01439 %D 2019 %X We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation(PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including humans judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable. %K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML
[30]
X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum, “Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France. (Accepted/in press)
Export
BibTeX
@inproceedings{lu19answering, TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs}, AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A Lu, Xiaolu %A Pramanik, Soumajit %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Wang, Yafang %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-7085-8 %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %I ACM
[31]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Overcoming Low-Utility Facets for Complex Answer Retrieval,” Information Retrieval Journal, vol. 22, no. 3–4, 2019.
Export
BibTeX
@article{MacAvaney2019, TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, ISSN = {1386-4564}, DOI = {10.1007/s10791-018-9343-0}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Information Retrieval Journal}, VOLUME = {22}, NUMBER = {3-4}, PAGES = {395--418}, }
Endnote
%0 Journal Article %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Overcoming Low-Utility Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0003-C4A1-9 %R 10.1007/s10791-018-9343-0 %7 2019 %D 2019 %J Information Retrieval Journal %V 22 %N 3-4 %& 395 %P 395 - 418 %I Springer %C New York, NY %@ false
[32]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1904.07094. (arXiv: 1904.07094)
Abstract
Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
Export
BibTeX
@online{MacAvaney_arXiv1904.07094, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.07094}, EPRINT = {1904.07094}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language modes (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02C7-9 %U http://arxiv.org/abs/1904.07094 %D 2019 %X Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language modes (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[33]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France. (Accepted/in press)
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2019, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02D3-B %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %I ACM
[34]
A. Marx and J. Vreeken, “Causal Inference on Multivariate and Mixed-Type Data,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018), Dublin, Ireland, 2019.
Export
BibTeX
@inproceedings{marx:18:crack, TITLE = {Causal Inference on Multivariate and Mixed-Type Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-030-10927-1}, DOI = {10.1007/978-3-030-10928-8_39}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018)}, EDITOR = {Berlingerio, Michele and Bonchi, Francesco and G{\"a}rtner, Thomas and Hurley, Neil and Ifrim, Georgiana}, PAGES = {655--671}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {11052}, ADDRESS = {Dublin, Ireland}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Multivariate and Mixed-Type Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9E86-5 %R 10.1007/978-3-030-10928-8_39 %D 2019 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2018-09-10 - 2018-09-14 %C Dublin, Ireland %B Machine Learning and Knowledge Discovery in Databases %E Berlingerio, Michele; Bonchi, Francesco; G&#228;rtner, Thomas; Hurley, Neil; Ifrim, Georgiana %P 655 - 671 %I Springer %@ 978-3-030-10927-1 %B Lecture Notes in Artificial Intelligence %N 11052
[35]
A. Marx and J. Vreeken, “Telling Cause from Effect by Local and Global Regression,” Knowledge and Information Systems, vol. 60, no. 3, 2019.
Export
BibTeX
@article{marx:19:crack, TITLE = {Telling Cause from Effect by Local and Global Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-018-1286-7}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {60}, NUMBER = {3}, PAGES = {1277--1305}, }
Endnote
%0 Journal Article %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Telling Cause from Effect by Local and Global Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EAD-A %R 10.1007/s10115-018-1286-7 %7 2018-12-07 %D 2019 %J Knowledge and Information Systems %V 60 %N 3 %& 1277 %P 1277 - 1305 %I Springer %C New York, NY %@ false
[36]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), Naha, Okinawa, Japan, 2019.
Export
BibTeX
@inproceedings{Marx_AISTATS2019, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, PUBLISHER = {PMLR}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)}, EDITOR = {Chaudhuri, Kamalika and Sugiyama, Masashi}, PAGES = {496--505}, SERIES = {Proceedings of the Machine Learning Research}, VOLUME = {89}, ADDRESS = {Naha, Okinawa, Japan}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D3C-D %D 2019 %B 22nd International Conference on Artificial Intelligence and Statistics %Z date of event: 2019-04-16 - 2019-04-18 %C Naha, Okinawa, Japan %B Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics %E Chaudhuri, Kamalika; Sugiyama, Masashi %P 496 - 505 %I PMLR %B Proceedings of the Machine Learning Research %N 89 %U http://proceedings.mlr.press/v89/marx19a/marx19a.pdf
[37]
A. Marx and J. Vreeken, “Approximating Algorithmic Conditional Independence for Discrete Data,” in Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI, Stanford, CA, USA. (Accepted/in press)
Export
BibTeX
@inproceedings{Marx_AAAISpringSymp2019, TITLE = {Approximating Algorithmic Conditional Independence for Discrete Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI}, ADDRESS = {Stanford, CA, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Approximating Algorithmic Conditional Independence for Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D4C-B %D 2019 %B First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI %Z date of event: 2019-05-25 - 2019-05-27 %C Stanford, CA, USA %B Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI
[38]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” 2019. [Online]. Available: http://arxiv.org/abs/1903.04829. (arXiv: 1903.04829)
Abstract
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Among other results, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$-consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
Export
BibTeX
@online{Marx_arXiv1903.04829, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1903.04829}, EPRINT = {1903.04829}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0004-027A-1 %U http://arxiv.org/abs/1903.04829 %D 2019 %X Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
[39]
A. Marx and J. Vreeken, “Identifiability of Cause and Effect using Regularized Regression,” in KDD’19, 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019.
Export
BibTeX
@inproceedings{Marx_KDD2019, TITLE = {Identifiability of Cause and Effect using Regularized Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-6201-6}, DOI = {10.1145/3292500.3330854}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {KDD'19, 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining}, PAGES = {852--861}, ADDRESS = {Anchorage, AK, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Identifiability of Cause and Effect using Regularized Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0004-858C-8 %R 10.1145/3292500.3330854 %D 2019 %B 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining %Z date of event: 2019-08-04 - 2019-08-08 %C Anchorage, AK, USA %B KDD'19 %P 852 - 861 %I ACM %@ 978-1-4503-6201-6
[40]
S. Metzler, S. Günnemann, and P. Miettinen, “Stability and Dynamics of Communities on Online Question-Answer Sites,” Social Networks, vol. 58, 2019.
Export
BibTeX
@article{Metzler2019, TITLE = {Stability and Dynamics of Communities on Online Question-Answer Sites}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0378-8733}, DOI = {10.1016/j.socnet.2018.12.004}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Social Networks}, VOLUME = {58}, PAGES = {50--58}, }
Endnote
%0 Journal Article %A Metzler, Saskia %A G&#252;nnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Stability and Dynamics of Communities on Online Question-Answer Sites : %G eng %U http://hdl.handle.net/21.11116/0000-0002-BCC1-0 %R 10.1016/j.socnet.2018.12.004 %7 2019 %D 2019 %J Social Networks %V 58 %& 50 %P 50 - 58 %I Elsevier %C Amsterdam %@ false
[41]
M. Mohanty, M. Ramanath, M. Yahya, and G. Weikum, “Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs,” in Advances in Database Technology (EDBT 2019), Lisbon, Portugal, 2019.
Export
BibTeX
@inproceedings{Mohanty:EDBT2019, TITLE = {{Spec-QP}: {S}peculative Query Planning for Joins over Knowledge Graphs}, AUTHOR = {Mohanty, Madhulika and Ramanath, Maya and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-89318-081-3}, DOI = {10.5441/002/edbt.2019.07}, PUBLISHER = {OpenProceedings.org}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Advances in Database Technology (EDBT 2019)}, EDITOR = {Herschel, Melanie and Galhardas, Helena and Reinwald, Berthold and Fundlaki, Irini and Binning, Carsten and Kaoudi, Zoi}, PAGES = {61--72}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Mohanty, Madhulika %A Ramanath, Maya %A Yahya, Mohamed %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3A7D-1 %R 10.5441/002/edbt.2019.07 %D 2019 %B 22nd International Conference on Extending Database Technology %Z date of event: 2019-03-26 - 2019-03-29 %C Lisbon, Portugal %B Advances in Database Technology %E Herschel, Melanie; Galhardas, Helena; Reinwald, Berthold; Fundlaki, Irini; Binning, Carsten; Kaoudi, Zoi %P 61 - 72 %I OpenProceedings.org %@ 978-3-89318-081-3
[42]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” Theory and Practice of Logic Programming, vol. 19, no. 4, 2019.
Export
BibTeX
@article{ParamonovTPLP, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {1471-0684}, DOI = {10.1017/S1471068418000467}, PUBLISHER = {Cambridge University Press}, ADDRESS = {Cambridge}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Theory and Practice of Logic Programming}, VOLUME = {19}, NUMBER = {4}, PAGES = {505--535}, }
Endnote
%0 Journal Article %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0CC4-3 %R 10.1017/S1471068418000467 %7 2019 %D 2019 %J Theory and Practice of Logic Programming %O TPLP %V 19 %N 4 %& 505 %P 505 - 535 %I Cambridge University Press %C Cambridge %@ false
[43]
J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum, “Commonsense Properties from Query Logs and Question Answering Forums,” 2019. [Online]. Available: http://arxiv.org/abs/1905.10989. (arXiv: 1905.10989)
Abstract
Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
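The corroboration step described in the abstract (combining candidate assertions with statistical cues from several sources) can be illustrated with a minimal sketch. Everything here (the `corroborate` function, the source labels, the weighted-sum scoring) is a hypothetical illustration; Quasimodo's actual corroboration model is more sophisticated.

```python
def corroborate(candidates, signals, weights, threshold=0.5):
    """Toy corroboration step: combine per-source evidence scores.

    candidates: list of (subject, predicate, object) triples.
    signals: dict mapping a triple to {source_name: score in [0, 1]}.
    weights: dict mapping source_name to its weight.
    Keeps a triple if its weighted evidence meets the threshold.
    (Illustrative only; not Quasimodo's actual model.)
    """
    kept = []
    for triple in candidates:
        evidence = signals.get(triple, {})
        score = sum(weights.get(src, 0.0) * s for src, s in evidence.items())
        if score >= threshold:
            kept.append((triple, round(score, 3)))
    return kept


# Example: a salient property survives corroboration, a spurious one does not.
cands = [("elephant", "has", "trunk"), ("elephant", "has", "wings")]
sigs = {("elephant", "has", "trunk"): {"books": 0.9, "images": 0.8},
        ("elephant", "has", "wings"): {"books": 0.1}}
kept = corroborate(cands, sigs, {"books": 0.5, "images": 0.5})
```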
Export
BibTeX
@online{Romero_arXiv1905.10989, TITLE = {Commonsense Properties from Query Logs and Question Answering Forums}, AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1905.10989}, EPRINT = {1905.10989}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.}, }
Endnote
%0 Report %A Romero, Julien %A Razniewski, Simon %A Pal, Koninika %A Pan, Jeff Z. %A Sakhadeo, Archit %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense Properties from Query Logs and Question Answering Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FEEE-4 %U http://arxiv.org/abs/1905.10989 %D 2019 %X Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[44]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
Export
BibTeX
@inproceedings{Tatti_SDM2019, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-567-3}, DOI = {10.1137/1.9781611975673.82}, PUBLISHER = {SIAM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)}, EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh}, PAGES = {729--737}, ADDRESS = {Calgary, Canada}, }
Endnote
%0 Conference Proceedings %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %G eng %U http://hdl.handle.net/21.11116/0000-0004-030A-E %R 10.1137/1.9781611975673.82 %D 2019 %B SIAM International Conference on Data Mining %Z date of event: 2019-05-02 - 2019-05-04 %C Calgary, Canada %B Proceedings of the 2019 SIAM International Conference on Data Mining %E Berger-Wolf, Tanya; Chawla, Nitesh %P 729 - 737 %I SIAM %@ 978-1-61197-567-3
[45]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” 2019. [Online]. Available: http://arxiv.org/abs/1901.05797. (arXiv: 1901.05797)
Abstract
Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as a ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate not only that this problem is NP-hard but also that we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best rank-1 factorization. Since even obtaining a rank-1 factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using PQ-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
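The alternating scheme mentioned in the abstract (fix one side, find the best other side, reverse the roles, repeat) can be sketched for plain Boolean rank-1 factorization. This toy version deliberately omits the consecutive-ones constraint and the linear-time PQ-tree step that are central to the paper; the function name and starting pattern are illustrative choices.

```python
def boolean_rank1(A, iters=20):
    """Alternating heuristic for Boolean rank-1 factorization.

    A: list of 0/1 rows. Returns vectors (u, v) so that the outer
    product u v^T approximates A under Hamming (XOR) error.
    (Sketch only; the consecutive-ones constraint is not enforced.)
    """
    m, n = len(A), len(A[0])
    v = [1] * n                      # start from the all-ones column pattern
    u = [0] * m
    for _ in range(iters):
        # Fix v; choose each u[i] to minimize that row's error.
        for i in range(m):
            err1 = sum(1 for j in range(n) if A[i][j] != v[j])  # if u[i] = 1
            err0 = sum(A[i])                                    # if u[i] = 0
            u[i] = 1 if err1 < err0 else 0
        # Fix u; choose each v[j] analogously on columns.
        new_v = []
        for j in range(n):
            err1 = sum(1 for i in range(m) if A[i][j] != u[i])
            err0 = sum(A[i][j] for i in range(m))
            new_v.append(1 if err1 < err0 else 0)
        if new_v == v:               # converged: no side changes any more
            break
        v = new_v
    return u, v
```

On a matrix containing a clean block of ones, the heuristic recovers the block's row and column indicator vectors in a couple of iterations.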
Export
BibTeX
@online{Tatti_arXiv1901.05797, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1901.05797}, EPRINT = {1901.05797}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.}, }
Endnote
%0 Report %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %U http://hdl.handle.net/21.11116/0000-0004-02F0-A %U http://arxiv.org/abs/1901.05797 %D 2019 %X Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well. %K Computer Science, Data Structures and Algorithms, cs.DS,Computer Science, Discrete Mathematics, cs.DM,Computer Science, Learning, cs.LG
[46]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{tigunova2019listening, TITLE = {Listening between the Lines: {L}earning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313498}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {1818--1828}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-1460-A %R 10.1145/3308558.3313498 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 1818 - 1828 %I ACM %@ 978-1-4503-6674-8
[47]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” 2019. [Online]. Available: http://arxiv.org/abs/1904.10887. (arXiv: 1904.10887)
Abstract
Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc.). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
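The per-predicate ranking of object values can be illustrated with a toy, non-neural stand-in: score each candidate value by the overlap between a speaker's words and that value's cue words, weighting rarer cues higher. The function name, the cue-word profiles, and the inverse-frequency weighting are all hypothetical illustrations; the paper's Hidden Attribute Models are neural networks with learned embeddings and attention.

```python
import math

def rank_values(utterance_terms, value_profiles):
    """Toy ranker of object values for one predicate (e.g. profession).

    utterance_terms: words used by the speaker.
    value_profiles: dict mapping each candidate value to its cue words.
    Returns candidate values sorted from best to worst match.
    """
    # Inverse document frequency over profiles acts as a crude attention
    # weight: cue words shared by many values count for less.
    df = {}
    for cues in value_profiles.values():
        for w in set(cues):
            df[w] = df.get(w, 0) + 1
    n_values = len(value_profiles)
    scores = {}
    for value, cues in value_profiles.items():
        scores[value] = sum(math.log(1 + n_values / df[w])
                            for w in utterance_terms if w in cues)
    return sorted(scores, key=scores.get, reverse=True)
```

For instance, a speaker mentioning "patients" and "emergency" would rank "doctor" above "nurse" (which shares fewer cues) and far above "teacher".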
Export
BibTeX
@online{Tigunova_arXiv1904.10887, TITLE = {Listening between the Lines: Learning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.10887}, EPRINT = {1904.10887}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.}, }
Endnote
%0 Report %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FE7F-2 %U http://arxiv.org/abs/1904.10887 %D 2019 %X Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines. %K Computer Science, Computation and Language, cs.CL
[48]
M. Unterkalmsteiner and A. Yates, “Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation,” in Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019), Essen, Germany, 2019.
Export
BibTeX
@inproceedings{Unterkalmsteiner_NLP4RE2019, TITLE = {Expert-sourcing Domain-specific Knowledge: {The} Case of Synonym Validation}, AUTHOR = {Unterkalmsteiner, Michael and Yates, Andrew}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-2376-8}, PUBLISHER = {CEUR-WS}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019)}, EDITOR = {Dalpiaz, Fabiano and Ferrari, Alessio and Franch, Xavier and Gregory, Sarah and Houdek, Frank and Palomares, Cristina}, EID = {8}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2376}, ADDRESS = {Essen, Germany}, }
Endnote
%0 Conference Proceedings %A Unterkalmsteiner, Michael %A Yates, Andrew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02AE-6 %D 2019 %B 2nd Workshop on Natural Language Processing for Requirements Engineering and NLP Tool Showcase %Z date of event: 2019-03-18 - 2019-03-18 %C Essen, Germany %B Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) %E Dalpiaz, Fabiano; Ferrari, Alessio; Franch, Xavier; Gregory, Sarah; Houdek, Frank; Palomares, Cristina %Z sequence number: 8 %I CEUR-WS %B CEUR Workshop Proceedings %N 2376 %@ false %U http://ceur-ws.org/Vol-2376/NLP4RE19_paper08.pdf
[49]
M. van Leeuwen, P. Chau, J. Vreeken, D. Shahaf, and C. Faloutsos, “Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors,” ACM Transactions on Knowledge Discovery from Data, vol. 13, no. 1, 2019.
Export
BibTeX
@article{vanLeeuwen2019, TITLE = {Addendum to the Special Issue on Interactive Data Exploration and Analytics ({TKDD}, Vol. 12, Iss. 1): Introduction by the Guest Editors}, AUTHOR = {van Leeuwen, Matthijs and Chau, Polo and Vreeken, Jilles and Shahaf, Dafna and Faloutsos, Christos}, LANGUAGE = {eng}, ISSN = {1556-4681}, DOI = {10.1145/3298786}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {13}, NUMBER = {1}, EID = {13}, }
Endnote
%0 Journal Article %A van Leeuwen, Matthijs %A Chau, Polo %A Vreeken, Jilles %A Shahaf, Dafna %A Faloutsos, Christos %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FFD5-E %R 10.1145/3298786 %7 2019 %D 2019 %J ACM Transactions on Knowledge Discovery from Data %V 13 %N 1 %Z sequence number: 13 %I ACM %C New York, NY %@ false
[50]
A. Yates and M. Unterkalmsteiner, “Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain,” in Advances in Information Retrieval (ECIR 2019), Cologne, Germany, 2019.
Export
BibTeX
@inproceedings{Yates_ECIR2019, TITLE = {Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain}, AUTHOR = {Yates, Andrew and Unterkalmsteiner, Michael}, LANGUAGE = {eng}, ISBN = {978-3-030-15711-1}, DOI = {10.1007/978-3-030-15712-8_28}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2019)}, EDITOR = {Azzopardi, Leif and Stein, Benno and Fuhr, Norbert and Mayr, Philipp and Hauff, Claudia and Hiemstra, Djoerd}, PAGES = {429--442}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11437}, ADDRESS = {Cologne, Germany}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Unterkalmsteiner, Michael %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain : %G eng %U http://hdl.handle.net/21.11116/0000-0004-029B-B %R 10.1007/978-3-030-15712-8_28 %D 2019 %B 41st European Conference on IR Research %Z date of event: 2019-04-14 - 2019-04-18 %C Cologne, Germany %B Advances in Information Retrieval %E Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd %P 429 - 442 %I Springer %@ 978-3-030-15711-1 %B Lecture Notes in Computer Science %N 11437
2018
[51]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” 2018. [Online]. Available: http://arxiv.org/abs/1809.09528. (arXiv: 1809.09528)
Abstract
To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
Export
BibTeX
@online{Abujabal_arXiv1809.09528, TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1809.09528}, EPRINT = {1809.09528}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.}, }
Endnote
%0 Report %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A0FE-B %U http://arxiv.org/abs/1809.09528 %D 2018 %X To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA. %K Computer Science, Computation and Language, cs.CL
[52]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{AbujabalWWW_2018, TITLE = {Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5639-8}, DOI = {10.1145/3178876.3186004}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.}, PAGES = {1053--1062}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C91-8 %R 10.1145/3178876.3186004 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Proceedings of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G. %P 1053 - 1062 %I ACM %@ 978-1-4503-5639-8
[53]
P. Agarwal, J. Strötgen, L. Del Corro, J. Hoffart, and G. Weikum, “diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora,” in The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.
Export
BibTeX
@inproceedings{AgrawalACL2018a, TITLE = {{diaNED}: {T}ime-Aware Named Entity Disambiguation for Diachronic Corpora}, AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-34-6}, URL = {https://aclanthology.coli.uni-saarland.de/volumes/proceedings-of-the-56th-annual-meeting-of-the-association-for-computational-linguistics-volume-2-short-papers}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)}, EDITOR = {Gurevych, Iryna and Miyao, Yusuke}, PAGES = {686--693}, EID = {602}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Agarwal, Prabal %A Strötgen, Jannik %A Del Corro, Luciano %A Hoffart, Johannes %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9055-C %D 2018 %B The 56th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2018-07-15 - 2018-07-20 %C Melbourne, Australia %B The 56th Annual Meeting of the Association for Computational Linguistics %E Gurevych, Iryna; Miyao, Yusuke %P 686 - 693 %Z sequence number: 602 %I ACL %@ 978-1-948087-34-6 %U http://aclweb.org/anthology/P18-2109
[54]
M. Antenore, G. Leone, A. Panconesi, and E. Terolli, “Together We Buy, Alone I Quit: Some Experimental Studies of Online Persuaders,” in DTUC’18 Digital Tools & Uses Congress, Paris, France, 2018.
Export
BibTeX
@inproceedings{Antenore:2018:TWB:3240117.3240119, TITLE = {Together We Buy, Alone {I} Quit: {S}ome Experimental Studies of Online Persuaders}, AUTHOR = {Antenore, Marzia and Leone, Giovanna and Panconesi, Alessandro and Terolli, Erisa}, LANGUAGE = {eng}, ISBN = {978-1-4503-6451-5}, DOI = {10.1145/3240117.3240119}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {DTUC'18 Digital Tools \& Uses Congress}, EDITOR = {Reyes, E. and Szoniecky, S. and Mkadmi, A. and Kembellec, G. and Fournier-S'niehotta, R. and Siala-Kallel, F. and Ammi, M. and Labelle, S.}, EID = {2}, ADDRESS = {Paris, France}, }
[55]
O. Balalau, C. Castillo, and M. Sozio, “EviDense: A Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions,” in Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018), Stanford, CA, USA, 2018.
Export
BibTeX
@inproceedings{Balalau_ICWSM2018, TITLE = {{EviDense}: {A} Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions}, AUTHOR = {Balalau, Oana and Castillo, Carlos and Sozio, Mauro}, LANGUAGE = {eng}, ISBN = {978-1-57735-798-8}, PUBLISHER = {AAAI}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018)}, PAGES = {560--563}, ADDRESS = {Stanford, CA, USA}, }
[56]
V. Balaraman, S. Razniewski, and W. Nutt, “Recoin: Relative Completeness in Wikidata,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{BalaramanWWW2017, TITLE = {Recoin: {R}elative Completeness in {W}ikidata}, AUTHOR = {Balaraman, Vevake and Razniewski, Simon and Nutt, Werner}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3191641}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {1787--1792}, ADDRESS = {Lyon, France}, }
[57]
A. J. Biega, K. P. Gummadi, and G. Weikum, “Equity of Attention: Amortizing Individual Fairness in Rankings,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{BiegaSIGIR2018, TITLE = {Equity of Attention: {A}mortizing Individual Fairness in Rankings}, AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5022-8}, DOI = {10.1145/3209978.3210063}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {405--414}, ADDRESS = {Ann Arbor, MI, USA}, }
[58]
A. J. Biega, K. P. Gummadi, and G. Weikum, “Equity of Attention: Amortizing Individual Fairness in Rankings,” 2018. [Online]. Available: http://arxiv.org/abs/1805.01788. (arXiv: 1805.01788)
Abstract
Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.
Export
BibTeX
@online{Biega_arXiv1805.01788, TITLE = {Equity of Attention: Amortizing Individual Fairness in Rankings}, AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1805.01788}, EPRINT = {1805.01788}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.}, }
[59]
N. Boldyrev, M. Spaniol, and G. Weikum, “Multi-Cultural Interlinking of Web Taxonomies with ACROSS,” The Journal of Web Science, vol. 4, no. 2, 2018.
Export
BibTeX
@article{Boldyrev2018, TITLE = {Multi-Cultural Interlinking of Web Taxonomies with {ACROSS}}, AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1561/106.00000012}, PUBLISHER = {Now Publishers}, ADDRESS = {Boston}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, JOURNAL = {The Journal of Web Science}, VOLUME = {4}, NUMBER = {2}, PAGES = {20--33}, }
[60]
K. Budhathoki and J. Vreeken, “Causal Inference on Event Sequences,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.
Export
BibTeX
@inproceedings{budhathoki_SDM2018, TITLE = {Causal Inference on Event Sequences}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-532-1}, DOI = {10.1137/1.9781611975321.7}, PUBLISHER = {SIAM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)}, EDITOR = {Ester, Martin and Pedreschi, Dino}, PAGES = {55--63}, ADDRESS = {San Diego, CA, USA}, }
[61]
K. Budhathoki and J. Vreeken, “Accurate Causal Inference on Discrete Data,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.
Export
BibTeX
@inproceedings{budhathoki:18:acid, TITLE = {Accurate Causal Inference on Discrete Data}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-9159-5}, DOI = {10.1109/ICDM.2018.00105}, PUBLISHER = {IEEE}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)}, PAGES = {881--886}, ADDRESS = {Singapore, Singapore}, }
[62]
K. Budhathoki, M. Boley, and J. Vreeken, “Rule Discovery for Exploratory Causal Reasoning,” in Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018), Montréal, Canada, 2018.
Export
BibTeX
@inproceedings{budhathoki:18:dice, TITLE = {Rule Discovery for Exploratory Causal Reasoning}, AUTHOR = {Budhathoki, Kailash and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {https://drive.google.com/file/d/1r-KTsok3VLIz-wUh0YtsK5YaEu53DcTf/view}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018)}, EID = {14}, ADDRESS = {Montr{\'e}al, Canada}, }
[63]
K. Budhathoki and J. Vreeken, “Origo: Causal Inference by Compression,” Knowledge and Information Systems, vol. 56, no. 2, 2018.
Export
BibTeX
@article{Budhathoki2018, TITLE = {Origo: {C}ausal Inference by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-017-1130-5}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {56}, NUMBER = {2}, PAGES = {285--307}, }
[64]
A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian, “SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” in The 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, NM, USA, 2018.
Export
BibTeX
@inproceedings{Cohan_COLING2018, TITLE = {{SMHD}: {A} Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions}, AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-948087-50-6}, URL = {http://aclweb.org/anthology/C18-1126}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 27th International Conference on Computational Linguistics (COLING 2018)}, EDITOR = {Bender, Emily M. and Derczynski, Leon and Isabelle, Pierre}, PAGES = {1485--1497}, ADDRESS = {Santa Fe, NM, USA}, }
[65]
A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian, “SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” 2018. [Online]. Available: http://arxiv.org/abs/1806.05258. (arXiv: 1806.05258)
Abstract
Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.
Export
BibTeX
@online{cohan_arXiv1806.05258, TITLE = {{SMHD}: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions}, AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.05258}, EPRINT = {1806.05258}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.}, }
[66]
M. Danisch, O. Balalau, and M. Sozio, “Listing k-cliques in Sparse Real-World Graphs,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{Danisch_WWW2018, TITLE = {Listing k-cliques in Sparse Real-World Graphs}, AUTHOR = {Danisch, Maximilien and Balalau, Oana and Sozio, Mauro}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3178876.3186125}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {589--598}, ADDRESS = {Lyon, France}, }
[67]
F. Darari, W. Nutt, and S. Razniewski, “Comparing Index Structures for Completeness Reasoning,” in IWBIS 2018, International Workshop on Big Data and Information Security, Jakarta, Indonesia, 2018.
Export
BibTeX
@inproceedings{DarariIWBIS2018, TITLE = {Comparing Index Structures for Completeness Reasoning}, AUTHOR = {Darari, Fariz and Nutt, Werner and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-5386-5525-2}, DOI = {10.1109/IWBIS.2018.8471712}, PUBLISHER = {IEEE}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {IWBIS 2018, International Workshop on Big Data and Information Security}, PAGES = {49--56}, ADDRESS = {Jakarta, Indonesia}, }
[68]
F. Darari, W. Nutt, G. Pirrò, and S. Razniewski, “Completeness Management for RDF Data Sources,” ACM Transactions on the Web, vol. 12, no. 3, 2018.
Export
BibTeX
@article{Darari2018, TITLE = {Completeness Management for {RDF} Data Sources}, AUTHOR = {Darari, Fariz and Nutt, Werner and Pirr{\o}, Giuseppe and Razniewski, Simon}, LANGUAGE = {eng}, DOI = {10.1145/3196248}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {ACM Transactions on the Web}, VOLUME = {12}, NUMBER = {3}, EID = {18}, }
[69]
S. Degaetano-Ortlieb and J. Strötgen, “Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy,” in Language Technologies for the Challenges of the Digital Age (GSCL 2017), Berlin, Germany, 2018.
Export
BibTeX
@inproceedings{DegaetanoortliebStroetgen2017, TITLE = {Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy}, AUTHOR = {Degaetano-Ortlieb, Stefania and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-3-319-73705-8}, DOI = {10.1007/978-3-319-73706-5_22}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Language Technologies for the Challenges of the Digital Age (GSCL 2017)}, EDITOR = {Rehm, Georg and Declerck, Thierry}, PAGES = {259--275}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10713}, ADDRESS = {Berlin, Germany}, }
[70]
P. Ernst, “Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. - To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.
Export
BibTeX
@phdthesis{Ernstphd2017, TITLE = {Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems}, AUTHOR = {Ernst, Patrick}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-271051}, DOI = {10.22028/D291-27105}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: -- To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. -- To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. -- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.}, }
[71]
P. Ernst, A. Siu, and G. Weikum, “HighLife: Higher-arity Fact Harvesting,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{ErnstlWWW_2018, TITLE = {{HighLife}: Higher-arity Fact Harvesting}, AUTHOR = {Ernst, Patrick and Siu, Amy and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5639-8}, DOI = {10.1145/3178876.3186000}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.}, PAGES = {1013--1022}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Ernst, Patrick %A Siu, Amy %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T HighLife: Higher-arity Fact Harvesting : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C96-3 %R 10.1145/3178876.3186000 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Proceedings of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G. %P 1013 - 1022 %I ACM %@ 978-1-4503-5639-8
[72]
A. K. Fischer, J. Vreeken, and D. Klakow, “Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL,” Computación y Sistemas, vol. 21, no. 4, 2018.
Export
BibTeX
@article{Fischer2018, TITLE = {Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by {MDL}}, AUTHOR = {Fischer, Andrea K. and Vreeken, Jilles and Klakow, Dietrich}, LANGUAGE = {eng}, DOI = {10.13053/CyS-21-4-2865}, PUBLISHER = {Instituto Polit{\'e}cnico Nacional}, ADDRESS = {M{\'e}xico}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, JOURNAL = {Computaci{\'o}n y Sistemas}, VOLUME = {21}, NUMBER = {4}, PAGES = {829--839}, }
Endnote
%0 Journal Article %A Fischer, Andrea K. %A Vreeken, Jilles %A Klakow, Dietrich %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0001-4156-5 %R 10.13053/CyS-21-4-2865 %7 2018 %D 2018 %J Computaci&#243;n y Sistemas %V 21 %N 4 %& 829 %P 829 - 839 %I Instituto Polit&#233;cnico Nacional %C M&#233;xico %U http://www.redalyc.org/articulo.oa?id=61553900023
[73]
E. Galbrun and P. Miettinen, “Mining Redescriptions with Siren,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 1, 2018.
Export
BibTeX
@article{galbrun17mining, TITLE = {Mining Redescriptions with {Siren}}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1145/3007212}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {12}, NUMBER = {1}, EID = {6}, }
Endnote
%0 Journal Article %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mining Redescriptions with Siren : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-227B-F %R 10.1145/3007212 %7 2018 %D 2018 %J ACM Transactions on Knowledge Discovery from Data %V 12 %N 1 %Z sequence number: 6 %I ACM %C New York, NY
[74]
E. Gius, N. Reiter, J. Strötgen, and M. Willand, “SANTA: Systematische Analyse Narrativer Texte durch Annotation,” in DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Köln, Germany, 2018.
Export
BibTeX
@inproceedings{GiusDHd2018, TITLE = {{{SANTA}: {Systematische Analyse Narrativer Texte durch Annotation}}}, AUTHOR = {Gius, Evelyn and Reiter, Nils and Str{\"o}tgen, Jannik and Willand, Marcus}, LANGUAGE = {deu}, ISBN = {978-3-946275-02-2}, URL = {http://dhd2018.uni-koeln.de/}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.}, PAGES = {302--305}, ADDRESS = {K{\"o}ln, Germany}, }
Endnote
%0 Conference Proceedings %A Gius, Evelyn %A Reiter, Nils %A Str&#246;tgen, Jannik %A Willand, Marcus %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T SANTA: Systematische Analyse Narrativer Texte durch Annotation : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002E-73EC-4 %D 2018 %B 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. %Z date of event: 2018-02-26 - 2018-03-02 %C K&#246;ln, Germany %B DHd 2018 %P 302 - 305 %@ 978-3-946275-02-2
[75]
D. Gupta and K. Berberich, “GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.
Export
BibTeX
@inproceedings{Gupta_CIKM2018, TITLE = {{GYANI}: {A}n Indexing Infrastructure for Knowledge-Centric Tasks}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-6014-2}, DOI = {10.1145/3269206.3271745}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management}, EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun}, PAGES = {487--496}, ADDRESS = {Torino, Italy}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A8B7-2 %R 10.1145/3269206.3271745 %D 2018 %B 27th ACM International Conference on Information and Knowledge Management %Z date of event: 2018-10-22 - 2018-10-26 %C Torino, Italy %B CIKM'18 %E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun %P 487 - 496 %I ACM %@ 978-1-4503-6014-2
[76]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” in JCDL’18, Joint Conference on Digital Libraries, Fort Worth, TX, USA, 2018.
Export
BibTeX
@inproceedings{GuptaJCDL2018, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-4503-5178-2}, DOI = {10.1145/3197026.3203900}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {JCDL'18, Joint Conference on Digital Libraries}, PAGES = {335--336}, ADDRESS = {Fort Worth, TX, USA}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %A Str&#246;tgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/21.11116/0000-0001-904D-6 %R 10.1145/3197026.3203900 %D 2018 %B Joint Conference on Digital Libraries %Z date of event: 2018-06-03 - 2018-06-07 %C Fort Worth, TX, USA %B JCDL'18 %P 335 - 336 %I ACM %@ 978-1-4503-5178-2
[77]
D. Gupta and K. Berberich, “Identifying Time Intervals for Knowledge Graph Facts,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{GuptaWWW2017, TITLE = {Identifying Time Intervals for Knowledge Graph Facts}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3186917}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {37--38}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Time Intervals for Knowledge Graph Facts : %G eng %U http://hdl.handle.net/21.11116/0000-0001-411F-4 %R 10.1145/3184558.3186917 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel %P 37 - 38 %I ACM %@ 978-1-4503-5640-4
[78]
G. Haratinezhad Torbati, “Joint Disambiguation of Named Entities and Concepts,” Universität des Saarlandes, Saarbrücken, 2018.
Export
BibTeX
@mastersthesis{torbati2018concept, TITLE = {Joint Disambiguation of Named Entities and Concepts}, AUTHOR = {Haratinezhad Torbati, Ghazaleh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, }
Endnote
%0 Thesis %A Haratinezhad Torbati, Ghazaleh %Y Del Corro, Luciano %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Disambiguation of Named Entities and Concepts : %G eng %U http://hdl.handle.net/21.11116/0000-0003-38D0-3 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2018 %P XIII, 70 p. %V master %9 master
[79]
A. Horňáková, M. List, J. Vreeken, and M. H. Schulz, “JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis,” Bioinformatics, vol. 34, no. 17, 2018.
Export
BibTeX
@article{Hornakova_Bioinformatics2018, TITLE = {{JAMI}: {F}ast Computation of Conditional Mutual Information for {ceRNA} Network Analysis}, AUTHOR = {Hor{\v n}{\'a}kov{\'a}, Andrea and List, Markus and Vreeken, Jilles and Schulz, Marcel H.}, LANGUAGE = {eng}, ISSN = {1367-4803}, DOI = {10.1093/bioinformatics/bty221}, PUBLISHER = {Oxford University Press}, ADDRESS = {Oxford}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {Bioinformatics}, VOLUME = {34}, NUMBER = {17}, PAGES = {3050--3051}, }
Endnote
%0 Journal Article %A Hor&#328;&#225;kov&#225;, Andrea %A List, Markus %A Vreeken, Jilles %A Schulz, Marcel H. %+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society %T JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis : %G eng %U http://hdl.handle.net/21.11116/0000-0002-573A-C %R 10.1093/bioinformatics/bty221 %7 2018 %D 2018 %J Bioinformatics %V 34 %N 17 %& 3050 %P 3050 - 3051 %I Oxford University Press %C Oxford %@ false
[80]
V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum, “Learning Rules from Incomplete KGs using Embeddings,” in ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P&D-Industry-BlueSky 2018), Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{StepanovaISWC2018b, TITLE = {Learning Rules from Incomplete {KGs} using Embeddings}, AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://ceur-ws.org/Vol-2180/paper-25.pdf; urn:nbn:de:0074-2180-3}, PUBLISHER = {ceur.ws.org}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ISWC 2018 Posters \& Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P\&D-Industry-BlueSky 2018)}, EDITOR = {van Erp, Marieke and Atre, Medha and Lopez, Vanessa and Srinivas, Kavitha and Fortuna, Carolina}, EID = {25}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2180}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Stepanova, Daria %A Gad-Elrab, Mohamed Hassan %A Kharlamov, Evgeny %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Learning Rules from Incomplete KGs using Embeddings : %G eng %U http://hdl.handle.net/21.11116/0000-0001-905B-6 %U http://ceur-ws.org/Vol-2180/paper-25.pdf %D 2018 %B The 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks %E van Erp, Marieke; Atre, Medha; Lopez, Vanessa; Srinivas, Kavitha; Fortuna, Carolina %Z sequence number: 25 %I ceur.ws.org %B CEUR Workshop Proceedings %N 2180
[81]
V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum, “Rule Learning from Knowledge Graphs Guided by Embedding Models,” in The Semantic Web -- ISWC 2018, Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{StepanovaISWC2018, TITLE = {Rule Learning from Knowledge Graphs Guided by Embedding Models}, AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-00670-9}, DOI = {10.1007/978-3-030-00671-6_5}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Semantic Web -- ISWC 2018}, EDITOR = {Vrande{\v c}i{\'c}, Denny and Bontcheva, Kalina and Su{\'a}rez-Figueroa, Mari Carmen and Presutti, Valentina and Celino, Irene and Sabou, Marta and Kaffee, Lucie-Aim{\'e}e and Simperl, Elena}, PAGES = {72--90}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11136}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Stepanova, Daria %A Gad-Elrab, Mohamed Hassan %A Kharlamov, Evgeny %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Rule Learning from Knowledge Graphs Guided by Embedding Models : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9058-9 %R 10.1007/978-3-030-00671-6_5 %D 2018 %B The 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B The Semantic Web -- ISWC 2018 %E Vrande&#269;i&#263;, Denny; Bontcheva, Kalina; Su&#225;rez-Figueroa, Mari Carmen; Presutti, Valentina; Celino, Irene; Sabou, Marta; Kaffee, Lucie-Aim&#233;e; Simperl, Elena %P 72 - 90 %I Springer %@ 978-3-030-00670-9 %B Lecture Notes in Computer Science %N 11136
[82]
V. T. Ho, “An Embedding-based Approach to Rule Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
Knowledge Graphs (KGs) play an important role in various information systems and have applications in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: • We introduce a framework for rule learning guided by external sources. • We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model. • We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.
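The standard confidence measure discussed in the abstract can be sketched on a toy KG. This is a hypothetical illustration, not code from the thesis; the embedding-based feedback that the thesis adds on top is omitted:

```python
# Toy knowledge graph as a set of (subject, relation, object) triples.
kg = {
    ("anna", "bornIn", "berlin"),
    ("anna", "livesIn", "berlin"),
    ("bob", "bornIn", "lyon"),
    ("bob", "livesIn", "paris"),
}

def confidence(kg, body_rel, head_rel):
    """Standard confidence of the rule body_rel(x, y) => head_rel(x, y):
    the fraction of body matches for which the head also holds in the KG."""
    body = [(s, o) for (s, r, o) in kg if r == body_rel]
    if not body:
        return 0.0
    return sum((s, head_rel, o) in kg for (s, o) in body) / len(body)

# bornIn(x, y) => livesIn(x, y) holds for anna but not for bob:
print(confidence(kg, "bornIn", "livesIn"))  # 0.5
```

On an incomplete KG this measure is exactly what the abstract warns about: bob may well still live in Lyon, and the missing fact drags the score down.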
Export
BibTeX
@mastersthesis{HoMaster2018, TITLE = {An Embedding-based Approach to Rule Learning from Knowledge Graphs}, AUTHOR = {Ho, Vinh Thinh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, ABSTRACT = {Knowledge Graphs (KGs) play an important role in various information systems and have applications in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: \mbox{$\bullet$} We introduce a framework for rule learning guided by external sources. \mbox{$\bullet$} We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model. \mbox{$\bullet$} We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.}, }
Endnote
%0 Thesis %A Ho, Vinh Thinh %A referee: Weikum, Gerhard %Y Stepanova, Daria %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Embedding-based Approach to Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0001-DE06-F %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2018 %P 60 %V master %9 master %X Knowledge Graphs (KGs) play an important role in various information systems and have applications in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: &#8226; We introduce a framework for rule learning guided by external sources. &#8226; We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model. &#8226; We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.
[83]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Hui_WSDM2018, TITLE = {Co-{PACRR}: {A} Context-Aware Neural {IR} Model for Ad-hoc Retrieval}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159689}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {279--287}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6367-D %R 10.1145/3159652.3159689 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 279 - 287 %I ACM %@ 978-1-4503-5581-0
[84]
M. Humble, “Redescription Mining on Financial Time Series Data,” Universität des Saarlandes, Saarbrücken, 2018.
Export
BibTeX
@mastersthesis{Humble_BSc2017, TITLE = {Redescription Mining on Financial Time Series Data}, AUTHOR = {Humble, Megan}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, TYPE = {Bachelor's thesis}, }
Endnote
%0 Thesis %A Humble, Megan %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Redescription Mining on Financial Time Series Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-F042-4 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2018 %P XV, 100 p. %V bachelor %9 bachelor
[85]
H. Jhavar and P. Mirza, “EMOFIEL: Mapping Emotions of Relationships in a Story,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{JhavarWWW2018, TITLE = {{EMOFIEL}: {M}apping Emotions of Relationships in a Story}, AUTHOR = {Jhavar, Harshita and Mirza, Paramita}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3186989}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {243--246}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Jhavar, Harshita %A Mirza, Paramita %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T EMOFIEL: Mapping Emotions of Relationships in a Story : %G eng %U http://hdl.handle.net/21.11116/0000-0001-4B96-2 %R 10.1145/3184558.3186989 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel %P 243 - 246 %I ACM %@ 978-1-4503-5640-4
[86]
Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum, “TEQUILA: Temporal Question Answering over Knowledge Bases,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.
Export
BibTeX
@inproceedings{Jia_CIKM2018, TITLE = {{TEQUILA}: {T}emporal Question Answering over Knowledge Bases}, AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6014-2}, DOI = {10.1145/3269206.3269247}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management}, EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun}, PAGES = {1807--1810}, ADDRESS = {Torino, Italy}, }
Endnote
%0 Conference Proceedings %A Jia, Zhen %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Str&#246;tgen, Jannik %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T TEQUILA: Temporal Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A106-1 %R 10.1145/3269206.3269247 %D 2018 %B 27th ACM International Conference on Information and Knowledge Management %Z date of event: 2018-10-22 - 2018-10-26 %C Torino, Italy %B CIKM'18 %E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun %P 1807 - 1810 %I ACM %@ 978-1-4503-6014-2
[87]
Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum, “TempQuestions: A Benchmark for Temporal Question Answering,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{JiaWWW2017, TITLE = {{TempQuestions}: {A} Benchmark for Temporal Question Answering}, AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3191536}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {1057--1062}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Jia, Zhen %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Str&#246;tgen, Jannik %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TempQuestions: A Benchmark for Temporal Question Answering : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C80-B %R 10.1145/3184558.3191536 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel %P 1057 - 1062 %I ACM %@ 978-1-4503-5640-4
[88]
J. Kalofolias, E. Galbrun, and P. Miettinen, “From Sets of Good Redescriptions to Good Sets of Redescriptions,” Knowledge and Information Systems, vol. 57, no. 1, 2018.
Export
BibTeX
@article{kalofolias18from, TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions}, AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-017-1149-7}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {57}, NUMBER = {1}, PAGES = {21--54}, }
Endnote
%0 Journal Article %A Kalofolias, Janis %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Sets of Good Redescriptions to Good Sets of Redescriptions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D1-5 %R 10.1007/s10115-017-1149-7 %7 2018-01-19 %D 2018 %J Knowledge and Information Systems %V 57 %N 1 %& 21 %P 21 - 54 %I Springer %C New York, NY %@ false
[89]
S. Karaev, J. Hook, and P. Miettinen, “Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” 2018. [Online]. Available: http://arxiv.org/abs/1801.06136. (arXiv: 1801.06136)
Abstract
Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.
Export
BibTeX
@online{Karaev2018, TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1801.06136}, EPRINT = {1801.06136}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.}, }
Endnote
%0 Report %A Karaev, Sanjar %A Hook, James %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : %U http://hdl.handle.net/21.11116/0000-0000-636B-9 %U http://arxiv.org/abs/1801.06136 %D 2018 %X Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone. %K Computer Science, Learning, cs.LG
[90]
S. Karaev, J. Hook, and P. Miettinen, “Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.
Export
BibTeX
@inproceedings{Karaev_SDM2018, TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-532-1}, DOI = {10.1137/1.9781611975321.41}, PUBLISHER = {SIAM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)}, EDITOR = {Ester, Martin and Pedreschi, Dino}, PAGES = {360--368}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Hook, James %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E2D-4 %R 10.1137/1.9781611975321.41 %D 2018 %B SIAM International Conference on Data Mining %Z date of event: 2018-05-03 - 2018-05-05 %C San Diego, CA, USA %B Proceedings of the 2018 SIAM International Conference on Data Mining %E Ester, Martin; Pedreschi, Dino %P 360 - 368 %I SIAM %@ 978-1-61197-532-1
[91]
S. Karaev, S. Metzler, and P. Miettinen, “Logistic-Tropical Decompositions and Nested Subgraphs,” in Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018), London, UK, 2018.
Export
BibTeX
@inproceedings{Karaev_MLG2018, TITLE = {Logistic-Tropical Decompositions and Nested Subgraphs}, AUTHOR = {Karaev, Sanjar and Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, PUBLISHER = {MLG Workshop}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018)}, EID = {35}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Logistic-Tropical Decompositions and Nested Subgraphs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A91F-E %D 2018 %B 14th International Workshop on Mining and Learning with Graphs %Z date of event: 2018-08-20 - 2018-08-20 %C London, UK %B Proceedings of the 14th International Workshop on Mining and Learning with Graphs %Z sequence number: 35 %I MLG Workshop %U http://www.mlgworkshop.org/2018/papers/MLG2018_paper_35.pdf
[92]
P. Lahoti, K. Garimella, and A. Gionis, “Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Lahoti_WSDM2018, TITLE = {Joint Non-negative Matrix Factorization for Learning Ideological Leaning on {T}witter}, AUTHOR = {Lahoti, Preethi and Garimella, Kiran and Gionis, Aristides}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159669}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {351--359}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Garimella, Kiran %A Gionis, Aristides %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9C4F-7 %R 10.1145/3159652.3159669 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 351 - 359 %I ACM %@ 978-1-4503-5581-0
[93]
P. Lahoti, G. Weikum, and K. P. Gummadi, “iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” 2018. [Online]. Available: http://arxiv.org/abs/1806.01059. (arXiv: 1806.01059)
Abstract
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.
Export
BibTeX
@online{Lahoti_arXiv1806.01059, TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making}, AUTHOR = {Lahoti, Preethi and Weikum, Gerhard and Gummadi, Krishna P.}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.01059}, EPRINT = {1806.01059}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.}, }
Endnote
%0 Report %A Lahoti, Preethi %A Weikum, Gerhard %A Gummadi, Krishna P. %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : %G eng %U http://hdl.handle.net/21.11116/0000-0002-1545-9 %U http://arxiv.org/abs/1806.01059 %D 2018 %X People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. %K Computer Science, Learning, cs.LG,Computer Science, Information Retrieval, cs.IR,Statistics, Machine Learning, stat.ML
[94]
C. Li, Y. Sun, B. He, L. Wang, K. Hui, A. Yates, L. Sun, and J. Xu, “NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.
Export
BibTeX
@inproceedings{DBLP:conf/emnlp/LiSHWHYSX18, TITLE = {{NPRF}: {A} Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval}, AUTHOR = {Li, Canjia and Sun, Yingfei and He, Ben and Wang, Le and Hui, Kai and Yates, Andrew and Sun, Le and Xu, Jungang}, LANGUAGE = {eng}, ISBN = {978-1-948087-84-1}, URL = {https://aclanthology.info/papers/D18-1478/d18-1478}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)}, EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi}, PAGES = {4482--4491}, ADDRESS = {Brussels, Belgium}, }
Endnote
%0 Conference Proceedings %A Li, Canjia %A Sun, Yingfei %A He, Ben %A Wang, Le %A Hui, Kai %A Yates, Andrew %A Sun, Le %A Xu, Jungang %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0003-11BB-7 %U https://aclanthology.info/papers/D18-1478/d18-1478 %D 2018 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2018-10-31 - 2018-11-04 %C Brussels, Belgium %B The Conference on Empirical Methods in Natural Language Processing %E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi %P 4482 - 4491 %I ACL %@ 978-1-948087-84-1
[95]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Overcoming Low-Utility Facets for Complex Answer Retrieval,” in SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018), Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_KG4IR2018, TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, URL = {http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf; urn:nbn:de:0074-2127-8}, PUBLISHER = {ceur-ws.org}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018)}, EDITOR = {Dietz, Laura and Koetzen, Laura and Verberne, Suzan}, PAGES = {46--47}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2127}, ADDRESS = {Ann Arbor, MI, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Overcoming Low-Utility Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E9C-6 %U http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf %D 2018 %B Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding %Z date of event: 2018-07-12 - 2018-07-12 %C Ann Arbor, MI, USA %B SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH %E Dietz, Laura; Koetzen, Laura; Verberne, Suzan %P 46 - 47 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2127 %U http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf
[96]
S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian, “RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” in Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018), New Orleans, LA, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_NAACL_HLT2018, TITLE = {{RSDD}-Time: {T}emporal Annotation of Self-Reported Mental Health Diagnoses}, AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-948087-12-4}, URL = {http://aclweb.org/anthology/W18-0618}, DOI = {10.18653/v1/W18-0618}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018)}, EDITOR = {Loveys, Kate and Niederhoffer, Kate and Prud'hommeaux, Emily and Resnik, Rebecca and Resnik, Philip}, PAGES = {168--173}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Desmet, Bart %A Cohan, Arman %A Soldaini, Luca %A Yates, Andrew %A Zirikly, Ayah %A Goharian, Nazli %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E8C-8 %U http://aclweb.org/anthology/W18-0618 %R 10.18653/v1/W18-0618 %D 2018 %B Fifth Workshop on Computational Linguistics and Clinical Psychology %Z date of event: 2018-06-05 - 2018-06-05 %C New Orleans, LA, USA %B Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology %E Loveys, Kate; Niederhoffer, Kate; Prud'hommeaux, Emily; Resnik, Rebecca; Resnik, Philip %P 168 - 173 %I ACL %@ 978-1-948087-12-4 %U https://aclanthology.info/papers/W18-0618/w18-0618
[97]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Characterizing Question Facets for Complex Answer Retrieval,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2018, TITLE = {Characterizing Question Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, ISBN = {978-1-4503-5657-2}, DOI = {10.1145/3209978.3210135}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {1205--1208}, ADDRESS = {Ann Arbor, MI, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Characterizing Question Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ECA-2 %R 10.1145/3209978.3210135 %D 2018 %B 41st International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2018-07-08 - 2018-07-12 %C Ann Arbor, MI, USA %B SIGIR'18 %P 1205 - 1208 %I ACM %@ 978-1-4503-5657-2
[98]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Characterizing Question Facets for Complex Answer Retrieval,” 2018. [Online]. Available: http://arxiv.org/abs/1805.00791. (arXiv: 1805.00791)
Abstract
Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.
Export
BibTeX
@online{MacAvernay_arXIv1805.00791, TITLE = {Characterizing Question Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1805.00791}, EPRINT = {1805.00791}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Characterizing Question Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ECE-E %U http://arxiv.org/abs/1805.00791 %D 2018 %X Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method. %K Computer Science, Information Retrieval, cs.IR
[99]
S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian, “RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” 2018. [Online]. Available: http://arxiv.org/abs/1806.07916. (arXiv: 1806.07916)
Abstract
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.
Export
BibTeX
@online{MacAveray_arXiv1806.07916, TITLE = {{RSDD}-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses}, AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.07916}, EPRINT = {1806.07916}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Desmet, Bart %A Cohan, Arman %A Soldaini, Luca %A Yates, Andrew %A Zirikly, Ayah %A Goharian, Nazli %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ED9-1 %U http://arxiv.org/abs/1806.07916 %D 2018 %X Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging. %K Computer Science, Computation and Language, cs.CL
[100]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.
Export
BibTeX
@inproceedings{mandros:18:fedora, TITLE = {Discovering Reliable Dependencies from Data: {H}ardness and Improved Algorithms}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-9159-5}, DOI = {10.1109/ICDM.2018.00047}, PUBLISHER = {IEEE}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)}, PAGES = {317--326}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EA2-5 %R 10.1109/ICDM.2018.00047 %D 2018 %B IEEE International Conference on Data Mining %Z date of event: 2018-11-17 - 2018-11-20 %C Singapore, Singapore %B IEEE International Conference on Data Mining %P 317 - 326 %I IEEE %@ 978-1-5386-9159-5
[101]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” 2018. [Online]. Available: http://arxiv.org/abs/1809.05467. (arXiv: 1809.05467)
Abstract
The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.
Export
BibTeX
@online{Mandros_arXiv1809.05467, TITLE = {Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1809.05467}, EPRINT = {1809.05467}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.}, }
Endnote
%0 Report %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EC9-A %U http://arxiv.org/abs/1809.05467 %D 2018 %X The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT
[102]
A. Marx and J. Vreeken, “Causal Discovery by Telling Apart Parents and Children,” 2018. [Online]. Available: http://arxiv.org/abs/1808.06356. (arXiv: 1808.06356)
Abstract
We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence---and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket---and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected---and show this improves their precision, recall and F1 scores by up to 20%.
Export
BibTeX
@online{Marx_arXiv1808.06356, TITLE = {Causal Discovery by Telling Apart Parents and Children}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1808.06356}, EPRINT = {1808.06356}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence---and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket---and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected---and show this improves their precision, recall and F1 scores by up to 20%.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Discovery by Telling Apart Parents and Children : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5F36-8 %U http://arxiv.org/abs/1808.06356 %D 2018 %X We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence---and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket---and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected---and show this improves their precision, recall and F1 scores by up to 20%. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
[103]
A. Marx and J. Vreeken, “Stochastic Complexity for Testing Conditional Independence on Discrete Data,” in Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018), Montréal, Canada, 2018.
Export
BibTeX
@inproceedings{marx:18:dice, TITLE = {Stochastic Complexity for Testing Conditional Independence on Discrete Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {https://drive.google.com/file/d/1mMkO5YZ5gkBRRFbfYb4DDRCsCN243eb2/view}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018)}, EID = {10}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Stochastic Complexity for Testing Conditional Independence on Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EC2-1 %U https://drive.google.com/file/d/1mMkO5YZ5gkBRRFbfYb4DDRCsCN243eb2/view %D 2018 %B NeurIPS 2018 Workshop on Causal Learning %Z date of event: 2018-12-07 - 2018-12-07 %C Montréal, Canada %B Proceedings of the NeurIPS 2018 workshop on Causal Learning %Z sequence number: 10
[104]
S. Metzler and P. Miettinen, “Random Graph Generators for Hyperbolic Community Structures,” in Complex Networks and Their Applications VII, Cambridge, UK, 2018.
Export
BibTeX
@inproceedings{Metzler_COMPLEXNETWORKS2018, TITLE = {Random Graph Generators for Hyperbolic Community Structures}, AUTHOR = {Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-030-05410-6; 978-3-030-05411-3}, DOI = {10.1007/978-3-030-05411-3_54}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Complex Networks and Their Applications VII}, EDITOR = {Aiello, Luca Maria and Cherifi, Chantal and Cherifi, Hocine and Lambiotte, Renaud and Li{\'o}, Pietro and Rocha, Luis M.}, PAGES = {680--693}, SERIES = {Studies in Computational Intelligence}, VOLUME = {812}, ADDRESS = {Cambridge, UK}, }
Endnote
%0 Conference Proceedings %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Random Graph Generators for Hyperbolic Community Structures : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A929-2 %R 10.1007/978-3-030-05411-3_54 %D 2018 %B 7th International Conference on Complex Networks and Their Applications %Z date of event: 2018-12-11 - 2018-12-13 %C Cambridge, UK %B Complex Networks and Their Applications VII %E Aiello, Luca Maria; Cherifi, Chantal; Cherifi, Hocine; Lambiotte, Renaud; Lió, Pietro; Rocha, Luis M. %P 680 - 693 %I Springer %@ 978-3-030-05410-6 978-3-030-05411-3 %B Studies in Computational Intelligence %N 812
[105]
P. Mirza, F. Darari, and R. Mahendra, “KOI at SemEval-2018 Task 5: Building Knowledge Graph of Incidents,” in Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, 2018.
Export
BibTeX
@inproceedings{S18-1010, TITLE = {{KOI} at {SemEval}-2018 Task 5: {B}uilding Knowledge Graph of Incidents}, AUTHOR = {Mirza, Paramita and Darari, Fariz and Mahendra, Rahmad}, LANGUAGE = {eng}, ISBN = {978-1-948087-20-9}, DOI = {10.18653/v1/S18-1010}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018)}, EDITOR = {Apidianaki, Marianna and Mohammad, Saif M. and May, Jonathan and Shutova, Ekatarina and Bethard, Steven and Carpuat, Marine}, PAGES = {81--87}, ADDRESS = {New Orleans, LA}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Darari, Fariz %A Mahendra, Rahmad %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T KOI at SemEval-2018 Task 5: Building Knowledge Graph of Incidents : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A818-6 %R 10.18653/v1/S18-1010 %D 2018 %B Twelfth International Workshop on Semantic Evaluation %Z date of event: 2018-06-05 - 2018-06-06 %C New Orleans, LA %B Proceedings of the 12th International Workshop on Semantic Evaluation %E Apidianaki, Marianna; Mohammad, Saif M.; May, Jonathan; Shutova, Ekatarina; Bethard, Steven; Carpuat, Marine %P 81 - 87 %I ACL %@ 978-1-948087-20-9 %U http://aclweb.org/anthology/S18-1010
[106]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Enriching Knowledge Bases with Counting Quantifiers,” in The Semantic Web -- ISWC 2018, Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{MirzaISWC2018, TITLE = {Enriching Knowledge Bases with Counting Quantifiers}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-00670-9}, DOI = {10.1007/978-3-030-00671-6_11}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Semantic Web -- ISWC 2018}, EDITOR = {Vrande{\v c}i{\'c}, Denny and Bontcheva, Kalina and Su{\'a}rez-Figueroa, Mari Carmen and Presutti, Valentina and Celino, Irene and Sabou, Marta and Kaffee, Luci-Aim{\'e}e and Simperl, Elena}, PAGES = {179--197}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11136}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Enriching Knowledge Bases with Counting Quantifiers : %G eng %U http://hdl.handle.net/21.11116/0000-0001-E170-2 %R 10.1007/978-3-030-00671-6_11 %D 2018 %B The 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B The Semantic Web -- ISWC 2018 %E Vrandečić, Denny; Bontcheva, Kalina; Suárez-Figueroa, Mari Carmen; Presutti, Valentina; Celino, Irene; Sabou, Marta; Kaffee, Luci-Aimée; Simperl, Elena %P 179 - 197 %I Springer %@ 978-3-030-00670-9 %B Lecture Notes in Computer Science %N 11136
[107]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Enriching Knowledge Bases with Counting Quantifiers,” 2018. [Online]. Available: http://arxiv.org/abs/1807.03656. (arXiv: 1807.03656)
Abstract
Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
Export
BibTeX
@online{Mirza_arXiv:1807.03656, TITLE = {Enriching Knowledge Bases with Counting Quantifiers}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1807.03656}, EPRINT = {1807.03656}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.}, }
Endnote
%0 Report %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Enriching Knowledge Bases with Counting Quantifiers : %G eng %U http://hdl.handle.net/21.11116/0000-0001-E16D-7 %U http://arxiv.org/abs/1807.03656 %D 2018 %X Information extraction traditionally focuses on extracting relations between identifiable entities, such as <Monterey, locatedIn, California>. Yet, texts often also contain Counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations. %K Computer Science, Computation and Language, cs.CL
[108]
A. Mishra, “Leveraging Semantic Annotations for Event-focused Search & Summarization,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
Export
BibTeX
@phdthesis{Mishraphd2018, TITLE = {Leveraging Semantic Annotations for Event-focused Search \& Summarization}, AUTHOR = {Mishra, Arunav}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-271081}, DOI = {10.22028/D291-27108}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: \mbox{$\bullet$} We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. \mbox{$\bullet$} We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. \mbox{$\bullet$} To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.}, }
Endnote
%0 Thesis %A Mishra, Arunav %Y Berberich, Klaus %A referee: Weikum, Gerhard %A referee: Hauff, Claudia %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Leveraging Semantic Annotations for Event-focused Search & Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0001-1844-8 %U urn:nbn:de:bsz:291-scidok-ds-271081 %R 10.22028/D291-27108 %I Universität des Saarlandes %C Saarbrücken %D 2018 %8 08.02.2018 %P 252 p. %V phd %9 phd %X Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26995
[109]
S. Nag Chowdhury, N. Tandon, H. Ferhatosmanoglu, and G. Weikum, “VISIR: Visual and Semantic Image Label Refinement,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{NagChowdhury_WSDM2018, TITLE = {{VISIR}: {V}isual and Semantic Image Label Refinement}, AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Ferhatosmanoglu, Hakan and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159693}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {117--125}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Nag Chowdhury, Sreyasi %A Tandon, Niket %A Ferhatosmanoglu, Hakan %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T VISIR: Visual and Semantic Image Label Refinement : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3CA2-5 %R 10.1145/3159652.3159693 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 117 - 125 %I ACM %@ 978-1-4503-5581-0
[110]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” 2018. [Online]. Available: http://arxiv.org/abs/1808.07302. (arXiv: 1808.07302)
Abstract
Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods either focus on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP).
Export
BibTeX
@online{Paramonov_arXiv1808.07302, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1808.07302}, EPRINT = {1808.07302}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods either focus on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP).}, }
Endnote
%0 Report %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E60-9 %U http://arxiv.org/abs/1808.07302 %D 2018 %X Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods either focus on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP). %K Computer Science, Artificial Intelligence, cs.AI
[111]
T. Pellissier Tanon, D. Stepanova, S. Razniewski, P. Mirza, and G. Weikum, “Completeness-aware Rule Learning from Knowledge Graphs,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, 2018.
Export
BibTeX
@inproceedings{PellissierIJCAI2018, TITLE = {Completeness-aware Rule Learning from Knowledge Graphs}, AUTHOR = {Pellissier Tanon, Thomas and Stepanova, Daria and Razniewski, Simon and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-0-9992411-2-7}, DOI = {10.24963/ijcai.2018/749}, PUBLISHER = {IJCAI}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018)}, EDITOR = {Lang, J{\'e}r{\^o}me}, PAGES = {5339--5343}, ADDRESS = {Stockholm, Sweden}, }
Endnote
%0 Conference Proceedings %A Pellissier Tanon, Thomas %A Stepanova, Daria %A Razniewski, Simon %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Completeness-aware Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9070-D %R 10.24963/ijcai.2018/749 %D 2018 %B 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence %Z date of event: 2018-07-13 - 2018-07-19 %C Stockholm, Sweden %B Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence %E Lang, Jérôme %P 5339 - 5343 %I IJCAI %@ 978-0-9992411-2-7 %U https://doi.org/10.24963/ijcai.2018/749
[112]
M. Ponza, L. Del Corro, and G. Weikum, “Facts That Matter,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.
Export
BibTeX
@inproceedings{D18-1129, TITLE = {Facts That Matter}, AUTHOR = {Ponza, Marco and Del Corro, Luciano and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-84-1}, URL = {https://aclanthology.coli.uni-saarland.de/papers/D18-1129/d18-1129}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)}, EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi}, PAGES = {1043--1048}, ADDRESS = {Brussels, Belgium}, }
Endnote
%0 Conference Proceedings %A Ponza, Marco %A Del Corro, Luciano %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Facts That Matter : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A2C1-C %U https://aclanthology.coli.uni-saarland.de/papers/D18-1129/d18-1129 %D 2018 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2018-10-31 - 2018-11-04 %C Brussels, Belgium %B The Conference on Empirical Methods in Natural Language Processing %E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi %P 1043 - 1048 %I ACL %@ 978-1-948087-84-1
[113]
K. Popat, S. Mukherjee, A. Yates, and G. Weikum, “DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning,” 2018. [Online]. Available: http://arxiv.org/abs/1809.06416. (arXiv: 1809.06416)
Abstract
Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility assessment of arbitrary textual claims, without any human intervention. It presents a neural network model that judiciously aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. It also derives informative features for generating user-comprehensible explanations that makes the neural network predictions transparent to the end-user. Experiments with four datasets and ablation studies show the strength of our method.
Export
BibTeX
@online{Popat_arXiv1809.06416, TITLE = {{DeClarE}: {D}ebunking Fake News and False Claims using Evidence-Aware Deep Learning}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1809.06416}, EPRINT = {1809.06416}, EPRINTTYPE = {arXiv}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility assessment of arbitrary textual claims, without any human intervention. It presents a neural network model that judiciously aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. It also derives informative features for generating user-comprehensible explanations that makes the neural network predictions transparent to the end-user. Experiments with four datasets and ablation studies show the strength of our method.}, }
Endnote
%0 Report %A Popat, Kashyap %A Mukherjee, Subhabrata %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5EE1-7 %U http://arxiv.org/abs/1809.06416 %D 2018 %X Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility assessment of arbitrary textual claims, without any human intervention. It presents a neural network model that judiciously aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. It also derives informative features for generating user-comprehensible explanations that makes the neural network predictions transparent to the end-user. Experiments with four datasets and ablation studies show the strength of our method. %K Computer Science, Computation and Language, cs.CL,Computer Science, Learning, cs.LG
[114]
K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “CredEye: A Credibility Lens for Analyzing and Explaining Misinformation,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{PopatWWW2017, TITLE = {{CredEye}: {A} Credibility Lens for Analyzing and Explaining Misinformation}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3186967}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {155--158}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Str&#246;tgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CredEye: A Credibility Lens for Analyzing and Explaining Misinformation : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B546-5 %R 10.1145/3184558.3186967 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel %P 155 - 158 %I ACM %@ 978-1-4503-5640-4
[115]
K. Popat, S. Mukherjee, A. Yates, and G. Weikum, “DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.
Export
BibTeX
@inproceedings{D18-1003, TITLE = {{DeClarE}: {D}ebunking Fake News and False Claims using Evidence-Aware Deep Learning}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-84-1}, URL = {https://aclanthology.coli.uni-saarland.de/papers/D18-1003/d18-1003}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)}, EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi}, PAGES = {22--32}, ADDRESS = {Brussels, Belgium}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning : %G eng %U http://hdl.handle.net/21.11116/0000-0002-B348-3 %U https://aclanthology.coli.uni-saarland.de/papers/D18-1003/d18-1003 %D 2018 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2018-10-31 - 2018-11-04 %C Brussels, Belgium %B The Conference on Empirical Methods in Natural Language Processing %E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi %P 22 - 32 %I ACL %@ 978-1-948087-84-1
[116]
Y. Ran, B. He, K. Hui, J. Xu, and L. Sun, “Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support,” International Journal of Data Mining and Bioinformatics, vol. 20, no. 2, 2018.
Export
BibTeX
@article{Ran_2018, TITLE = {Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support}, AUTHOR = {Ran, Yanhua and He, Ben and Hui, Kai and Xu, Jungang and Sun, Le}, LANGUAGE = {eng}, ISSN = {1748-5673}, DOI = {10.1504/IJDMB.2018.10015098}, PUBLISHER = {Inderscience Publ.}, ADDRESS = {Gen{\`e}ve}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {International Journal of Data Mining and Bioinformatics}, VOLUME = {20}, NUMBER = {2}, PAGES = {91--108}, }
Endnote
%0 Journal Article %A Ran, Yanhua %A He, Ben %A Hui, Kai %A Xu, Jungang %A Sun, Le %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Neural Relevance Model Using Similarities with Elite Documents for Effective Clinical Decision Support : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5743-1 %R 10.1504/IJDMB.2018.10015098 %7 2018 %D 2018 %J International Journal of Data Mining and Bioinformatics %V 20 %N 2 %& 91 %P 91 - 108 %I Inderscience Publ. %C Gen&#232;ve %@ false
[117]
S. Razniewski and G. Weikum, “Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns,” ACM SIGWEB Newsletter, no. Spring, 2018.
Export
BibTeX
@article{Razniewski2018, TITLE = {Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns}, AUTHOR = {Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1145/3210578.3210581}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM SIGWEB Newsletter}, NUMBER = {Spring}, EID = {3}, }
Endnote
%0 Journal Article %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Knowledge Base Recall: Detecting and Resolving the Unknown Unknowns : %G eng %U http://hdl.handle.net/21.11116/0000-0001-E175-D %R 10.1145/3210578.3210581 %7 2018 %D 2018 %J ACM SIGWEB Newsletter %N Spring %Z sequence number: 3 %I ACM %C New York, NY
[118]
M. Ringsquandl, E. Kharlamov, D. Stepanova, M. Hildebrandt, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kröger, “Event-Enhanced Learning for KG Completion,” in The Semantic Web (ESWC 2018), Heraklion, Crete, Greece, 2018.
Export
BibTeX
@inproceedings{Ringsquandl_ESWC2018, TITLE = {Event-Enhanced Learning for {KG} Completion}, AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Hildebrandt, Marcel and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kr{\"o}ger, Peer}, LANGUAGE = {eng}, ISBN = {978-3-319-93416-7}, DOI = {10.1007/978-3-319-93417-4_35}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Semantic Web (ESWC 2018)}, EDITOR = {Gangemi, Aldo and Navigli, Roberto and Vidal, Maria-Esther and Hitzler, Pascal and Troncy, Rapha{\"e}l and Hollink, Laura and Tordai, Anna and Alam, Mehwish}, PAGES = {541--559}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10843}, ADDRESS = {Heraklion, Crete, Greece}, }
Endnote
%0 Conference Proceedings %A Ringsquandl, Martin %A Kharlamov, Evgeny %A Stepanova, Daria %A Hildebrandt, Marcel %A Lamparter, Steffen %A Lepratti, Raffaello %A Horrocks, Ian %A Kr&#246;ger, Peer %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Event-Enhanced Learning for KG Completion : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E82-2 %R 10.1007/978-3-319-93417-4_35 %D 2018 %B 15th Extended Semantic Web Conference %Z date of event: 2018-06-03 - 2018-06-07 %C Heraklion, Crete, Greece %B The Semantic Web %E Gangemi, Aldo; Navigli, Roberto; Vidal, Maria-Esther; Hitzler, Pascal; Troncy, Rapha&#235;l; Hollink, Laura; Tordai, Anna; Alam, Mehwish %P 541 - 559 %I Springer %@ 978-3-319-93416-7 %B Lecture Notes in Computer Science %N 10843
[119]
M. Ringsquandl, E. Kharlamov, D. Stepanova, M. Hildebrandt, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kroeger, “Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding,” in ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018) (ISWC-P&D-Industry-BlueSky 2018), Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Ringsquandl_ISWC2018_Poster, TITLE = {Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding}, AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Hildebrandt, Marcel and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kroeger, Peer}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2180/paper-52.pdf; urn:nbn:de:0074-2180-3}, PUBLISHER = {CEUR-WS.org}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ISWC 2018 Posters \& Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018) (ISWC-P\&D-Industry-BlueSky 2018)}, EDITOR = {van Erp, Marieke and Atre, Medha and Lopez, Vanessa and Srinivas, Kavitha and Fortuna, Carolina}, EID = {52}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2180}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ringsquandl, Martin %A Kharlamov, Evgeny %A Stepanova, Daria %A Hildebrandt, Marcel %A Lamparter, Steffen %A Lepratti, Raffaello %A Horrocks, Ian %A Kroeger, Peer %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Filling Gaps in Industrial Knowledge Graphs via Event-Enhanced Embedding : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E67-2 %U http://ceur-ws.org/Vol-2180/paper-52.pdf %D 2018 %B 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018) %E van Erp, Marieke; Atre, Medha; Lopez, Vanessa; Srinivas, Kavitha; Fortuna, Carolina %Z sequence number: 52 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 2180 %@ false
[120]
D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum, “A Study of the Importance of External Knowledge in the Named Entity Recognition Task,” in The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.
Export
BibTeX
@inproceedings{AgrawalACL2018b, TITLE = {A Study of the Importance of External Knowledge in the Named Entity Recognition Task}, AUTHOR = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-34-6}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)}, PAGES = {241--246}, EID = {602}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Seyler, Dominic %A Dembelova, Tatiana %A Del Corro, Luciano %A Hoffart, Johannes %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T A Study of the Importance of External Knowledge in the Named Entity Recognition Task : %G eng %U http://hdl.handle.net/21.11116/0000-0002-0C65-0 %D 2018 %B The 56th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2018-07-15 - 2018-07-20 %C Melbourne, Australia %B The 56th Annual Meeting of the Association for Computational Linguistics %P 241 - 246 %Z sequence number: 602 %I ACL %@ 978-1-948087-34-6 %U http://aclweb.org/anthology/P18-2039
[121]
X. Shen, H. Su, S. Niu, and V. Demberg, “Improving Variational Encoder-Decoders in Dialogue Generation,” in Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2018.
Export
BibTeX
@inproceedings{shen2018improving, TITLE = {Improving Variational Encoder-Decoders in Dialogue Generation}, AUTHOR = {Shen, Xiaoyu and Su, Hui and Niu, Shuzi and Demberg, Vera}, LANGUAGE = {eng}, ISBN = {978-1-57735-800-8}, URL = {https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16402/16100}, PUBLISHER = {AAAI}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Thirty-Second AAAI Conference on Artificial Intelligence}, PAGES = {5456--5463}, EID = {16402}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Shen, Xiaoyu %A Su, Hui %A Niu, Shuzi %A Demberg, Vera %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Improving Variational Encoder-Decoders in Dialogue Generation : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0DAB-F %U https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16402/16100 %D 2018 %B Thirty-Second AAAI Conference on Artificial Intelligence %Z date of event: 2018-02-02 - 2018-02-07 %C New Orleans, LA, USA %B Thirty-Second AAAI Conference on Artificial Intelligence %P 5456 - 5463 %Z sequence number: 16402 %I AAAI %@ 978-1-57735-800-8
[122]
X. Shen, H. Su, W. Li, and D. Klakow, “NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.
Export
BibTeX
@inproceedings{shen2018nexus, TITLE = {{NEXUS} Network: {C}onnecting the Preceding and the Following in Dialogue Generation}, AUTHOR = {Shen, Xiaoyu and Su, Hui and Li, Wenjie and Klakow, Dietrich}, LANGUAGE = {eng}, ISBN = {978-1-948087-84-1}, URL = {http://aclweb.org/anthology/D18-1463}, PUBLISHER = {ACL}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)}, EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi}, PAGES = {4316--4327}, ADDRESS = {Brussels, Belgium}, }
Endnote
%0 Conference Proceedings %A Shen, Xiaoyu %A Su, Hui %A Li, Wenjie %A Klakow, Dietrich %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0DBD-B %U http://aclweb.org/anthology/D18-1463 %D 2018 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2018-10-31 - 2018-11-04 %C Brussels, Belgium %B The Conference on Empirical Methods in Natural Language Processing %E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi %P 4316 - 4327 %I ACL %@ 978-1-948087-84-1
[123]
M. Singh, A. Mishra, Y. Oualil, K. Berberich, and D. Klakow, “Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization,” in Advances in Information Retrieval (ECIR 2018), Grenoble, France, 2018.
Export
BibTeX
@inproceedings{SinghECIR2ss18, TITLE = {Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization}, AUTHOR = {Singh, Mittul and Mishra, Arunav and Oualil, Youssef and Berberich, Klaus and Klakow, Dietrich}, LANGUAGE = {eng}, ISBN = {978-3-319-76940-0}, DOI = {10.1007/978-3-319-76941-7_59}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2018)}, EDITOR = {Pasi, Gabriella and Piwowarski, Benjamin and Azzopardi, Leif and Hanbury, Allan}, PAGES = {657--664}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10772}, ADDRESS = {Grenoble, France}, }
Endnote
%0 Conference Proceedings %A Singh, Mittul %A Mishra, Arunav %A Oualil, Youssef %A Berberich, Klaus %A Klakow, Dietrich %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0001-413D-2 %R 10.1007/978-3-319-76941-7_59 %D 2018 %B 40th European Conference on IR Research %Z date of event: 2018-03-26 - 2018-03-29 %C Grenoble, France %B Advances in Information Retrieval %E Pasi, Gabriella; Piwowarski, Benjamin; Azzopardi, Leif; Hanbury, Allan %P 657 - 664 %I Springer %@ 978-3-319-76940-0 %B Lecture Notes in Computer Science %N 10772
[124]
A. Spitz, J. Strötgen, and M. Gertz, “Predicting Document Creation Times in News Citation Networks,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{SpitzWWW2017, TITLE = {Predicting Document Creation Times in News Citation Networks}, AUTHOR = {Spitz, Andreas and Str{\"o}tgen, Jannik and Gertz, Michael}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3191633}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {1731--1736}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Spitz, Andreas %A Str&#246;tgen, Jannik %A Gertz, Michael %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Predicting Document Creation Times in News Citation Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B544-7 %R 10.1145/3184558.3191633 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; M&#233;dini, Lionel %P 1731 - 1736 %I ACM %@ 978-1-4503-5640-4
[125]
D. Stepanova, V. T. Ho, and M. H. Gad-Elrab, “Rule Induction and Reasoning over Knowledge Graphs,” in Reasoning Web, Esch-sur-Alzette, Luxembourg, 2018.
Export
BibTeX
@inproceedings{StepanovaRW2018, TITLE = {Rule Induction and Reasoning over Knowledge Graphs}, AUTHOR = {Stepanova, Daria and Ho, Vinh Thinh and Gad-Elrab, Mohamed Hassan}, LANGUAGE = {eng}, ISBN = {978-3-030-00337-1}, DOI = {10.1007/978-3-030-00338-8_6}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Reasoning Web}, EDITOR = {D'Amato, Claudia and Theobald, Martin}, PAGES = {142--172}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11078}, ADDRESS = {Esch-sur-Alzette, Luxembourg}, }
Endnote
%0 Conference Proceedings %A Stepanova, Daria %A Ho, Vinh Thinh %A Gad-Elrab, Mohamed Hassan %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Rule Induction and Reasoning over Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9066-9 %R 10.1007/978-3-030-00338-8_6 %D 2018 %B 14th Reasoning Web Summer School %Z date of event: 2018-09-22 - 2018-09-26 %C Esch-sur-Alzette, Luxembourg %B Reasoning Web %E D'Amato, Claudia; Theobald, Martin %P 142 - 172 %I Springer %@ 978-3-030-00337-1 %B Lecture Notes in Computer Science %N 11078
[126]
J. Strötgen, R. Andrade, and D. Gupta, “Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations,” in JCDL’18, Joint Conference on Digital Libraries, Fort Worth, TX, USA, 2018.
Export
BibTeX
@inproceedings{StroetgenJCDL2018, TITLE = {Putting Dates on the Map: {H}arvesting and Analyzing Street Names with Date Mentions and their Explanations}, AUTHOR = {Str{\"o}tgen, Jannik and Andrade, Rosita and Gupta, Dhruv}, LANGUAGE = {eng}, ISBN = {978-1-4503-5178-2}, DOI = {10.1145/3197026.3197035}, PUBLISHER = {ACM}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {JCDL'18, Joint Conference on Digital Libraries}, PAGES = {79--88}, ADDRESS = {Fort Worth, TX, USA}, }
Endnote
%0 Conference Proceedings %A Str&#246;tgen, Jannik %A Andrade, Rosita %A Gupta, Dhruv %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B548-3 %R 10.1145/3197026.3197035 %D 2018 %B Joint Conference on Digital Libraries %Z date of event: 2018-06-03 - 2018-06-07 %C Fort Worth, TX, USA %B JCDL'18 %P 79 - 88 %I ACM %@ 978-1-4503-5178-2
[127]
J. Strötgen, A.-L. Minard, L. Lange, M. Speranza, and B. Magnini, “KRAUTS: A German Temporally Annotated News Corpus,” in Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
Export
BibTeX
@inproceedings{StroetgenELREC2018, TITLE = {{KRAUTS}: {A German} Temporally Annotated News Corpus}, AUTHOR = {Str{\"o}tgen, Jannik and Minard, Anne-Lyse and Lange, Lukas and Speranza, Manuela and Magnini, Bernardo}, LANGUAGE = {eng}, ISBN = {979-10-95546-00-9}, URL = {http://lrec2018.lrec-conf.org/en/}, PUBLISHER = {ELRA}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, EDITOR = {Calzolari, Nicoletta and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Hasida, Koiti}, PAGES = {536--540}, ADDRESS = {Miyazaki, Japan}, }
Endnote
%0 Conference Proceedings %A Str&#246;tgen, Jannik %A Minard, Anne-Lyse %A Lange, Lukas %A Speranza, Manuela %A Magnini, Bernardo %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T KRAUTS: A German Temporally Annotated News Corpus : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-8B8C-E %U http://lrec2018.lrec-conf.org/en/ %D 2018 %B 11th Language Resources and Evaluation Conference %Z date of event: 2018-05-07 - 2018-05-12 %C Miyazaki, Japan %B Eleventh International Conference on Language Resources and Evaluation %E Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Hasida, Koiti %P 536 - 540 %I ELRA %@ 979-10-95546-00-9
[128]
H. Su, X. Shen, P. Hu, W. Li, and Y. Chen, “Dialogue Generation with GAN,” in Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2018.
Export
BibTeX
@inproceedings{Su_AAAI2018, TITLE = {Dialogue Generation with {GAN}}, AUTHOR = {Su, Hui and Shen, Xiaoyu and Hu, Pengwei and Li, Wenjie and Chen, Yun}, LANGUAGE = {eng}, ISBN = {978-1-57735-800-8}, URL = {https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16508/16519}, PUBLISHER = {AAAI}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Thirty-Second AAAI Conference on Artificial Intelligence}, PAGES = {8163--8164}, EID = {16508}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Su, Hui %A Shen, Xiaoyu %A Hu, Pengwei %A Li, Wenjie %A Chen, Yun %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Dialogue Generation with GAN : %G eng %U http://hdl.handle.net/21.11116/0000-0004-E562-B %U https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16508/16519 %D 2018 %B Thirty-Second AAAI Conference on Artificial Intelligence %Z date of event: 2018-02-02 - 2018-02-07 %C New Orleans, LA, USA %B Thirty-Second AAAI Conference on Artificial Intelligence %P 8163 - 8164 %Z sequence number: 16508 %I AAAI %@ 978-1-57735-800-8
[129]
L. Wang, Y. Wang, G. de Melo, and G. Weikum, “Five Shades of Untruth: Finer-Grained Classification of Fake News,” in Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018), Barcelona, Spain, 2018.
Export
BibTeX
@inproceedings{DBLP:conf/asunam/WangWMW18, TITLE = {Five Shades of Untruth: {F}iner-Grained Classification of Fake News}, AUTHOR = {Wang, Liqiang and Wang, Yafang and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-6051-5}, DOI = {10.1109/ASONAM.2018.8508256}, PUBLISHER = {IEEE}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, BOOKTITLE = {Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018)}, EDITOR = {Brandes, Ulrik and Reddy, Chandan and Tagarelli, Andrea}, PAGES = {593--594}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Wang, Liqiang %A Wang, Yafang %A de Melo, Gerard %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Five Shades of Untruth: Finer-Grained Classification of Fake News : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3633-7 %R 10.1109/ASONAM.2018.8508256 %D 2018 %B IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining %Z date of event: 2018-08-28 - 2018-08-31 %C Barcelona, Spain %B Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining %E Brandes, Ulrik; Reddy, Chandan; Tagarelli, Andrea %P 593 - 594 %I IEEE %@ 978-1-5386-6051-5
[130]
H. Wu, Y. Ning, P. Chakraborty, J. Vreeken, N. Tatti, and N. Ramakrishnan, “Generating Realistic Synthetic Population Datasets,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 4, 2018.
Export
BibTeX
@article{Wu_2018, TITLE = {Generating Realistic Synthetic Population Datasets}, AUTHOR = {Wu, Hao and Ning, Yue and Chakraborty, Prithwish and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren}, LANGUAGE = {eng}, DOI = {10.1145/3182383}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {12}, NUMBER = {4}, PAGES = {1--22}, EID = {45}, }
Endnote
%0 Journal Article %A Wu, Hao %A Ning, Yue %A Chakraborty, Prithwish %A Vreeken, Jilles %A Tatti, Nikolaj %A Ramakrishnan, Naren %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Generating Realistic Synthetic Population Datasets : %G eng %U http://hdl.handle.net/21.11116/0000-0002-16ED-B %R 10.1145/3182383 %7 2018 %D 2018 %J ACM Transactions on Knowledge Discovery from Data %O TKDD %V 12 %N 4 %& 1 %P 1 - 22 %Z sequence number: 45 %I ACM %C New York, NY
[131]
Y. Zhao, X. Shen, H. Senuma, and A. Aizawa, “A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network,” Data & Knowledge Engineering, vol. 117, 2018.
BibTeX
@article{Zhao_2018, TITLE = {A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network}, AUTHOR = {Zhao, Yang and Shen, Xiaoyu and Senuma, Hajime and Aizawa, Akiko}, LANGUAGE = {eng}, ISSN = {0169-023X}, DOI = {10.1016/j.datak.2018.05.007}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2018}, JOURNAL = {Data \& Knowledge Engineering}, VOLUME = {117}, PAGES = {307--318}, }
Endnote
%0 Journal Article %A Zhao, Yang %A Shen, Xiaoyu %A Senuma, Hajime %A Aizawa, Akiko %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T A Comprehensive Study: Sentence Compression with Linguistic Knowledge-enhanced Gated Neural Network : %G eng %U http://hdl.handle.net/21.11116/0000-0002-72D7-B %R 10.1016/j.datak.2018.05.007 %7 2018 %D 2018 %J Data & Knowledge Engineering %V 117 %& 307 %P 307 - 318 %I Elsevier %C Amsterdam %@ false
2017
[132]
A. Abujabal, M. Yahya, M. Riedewald, and G. Weikum, “Automated Template Generation for Question Answering over Knowledge Graphs,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.
BibTeX
@inproceedings{AbujabalWWW2017, TITLE = {Automated Template Generation for Question Answering over Knowledge Graphs}, AUTHOR = {Abujabal, Abdalghani and Yahya, Mohamed and Riedewald, Mirek and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4913-0}, DOI = {10.1145/3038912.3052583}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web}, PAGES = {1191--1200}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Yahya, Mohamed %A Riedewald, Mirek %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automated Template Generation for Question Answering over Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4F9C-E %R 10.1145/3038912.3052583 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 %P 1191 - 1200 %I ACM %@ 978-1-4503-4913-0
[133]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “QUINT: Interpretable Question Answering over Knowledge Bases,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
BibTeX
@inproceedings{AbujabalENMLP2017, TITLE = {{QUINT}: {I}nterpretable Question Answering over Knowledge Bases}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-97-5}, URL = {http://aclweb.org/anthology/D17-2011}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {61--66}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T QUINT: Interpretable Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F97C-E %U http://aclweb.org/anthology/D17-2011 %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 61 - 66 %I ACL %@ 978-1-945626-97-5 %U http://aclweb.org/anthology/D17-2011
[134]
P. Agarwal and J. Strötgen, “Tiwiki: Searching Wikipedia with Temporal Constraints,” in WWW ’17 Companion, Perth, Australia, 2017.
BibTeX
@inproceedings{AgarwalStroetgen2017_TempWeb, TITLE = {Tiwiki: Searching {W}ikipedia with Temporal Constraints}, AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3051112}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW '17 Companion}, PAGES = {1595--1600}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Agarwal, Prabal %A Strötgen, Jannik %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Tiwiki: Searching Wikipedia with Temporal Constraints : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-53AE-9 %R 10.1145/3041021.3051112 %D 2017 %B 26th International Conference on World Wide Web Companion %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW '17 Companion %P 1595 - 1600 %I ACM %@ 978-1-4503-4914-7
[135]
R. Andrade and J. Strötgen, “All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names,” in WWW’17 Companion, Perth, Australia, 2017.
BibTeX
@inproceedings{AndradeWWW2017, TITLE = {All Dates Lead to {R}ome: {E}xtracting and Explaining Temporal References in Street Names}, AUTHOR = {Andrade, Rosita and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054249}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {757--758}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Andrade, Rosita %A Strötgen, Jannik %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62AE-1 %R 10.1145/3041021.3054249 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 757 - 758 %I ACM %@ 978-1-4503-4914-7
[136]
A. Bhattacharyya and J. Vreeken, “Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” 2017. [Online]. Available: http://arxiv.org/abs/1701.08096. (arXiv: 1701.08096)
Abstract
Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, and discovers meaningful semantics in the form of patterns that identify multiple choices of values.
BibTeX
@online{DBLP:journals/corr/BhattacharyyaV17, TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns}, AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1701.08096}, EPRINT = {1701.08096}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, and discovers meaningful semantics in the form of patterns that identify multiple choices of values.}, }
Endnote
%0 Report %A Bhattacharyya, Apratim %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Summarising Event Sequences with Rich Interleaving Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90E4-A %U http://arxiv.org/abs/1701.08096 %D 2017 %X Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, and discovers meaningful semantics in the form of patterns that identify multiple choices of values. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[137]
A. Bhattacharyya and J. Vreeken, “Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
BibTeX
@inproceedings{bhattacharyya:17:squish, TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns}, AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-497-3}, DOI = {10.1137/1.9781611974973.89}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, PAGES = {795--803}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Bhattacharyya, Apratim %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Summarising Event Sequences with Rich Interleaving Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BDC-D %R 10.1137/1.9781611974973.89 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %P 795 - 803 %I SIAM %@ 978-1-61197-497-3
[138]
A. J. Biega, R. Saha Roy, and G. Weikum, “Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling,” in SIGIR’17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 2017.
BibTeX
@inproceedings{BiegaSIGIR2017, TITLE = {Privacy through Solidarity: {A} User-Utility-Preserving Framework to Counter Profiling}, AUTHOR = {Biega, Asia J. and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5022-8}, DOI = {10.1145/3077136.3080830}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {SIGIR'17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {675--684}, ADDRESS = {Shinjuku, Tokyo, Japan}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F901-2 %R 10.1145/3077136.3080830 %D 2017 %B 40th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2017-08-07 - 2017-08-11 %C Shinjuku, Tokyo, Japan %B SIGIR'17 %P 675 - 684 %I ACM %@ 978-1-4503-5022-8
[139]
A. J. Biega, A. Ghazimatin, H. Ferhatosmanoglu, K. P. Gummadi, and G. Weikum, “Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
BibTeX
@inproceedings{Biega_CIKM2017, TITLE = {Learning to Un-Rank: {Q}uantifying Search Exposure for Users in Online Communities}, AUTHOR = {Biega, Asia J. and Ghazimatin, Azin and Ferhatosmanoglu, Hakan and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133040}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {267--276}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Ghazimatin, Azin %A Ferhatosmanoglu, Hakan %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3BA4-5 %R 10.1145/3132847.3133040 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 267 - 276 %I ACM %@ 978-1-4503-4918-5
[140]
N. Boldyrev, M. Spaniol, J. Strötgen, and G. Weikum, “SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia,” in WWW’17 Companion, Perth, Australia, 2017.
BibTeX
@inproceedings{BoldyrevWWW2017, TITLE = {{SESAME}: {E}uropean Statistics Explored via Semantic Alignment onto {Wikipedia}}, AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054732}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {177--181}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Boldyrev, Natalia %A Spaniol, Marc %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80B0-0 %R 10.1145/3041021.3054732 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 177 - 181 %I ACM %@ 978-1-4503-4914-7
[141]
N. Boldyrev, “Alignment of Multi-Cultural Knowledge Repositories,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.
BibTeX
@phdthesis{BOLDYREVPHD2017, TITLE = {Alignment of Multi-Cultural Knowledge Repositories}, AUTHOR = {Boldyrev, Natalia}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269407}, DOI = {10.22028/D291-26940}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.}, }
Endnote
%0 Thesis %A Boldyrev, Natalia %Y Weikum, Gerhard %A referee: Berberich, Klaus %A referee: Spaniol, Marc %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Alignment of Multi-Cultural Knowledge Repositories : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-87D8-2 %R 10.22028/D291-26940 %U urn:nbn:de:bsz:291-scidok-ds-269407 %I Universität des Saarlandes %C Saarbrücken %D 2017 %8 06.12.2017 %P X, 124 p. %V phd %9 phd %X The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26891
[142]
M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken, “Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” Data Mining and Knowledge Discovery, vol. 31, no. 5, 2017.
BibTeX
@article{Boley2017, TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery}, AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles}, LANGUAGE = {eng}, DOI = {10.1007/s10618-017-0520-3}, PUBLISHER = {Springer}, ADDRESS = {London}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Data Mining and Knowledge Discovery}, VOLUME = {31}, NUMBER = {5}, PAGES = {1391--1418}, }
Endnote
%0 Journal Article %A Boley, Mario %A Goldsmith, Bryan R. %A Ghiringhelli, Luca M. %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90E1-0 %R 10.1007/s10618-017-0520-3 %7 2017-06-28 %D 2017 %8 28.06.2017 %J Data Mining and Knowledge Discovery %V 31 %N 5 %& 1391 %P 1391 - 1418 %I Springer %C London
[143]
M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken, “Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” 2017. [Online]. Available: http://arxiv.org/abs/1701.07696. (arXiv: 1701.07696)
Abstract
Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
BibTeX
@online{DBLP:journals/corr/BoleyGGV17, TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery}, AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1701.07696}, EPRINT = {1701.07696}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.}, }
Endnote
%0 Report %A Boley, Mario %A Goldsmith, Bryan R. %A Ghiringhelli, Luca M. %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90DB-F %U http://arxiv.org/abs/1701.07696 %D 2017 %X Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[144]
K. Budhathoki and J. Vreeken, “Causal Inference by Compression,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
BibTeX
@inproceedings{budhathoki:16:origo, TITLE = {Causal Inference by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0015}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {41--50}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference by Compression : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1CC0-6 %R 10.1109/ICDM.2016.0015 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 41 - 50 %I IEEE %@ 978-1-5090-5473-2
[145]
K. Budhathoki and J. Vreeken, “Causal Inference by Stochastic Complexity,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06776. (arXiv: 1702.06776)
Abstract
The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.
BibTeX
@online{DBLP:journals/corr/BudhathokiV17, TITLE = {Causal Inference by Stochastic Complexity}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06776}, EPRINT = {1702.06776}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.}, }
Endnote
%0 Report %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference by Stochastic Complexity : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90F2-A %U http://arxiv.org/abs/1702.06776 %D 2017 %X The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes. %K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI
[146]
K. Budhathoki and J. Vreeken, “Correlation by Compression,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
BibTeX
@inproceedings{budhathoki:17:cbc, TITLE = {Correlation by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.59}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh and Wang, Wei}, PAGES = {525--533}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Correlation by Compression : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BD8-6 %R 10.1137/1.9781611974973.59 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %E Chawla, Nitesh; Wang, Wei %P 525 - 533 %I SIAM %@ 978-1-611974-87-4
[147]
K. Budhathoki and J. Vreeken, “MDL for Causal Inference on Discrete Data,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{BudhathokiICDM2017, TITLE = {{MDL} for Causal Inference on Discrete Data}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.87}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {751--756}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T MDL for Causal Inference on Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6458-D %R 10.1109/ICDM.2017.87 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 751 - 756 %I IEEE %@ 978-1-5386-3835-4
[148]
A. Chakraborty, A. Hannak, A. J. Biega, and K. Gummadi, “Fair Sharing for Sharing Economy Platforms,” in FATREC-Workshop on Responsible Recommendation, Como, Italy, 2017.
Export
BibTeX
@inproceedings{Chakraborty_FATREC2017, TITLE = {Fair Sharing for Sharing Economy Platforms}, AUTHOR = {Chakraborty, Abhijnan and Hannak, Aniko and Biega, Asia J. and Gummadi, Krishna}, LANGUAGE = {eng}, DOI = {10.18122/B2BX2S}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {FATREC-Workshop on Responsible Recommendation}, ADDRESS = {Como, Italy}, }
Endnote
%0 Conference Proceedings %A Chakraborty, Abhijnan %A Hannak, Aniko %A Biega, Asia J. %A Gummadi, Krishna %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Fair Sharing for Sharing Economy Platforms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-57E1-E %R 10.18122/B2BX2S %D 2017 %B Fairness, Accountability and Transparency in Recommender Systems - Workshop on Responsible Recommendation %Z date of event: 2017-08-31 - 2017-08-31 %C Como, Italy %B FATREC-Workshop on Responsible Recommendation
[149]
C. X. Chu, N. Tandon, and G. Weikum, “Distilling Task Knowledge from How-To Communities,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{Cuong:WWW2017, TITLE = {Distilling Task Knowledge from How-To Communities}, AUTHOR = {Chu, Cuong Xuan and Tandon, Niket and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4913-0}, DOI = {10.1145/3038912.3052715}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web}, PAGES = {805--814}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Tandon, Niket %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Distilling Task Knowledge from How-To Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-54BE-E %R 10.1145/3038912.3052715 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 %P 805 - 814 %I ACM %@ 978-1-4503-4913-0
[150]
A. Cohan, S. Young, A. Yates, and N. Goharian, “Triaging Content Severity in Online Mental Health Forums,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.
Export
BibTeX
@article{Cohan2017, TITLE = {Triaging Content Severity in Online Mental Health Forums}, AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, ISSN = {2330-1635}, DOI = {10.1002/asi.23865}, PUBLISHER = {Wiley}, ADDRESS = {Chichester, UK}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Journal of the Association for Information Science and Technology}, VOLUME = {68}, NUMBER = {11}, PAGES = {2675--2689}, }
Endnote
%0 Journal Article %A Cohan, Arman %A Young, Sydney %A Yates, Andrew %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Triaging Content Severity in Online Mental Health Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06B9-8 %R 10.1002/asi.23865 %7 2017-09-25 %D 2017 %8 25.09.2017 %J Journal of the Association for Information Science and Technology %O asis&t %V 68 %N 11 %& 2675 %P 2675 - 2689 %I Wiley %C Chichester, UK %@ false
[151]
A. Cohan, S. Young, A. Yates, and N. Goharian, “Triaging Content Severity in Online Mental Health Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06875. (arXiv: 1702.06875)
Abstract
Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.
Export
BibTeX
@online{Cohan_arXiv2017, TITLE = {Triaging Content Severity in Online Mental Health Forums}, AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06875}, EPRINT = {1702.06875}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.}, }
Endnote
%0 Report %A Cohan, Arman %A Young, Sydney %A Yates, Andrew %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Triaging Content Severity in Online Mental Health Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06AF-F %U http://arxiv.org/abs/1702.06875 %D 2017 %X Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need. %K Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI
[152]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “Efficient Exploration of Telco Big Data with Compression and Decaying,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.
Export
BibTeX
@inproceedings{icde17-spate, TITLE = {Efficient Exploration of Telco Big Data with Compression and Decaying}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-5090-6544-8}, DOI = {10.1109/ICDE.2017.175}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering}, PAGES = {1332--1343}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Efficient Exploration of Telco Big Data with Compression and Decaying : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62B3-4 %R 10.1109/ICDE.2017.175 %D 2017 %B 33rd IEEE International Conference on Data Engineering %Z date of event: 2017-04-19 - 2017-04-22 %C San Diego, CA, USA %B ICDE 2017 %P 1332 - 1343 %I IEEE %@ 978-1-5090-6544-8
[153]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “SPATE: Compacting and Exploring Telco Big Data,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.
Export
BibTeX
@inproceedings{icde17-spate-demo, TITLE = {{SPATE}: Compacting and Exploring Telco Big Data}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-5090-6544-8}, DOI = {10.1109/ICDE.2017.203}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering}, PAGES = {1419--1420}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T SPATE: Compacting and Exploring Telco Big Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62BA-5 %R 10.1109/ICDE.2017.203 %D 2017 %B 33rd IEEE International Conference on Data Engineering %Z date of event: 2017-04-19 - 2017-04-22 %C San Diego, CA, USA %B ICDE 2017 %P 1419 - 1420 %I IEEE %@ 978-1-5090-6544-8
[154]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “Towards Real-Time Road Traffic Analytics using Telco Big Data,” in BIRTE ’17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics, Munich, Germany, 2017.
Export
BibTeX
@inproceedings{birte17traffictbd, TITLE = {Towards Real-Time Road Traffic Analytics using {Telco Big Data}}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-4503-5425-7}, DOI = {10.1145/3129292.3129296}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {BIRTE '17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics}, EDITOR = {Chatziantoniou, Damianos and Castellanos, Malu and Chrysanthis, Panos K.}, EID = {5}, ADDRESS = {Munich, Germany}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Towards Real-Time Road Traffic Analytics using Telco Big Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-DDB7-A %R 10.1145/3129292.3129296 %D 2017 %B Eleventh International Workshop on Real-Time Business Intelligence and Analytics %Z date of event: 2017-08-28 - 2017-08-28 %C Munich, Germany %B BIRTE '17 %E Chatziantoniou, Damianos; Castellanos, Malu; Chrysanthis, Panos K. %Z sequence number: 5 %I ACM %@ 978-1-4503-5425-7
[155]
S. Das, A. Mishra, K. Berberich, and V. Setty, “Estimating Event Focus Time Using Neural Word Embeddings,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Das_CIKM2017, TITLE = {Estimating Event Focus Time Using Neural Word Embeddings}, AUTHOR = {Das, Supratim and Mishra, Arunav and Berberich, Klaus and Setty, Vinay}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133131}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2039--2042}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Das, Supratim %A Mishra, Arunav %A Berberich, Klaus %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Estimating Event Focus Time Using Neural Word Embeddings : %G eng %U http://hdl.handle.net/21.11116/0000-0000-635B-B %R 10.1145/3132847.3133131 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2039 - 2042 %I ACM %@ 978-1-4503-4918-5
[156]
S. Das, K. Berberich, D. Klakow, A. Mishra, and V. Setty, “Estimating Event Focus Time with Distributed Representation of Words,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to solve this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.
Export
BibTeX
@mastersthesis{dasmaster17, TITLE = {Estimating Event Focus Time with Distributed Representation of Words}, AUTHOR = {Das, Supratim and Berberich, Klaus and Klakow, Dietrich and Mishra, Arunav and Setty, Vinay}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to solve this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.}, }
Endnote
%0 Thesis %A Das, Supratim %A Berberich, Klaus %A Klakow, Dietrich %A Mishra, Arunav %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Estimating Event Focus Time with Distributed Representation of Words : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-DFF1-7 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P 83 p. %V master %9 master %X Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to solve this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.
[157]
S. Dutta, “Efficient Knowledge Management for Named Entities from Text,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.
Export
BibTeX
@phdthesis{duttaphd17, TITLE = {Efficient Knowledge Management for Named Entities from Text}, AUTHOR = {Dutta, Sourav}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-67924}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.}, }
Endnote
%0 Thesis %A Dutta, Sourav %Y Weikum, Gerhard %A referee: Nejdl, Wolfgang %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficient Knowledge Management for Named Entities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-A793-E %U urn:nbn:de:bsz:291-scidok-67924 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P xv, 134 p. %V phd %9 phd %X The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented. %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6792/ %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[158]
P. Ernst, A. Mishra, A. Anand, and V. Setty, “BioNex: A System For Biomedical News Event Exploration,” in SIGIR’17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 2017.
Export
BibTeX
@inproceedings{Ernst_SIGIR2017, TITLE = {{BioNex}: {A} System For Biomedical News Event Exploration}, AUTHOR = {Ernst, Patrick and Mishra, Arunav and Anand, Avishek and Setty, Vinay}, LANGUAGE = {eng}, ISBN = {978-1-4503-5022-8}, DOI = {10.1145/3077136.3084150}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {SIGIR'17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {1277--1280}, ADDRESS = {Shinjuku, Tokyo, Japan}, }
Endnote
%0 Conference Proceedings %A Ernst, Patrick %A Mishra, Arunav %A Anand, Avishek %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T BioNex: A System For Biomedical News Event Exploration : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A2D1-A %R 10.1145/3077136.3084150 %D 2017 %B 40th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2017-08-07 - 2017-08-11 %C Shinjuku, Tokyo, Japan %B SIGIR'17 %P 1277 - 1280 %I ACM %@ 978-1-4503-5022-8
[159]
S. Eslami, A. J. Biega, R. Saha Roy, and G. Weikum, “Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Eslami_CIKM2017, TITLE = {Privacy of Hidden Profiles: {U}tility-Preserving Profile Removal in Online Forums}, AUTHOR = {Eslami, Sedigheh and Biega, Asia J. and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133140}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2063--2066}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Eslami, Sedigheh %A Biega, Asia J. %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3BA2-7 %R 10.1145/3132847.3133140 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2063 - 2066 %I ACM %@ 978-1-4503-4918-5
[160]
S. Eslami, “Utility-preserving Profile Removal in Online Forums,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{EslamiMSc2017, TITLE = {Utility-preserving Profile Removal in Online Forums}, AUTHOR = {Eslami, Sedigheh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Eslami, Sedigheh %Y Weikum, Gerhard %A referee: Saha Roy, Rishiraj %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Utility-preserving Profile Removal in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-9236-4 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P XII, 66 p. %V master %9 master
[161]
E. Galbrun and P. Miettinen, “Analysing Political Opinions Using Redescription Mining,” in 16th IEEE International Conference on Data Mining Workshops (ICDMW 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{galbrun16analysing, TITLE = {Analysing Political Opinions Using Redescription Mining}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5910-2}, DOI = {10.1109/ICDMW.2016.121}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining Workshops (ICDMW 2016)}, EDITOR = {Domeniconi, Carlotta and Gullo, Francesco and Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {422--427}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Analysing Political Opinions Using Redescription Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2247-5 %R 10.1109/ICDMW.2016.121 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining Workshops %E Domeniconi, Carlotta; Gullo, Francesco; Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 422 - 427 %I IEEE %@ 978-1-5090-5910-2
[162]
E. Galbrun and P. Miettinen, Redescription Mining. Cham: Springer International, 2017.
Export
BibTeX
@book{galbrun18redescription, TITLE = {Redescription Mining}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-319-72889-6}, DOI = {10.1007/978-3-319-72889-6}, PUBLISHER = {Springer International}, ADDRESS = {Cham}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, PAGES = {XI, 80 p.}, }
Endnote
%0 Book %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Redescription Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D3-1 %R 10.1007/978-3-319-72889-6 %@ 978-3-319-72889-6 %I Springer International %C Cham %D 2017 %P XI, 80 p.
[163]
E. Galbrun and P. Miettinen, “Redescription Mining: An Overview,” IEEE Intelligent Informatics Bulletin, vol. 18, no. 2, 2017.
Export
BibTeX
@article{Galbrun_2017c, TITLE = {Redescription Mining: An Overview}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {1727-5997}, PUBLISHER = {IEEE Computer Society}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {IEEE Intelligent Informatics Bulletin}, VOLUME = {18}, NUMBER = {2}, PAGES = {7--12}, EID = {2}, }
Endnote
%0 Journal Article %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Redescription Mining: An Overview : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E2B-6 %7 2017 %D 2017 %J IEEE Intelligent Informatics Bulletin %V 18 %N 2 %& 7 %P 7 - 12 %Z sequence number: 2 %I IEEE Computer Society %@ false %U http://www.comp.hkbu.edu.hk/~iib/2017/Dec/article2/iib_vol18no2_article2.pdf
[164]
K. Gashteovski, R. Gemulla, and L. Del Corro, “MinIE: Minimizing Facts in Open Information Extraction,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{DBLP:conf/emnlp/GashteovskiGC17, TITLE = {{MinIE}: {M}inimizing Facts in Open Information Extraction}, AUTHOR = {Gashteovski, Kiril and Gemulla, Rainer and Del Corro, Luciano}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {http://aclanthology.info/papers/D17-1277/d17-1277}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {2620--2630}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Gashteovski, Kiril %A Gemulla, Rainer %A Del Corro, Luciano %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T MinIE: Minimizing Facts in Open Information Extraction : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-30F4-2 %U http://aclanthology.info/papers/D17-1277/d17-1277 %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 2620 - 2630 %I ACL %@ 978-1-945626-83-8 %U http://www.aclweb.org/anthology/D17-1277
[165]
X. Ge, A. Daphalapurkar, M. Shmipi, K. Darpun, K. Pelechrinis, P. K. Chrysanthis, and D. Zeinalipour-Yazti, “Data-driven Serendipity Navigation in Urban Places,” in IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA, 2017.
Export
BibTeX
@inproceedings{icdcs17-serendipity-demo, TITLE = {Data-driven Serendipity Navigation in Urban Places}, AUTHOR = {Ge, Xiaoyi and Daphalapurkar, Ameya and Shmipi, Manali and Darpun, Kohli and Pelechrinis, Konstantinos and Chrysanthis, Panos K. and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-1792-2}, DOI = {10.1109/ICDCS.2017.286}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017)}, EDITOR = {Lee, Kisung and Liu, Ling}, PAGES = {2501--2504}, ADDRESS = {Atlanta, GA, USA}, }
Endnote
%0 Conference Proceedings %A Ge, Xiaoyi %A Daphalapurkar, Ameya %A Shmipi, Manali %A Darpun, Kohli %A Pelechrinis, Konstantinos %A Chrysanthis, Panos K. %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Data-driven Serendipity Navigation in Urban Places : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-082B-7 %R 10.1109/ICDCS.2017.286 %D 2017 %B 37th IEEE International Conference on Distributed Computing Systems %Z date of event: 2017-06-05 - 2017-06-08 %C Atlanta, GA, USA %B IEEE 37th International Conference on Distributed Computing Systems %E Lee, Kisung; Liu, Ling %P 2501 - 2504 %I IEEE %@ 978-1-5386-1792-2
[166]
B. Goldsmith, M. Boley, J. Vreeken, M. Scheffler, and L. Ghiringhelli, “Uncovering Structure-property Relationships of Materials by Subgroup Discovery,” New Journal of Physics, vol. 19, no. 1, 2017.
Export
BibTeX
@article{goldsmith:17:gold, TITLE = {Uncovering Structure-property Relationships of Materials by Subgroup Discovery}, AUTHOR = {Goldsmith, Brian and Boley, Mario and Vreeken, Jilles and Scheffler, Matthias and Ghiringhelli, Luca}, LANGUAGE = {eng}, ISSN = {1367-2630}, DOI = {10.1088/1367-2630/aa57c2}, PUBLISHER = {IOP Publishing}, ADDRESS = {Bristol}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {New Journal of Physics}, VOLUME = {19}, NUMBER = {1}, EID = {013031}, }
Endnote
%0 Journal Article %A Goldsmith, Brian %A Boley, Mario %A Vreeken, Jilles %A Scheffler, Matthias %A Ghiringhelli, Luca %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Uncovering Structure-property Relationships of Materials by Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BF5-4 %R 10.1088/1367-2630/aa57c2 %7 2017 %D 2017 %J New Journal of Physics %O New J. Phys. %V 19 %N 1 %Z sequence number: 013031 %I IOP Publishing %C Bristol %@ false %U http://iopscience.iop.org/article/10.1088/1367-2630/aa57c2
[167]
A. Grycner, “Constructing Lexicons of Relational Phrases,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.
Export
BibTeX
@phdthesis{Grynerphd17, TITLE = {Constructing Lexicons of Relational Phrases}, AUTHOR = {Grycner, Adam}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-69101}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.}, }
Endnote
%0 Thesis %A Grycner, Adam %Y Weikum, Gerhard %A referee: Klakow, Dietrich %A referee: Ponzetto, Simone Paolo %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Constructing Lexicons of Relational Phrases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-933B-1 %U urn:nbn:de:bsz:291-scidok-69101 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P 125 p. %V phd %9 phd %X Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus. 
%U http://scidok.sulb.uni-saarland.de/volltexte/2017/6910/ %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[168]
A. Guimarães, L. Wang, and G. Weikum, “Us and Them: Adversarial Politics on Twitter,” in 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{Guimaraes_ICDMW2017, TITLE = {Us and Them: {A}dversarial Politics on {Twitter}}, AUTHOR = {Guimar{\~a}es, Anna and Wang, Liqiang and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-1480-8}, DOI = {10.1109/ICDMW.2017.119}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining Workshops (ICDMW 2017)}, EDITOR = {Gottumukkala, Raju and Ning, Xia and Dong, Guozhu and Raghavan, Vijav and Aluru, Srinivas and Karypis, George and Miele, Lucio and Wu, Xindong}, PAGES = {872--877}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Guimar&#227;es, Anna %A Wang, Liqiang %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Us and Them: Adversarial Politics on Twitter : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B89-4 %R 10.1109/ICDMW.2017.119 %D 2017 %B 17th International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining Workshops %E Gottumukkala, Raju; Ning, Xia; Dong, Guozhu; Raghavan, Vijav; Aluru, Srinivas; Karypis, George; Miele, Lucio; Wu, Xindong %P 872 - 877 %I IEEE %@ 978-1-5386-1480-8
[169]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2017-5-001, 2017.
Abstract
Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries.
Export
BibTeX
@techreport{Guptareport2007, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISSN = {0946-011X}, NUMBER = {MPI-I-2017-5-001}, INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries.}, TYPE = {Research Report}, }
Endnote
%0 Report %A Gupta, Dhruv %A Berberich, Klaus %A Str&#246;tgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-07DD-0 %Y Max-Planck-Institut f&#252;r Informatik %C Saarbr&#252;cken %D 2017 %P 39 p. %X Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries. %B Research Report %@ false
[170]
S. Gurajada, “Distributed Querying of Large Labeled Graphs,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Graph is a vital abstract data type that has profound significance in several applications. Because of its versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the “Labeled Graph”, where vertices and edges are labeled. An enormous research effort has been invested in the task of managing and querying graphs, yet many challenges are left unsolved. In this thesis, we advance the state-of-the-art for the following query models, and propose a distributed solution to process them in an efficient and scalable manner. • Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices. • Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner. • Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and a processor to process GGP queries efficiently and in a scalable manner. 
To this end, we propose a prototype distributed engine, coined “TriAD” (Triple Asynchronous and Distributed) that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.
Export
BibTeX
@phdthesis{guraphd2017, TITLE = {Distributed Querying of Large Labeled Graphs}, AUTHOR = {Gurajada, Sairam}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-67738}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Graph is a vital abstract data type that has profound significance in several applications. Because of its versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the {\textquotedblleft}Labeled Graph{\textquotedblright}, where vertices and edges are labeled. An enormous research effort has been invested in the task of managing and querying graphs, yet many challenges are left unsolved. In this thesis, we advance the state-of-the-art for the following query models, and propose a distributed solution to process them in an efficient and scalable manner. \mbox{$\bullet$} Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices. \mbox{$\bullet$} Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner. \mbox{$\bullet$} Generalized Graph Patterns (GGP). 
These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and a processor to process GGP queries efficiently and in a scalable manner. To this end, we propose a prototype distributed engine, coined {\textquotedblleft}TriAD{\textquotedblright} (Triple Asynchronous and Distributed) that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.}, }
Endnote
[171]
K. Hui, “Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
An information retrieval (IR) system assists people in consuming huge amounts of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. 
We demonstrate their superior performances through evaluations on different datasets.
Export
BibTeX
@phdthesis{HUiphd2017, TITLE = {Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval}, AUTHOR = {Hui, Kai}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269423}, DOI = {10.22028/D291-26942}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {An information retrieval (IR) system assists people in consuming huge amount of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. 
In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performances through evaluations on different datasets.}, }
Endnote
%0 Thesis %A Hui, Kai %Y Berberich, Klaus %A referee: Weikum, Gerhard %A referee: Dietz, Laura %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-8921-E %U urn:nbn:de:bsz:291-scidok-ds-269423 %R 10.22028/D291-26942 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P xiv, 130 p. %V phd %9 phd %X An information retrieval (IR) system assists people in consuming huge amount of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the usage of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. 
In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate the novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing; and by proposing a novel judgment mechanism, making a compromise between the judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by the recent advances in deep learning, we develop novel neural IR models to incorporate different patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performances through evaluations on different datasets. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26894
[172]
K. Hui, K. Berberich, and I. Mele, “Dealing with Incomplete Judgments in Cascade Measures,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{HuiICTIR2017, TITLE = {Dealing with Incomplete Judgments in Cascade Measures}, AUTHOR = {Hui, Kai and Berberich, Klaus and Mele, Ida}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121064}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {83--90}, ADDRESS = {Amsterdam, The Netherlands}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %A Mele, Ida %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Dealing with Incomplete Judgments in Cascade Measures : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0649-6 %R 10.1145/3121050.3121064 %D 2017 %B 7th International Conference on the Theory of Information Retrieval %Z date of event: 2017-10-01 - 2017-10-04 %C Amsterdam, The Netherlands %B ICTIR'17 %P 83 - 90 %I ACM %@ 978-1-4503-4490-6
[173]
K. Hui and K. Berberich, “Low-Cost Preference Judgment via Ties,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{hui2017short, TITLE = {Low-Cost Preference Judgment via Ties}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_58}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {626--632}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Low-Cost Preference Judgment via Ties : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1F7B-A %R 10.1007/978-3-319-56608-5_58 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 626 - 632 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[174]
K. Hui and K. Berberich, “Merge-Tie-Judge: Low-Cost Preference Judgments with Ties,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{HuiICTIR2017b, TITLE = {{Merge-Tie-Judge}: Low-Cost Preference Judgments with Ties}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121095}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {277--280}, ADDRESS = {Amsterdam, The Netherlands}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Merge-Tie-Judge: Low-Cost Preference Judgments with Ties : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-064B-2 %R 10.1145/3121050.3121095 %D 2017 %B 7th International Conference on the Theory of Information Retrieval %Z date of event: 2017-10-01 - 2017-10-04 %C Amsterdam, The Netherlands %B ICTIR'17 %P 277 - 280 %I ACM %@ 978-1-4503-4490-6
[175]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “PACRR: A Position-Aware Neural IR Model for Relevance Matching,” 2017. [Online]. Available: http://arxiv.org/abs/1704.03940. (arXiv: 1704.03940)
Abstract
In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks.
Export
BibTeX
@online{DBLP:journals/corr/HuiYBM17, TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.03940}, EPRINT = {1704.03940}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks.}, }
Endnote
%0 Report %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T PACRR: A Position-Aware Neural IR Model for Relevance Matching : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90A8-3 %U http://arxiv.org/abs/1704.03940 %D 2017 %X In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[176]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “PACRR: A Position-Aware Neural IR Model for Relevance Matching,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{HuiENMLP2017, TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {https://aclanthology.info/pdf/D/D17/D17-1111.pdf}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {1060--1069}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T PACRR: A Position-Aware Neural IR Model for Relevance Matching : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-063F-D %U https://aclanthology.info/pdf/D/D17/D17-1111.pdf %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 1060 - 1069 %I ACL %@ 978-1-945626-83-8
[177]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “Position-Aware Representations for Relevance Matching in Neural Information Retrieval,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{HuiWWW2017, TITLE = {Position-Aware Representations for Relevance Matching in Neural Information Retrieval}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054258}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {799--800}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Position-Aware Representations for Relevance Matching in Neural Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90A4-B %R 10.1145/3041021.3054258 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 799 - 800 %I ACM %@ 978-1-4503-4914-7
[178]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.10192. (arXiv: 1706.10192)
Abstract
Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.
Export
BibTeX
@online{HuiarXiv2017b, TITLE = {{RE-PACRR}: {A} Context and Density-Aware Neural Information Retrieval Model}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1706.10192}, EPRINT = {1706.10192}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.}, }
Endnote
%0 Report %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-064D-D %U http://arxiv.org/abs/1706.10192 %D 2017 %X Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[179]
K. Hui and K. Berberich, “Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{hui2017full, TITLE = {Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_19}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {239--251}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1F75-5 %R 10.1007/978-3-319-56608-5_19 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 239 - 251 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[180]
R. Jäschke, J. Strötgen, E. Krotova, and F. Fischer, “„Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora,” in DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Bern, Switzerland, 2017, pp. 120–124.
Export
BibTeX
@inproceedings{JaeschkeEtAl2017_DHD, TITLE = {{``Der Helmut Kohl unter den Brotaufstrichen'' -- Zur Extraktion vossianischer Antonomasien aus gro{\ss}en Zeitungskorpora}}, AUTHOR = {J{\"a}schke, Robert and Str{\"o}tgen, Jannik and Krotova, Elena and Fischer, Frank}, LANGUAGE = {deu}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.}, PAGES = {120--124}, ADDRESS = {Bern, Switzerland}, }
Endnote
%0 Conference Proceedings %A Jäschke, Robert %A Strötgen, Jannik %A Krotova, Elena %A Fischer, Frank %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T „Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002C-4E05-A %D 2017 %B 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. %Z date of event: 2017-02-13 - 2017-02-18 %C Bern, Switzerland %B DHd 2017 %P 120 - 124
[181]
H. Jhamtani, R. Saha Roy, N. Chhaya, and E. Nyberg, “Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{JhamtaniECIR2017, TITLE = {Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages}, AUTHOR = {Jhamtani, Harsh and Saha Roy, Rishiraj and Chhaya, Niyati and Nyberg, Eric}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_41}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {506--512}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Jhamtani, Harsh %A Saha Roy, Rishiraj %A Chhaya, Niyati %A Nyberg, Eric %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages : %G eng %U http://hdl.handle.net/21.11116/0000-0000-DB33-0 %R 10.1007/978-3-319-56608-5_41 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 506 - 512 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[182]
J. Kalofolias, M. Boley, and J. Vreeken, “Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups,” 2017. [Online]. Available: http://arxiv.org/abs/1709.07941. (arXiv: 1709.07941)
Abstract
Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.
Export
BibTeX
@online{Kalofolias_arXiv2017, TITLE = {Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups}, AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.07941}, EPRINT = {1709.07941}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.}, }
Endnote
%0 Report %A Kalofolias, Janis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0685-D %U http://arxiv.org/abs/1709.07941 %D 2017 %X Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI
[183]
J. Kalofolias, M. Boley, and J. Vreeken, “Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{KalofoliasICDM2017, TITLE = {Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups}, AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.29}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {197--206}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Kalofolias, Janis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups : %G eng %U http://hdl.handle.net/21.11116/0000-0000-63C2-5 %R 10.1109/ICDM.2017.29 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 197 - 206 %I IEEE %@ 978-1-5386-3835-4
[184]
J. Kalofolias, E. Galbrun, and P. Miettinen, “From Sets of Good Redescriptions to Good Sets of Redescriptions,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{kalofolias16from, TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions}, AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0032}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, PAGES = {211--220}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Kalofolias, Janis %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Sets of Good Redescriptions to Good Sets of Redescriptions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-224D-A %R 10.1109/ICDM.2016.0032 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %P 211 - 220 %I IEEE %@ 978-1-5090-5473-2
[185]
M. Kamp, M. Boley, O. Missura, and T. Gärtner, “Effective Parallelisation for Machine Learning,” in Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 2017.
Export
BibTeX
@inproceedings{NIPS2017_7226, TITLE = {Effective Parallelisation for Machine Learning}, AUTHOR = {Kamp, Michael and Boley, Mario and Missura, Olana and G{\"a}rtner, Thomas}, LANGUAGE = {eng}, PUBLISHER = {Curran Associates}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Advances in Neural Information Processing Systems 30}, EDITOR = {Guyon, I. and Luxburg, U. V. and Bengio, S. and Wallach, H. and Fergus, R. and Vishwanathan, S. and Garnett, R.}, PAGES = {6477--6488}, EID = {7226}, ADDRESS = {Long Beach, CA, USA}, }
Endnote
%0 Conference Proceedings %A Kamp, Michael %A Boley, Mario %A Missura, Olana %A Gärtner, Thomas %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Effective Parallelisation for Machine Learning : %G eng %U http://hdl.handle.net/21.11116/0000-0002-BA32-4 %D 2017 %B Thirty-first Conference on Neural Information Processing Systems %Z date of event: 2017-12-04 - 2017-12-09 %C Long Beach, CA, USA %B Advances in Neural Information Processing Systems 30 %E Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R. %P 6477 - 6488 %Z sequence number: 7226 %I Curran Associates %U http://papers.nips.cc/paper/7226-effective-parallelisation-for-machine-learning.pdf
[186]
S. Karaev and P. Miettinen, “Algorithms for Approximate Subtropical Matrix Factorization,” 2017. [Online]. Available: http://arxiv.org/abs/1707.08872. (arXiv: 1707.08872)
Abstract
Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.
Export
BibTeX
@online{Karaev_arXiv2017, TITLE = {Algorithms for Approximate Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1707.08872}, EPRINT = {1707.08872}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.}, }
Endnote
%0 Report %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Approximate Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-065A-F %U http://arxiv.org/abs/1707.08872 %D 2017 %X Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret. %K Computer Science, Learning, cs.LG %U http://people.mpi-inf.mpg.de/~pmiettin/tropical/
[187]
E. Kuzey, “Populating Knowledge Bases with Temporal Information,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@phdthesis{KuzeyPhd2017, TITLE = {Populating Knowledge Bases with Temporal Information}, AUTHOR = {Kuzey, Erdal}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Kuzey, Erdal %Y Weikum, Gerhard %A referee: de Rijke, Maarten %A referee: Suchanek, Fabian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Populating Knowledge Bases with Temporal Information : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-EAE5-7 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P XIV, 143 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6811/ %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[188]
L. Lange, “Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{LangeBcS2017, TITLE = {Time in Newspaper: {A} Large-Scale Analysis of Temporal Expressions in News Corpora}, AUTHOR = {Lange, Lukas}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, TYPE = {Bachelor's thesis}, }
Endnote
%0 Thesis %A Lange, Lukas %Y Strötgen, Jannik %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5D08-B %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 77 p. %V bachelor %9 bachelor
[189]
F. A. Lisi and D. Stepanova, “Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs,” in Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR, London, UK, 2017.
Export
BibTeX
@inproceedings{LisiRuleML2017, TITLE = {Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs}, AUTHOR = {Lisi, Francesca Alessandra and Stepanova, Daria}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-1875-8}, PUBLISHER = {CEUR-WS.org}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR}, EDITOR = {Bassiliades, Nick and Bikakis, Antonis and Costantini, Stefania and Franconi, Enrico and Giurca, Adrian and Kontchakov, Roman and Patkos, Theodore and Sadri, Fariba and Van Woensel, William}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1875}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Lisi, Francesca Alessandra %A Stepanova, Daria %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-55FC-8 %D 2017 %B International Joint Conference on Rules and Reasoning %Z date of event: 2017-07-12 - 2017-07-15 %C London, UK %B Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR %E Bassiliades, Nick; Bikakis, Antonis; Costantini, Stefania; Franconi, Enrico; Giurca, Adrian; Kontchakov, Roman; Patkos, Theodore; Sadri, Fariba; Van Woensel, William %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1875 %@ false %U http://ceur-ws.org/Vol-1875/paper20.pdf
[190]
S. MacAvaney, K. Hui, and A. Yates, “An Approach for Weakly-Supervised Deep Information Retrieval,” 2017. [Online]. Available: http://arxiv.org/abs/1707.00189. (arXiv: 1707.00189)
Abstract
Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
Export
BibTeX
@online{MacAvaney_arXiv2017, TITLE = {An Approach for Weakly-Supervised Deep Information Retrieval}, AUTHOR = {MacAvaney, Sean and Hui, Kai and Yates, Andrew}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1707.00189}, EPRINT = {1707.00189}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Hui, Kai %A Yates, Andrew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Approach for Weakly-Supervised Deep Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06C5-C %U http://arxiv.org/abs/1707.00189 %D 2017 %X Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[191]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Approximate Functional Dependencies,” 2017. [Online]. Available: http://arxiv.org/abs/1705.09391. (arXiv: 1705.09391)
Abstract
Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.
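As an illustration of the kind of dependence score the abstract discusses, here is a minimal plug-in estimate of the fraction of information F(A; Y). Note that this naive estimator exhibits exactly the sample-size and dimensionality bias the paper corrects; the sketch does not reproduce the paper's bias-corrected score or its optimistic estimator.

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in Shannon entropy in bits of a sample of outcomes."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def fraction_of_information(attr, target):
    """Plug-in estimate of F(A; Y) = (H(Y) - H(Y|A)) / H(Y).

    1.0 means A functionally determines Y in the sample;
    0.0 means A removes no uncertainty about Y."""
    n = len(target)
    h_y = entropy(target)
    if h_y == 0.0:
        return 0.0
    groups = {}
    for a, y in zip(attr, target):
        groups.setdefault(a, []).append(y)
    h_y_given_a = sum(len(g) / n * entropy(g) for g in groups.values())
    return (h_y - h_y_given_a) / h_y

# 'dept' determines 'building' exactly; 'coin' is unrelated noise.
dept     = ["cs", "cs", "math", "math", "bio", "bio", "cs", "math"]
building = ["E1", "E1", "E2",   "E2",   "E3",  "E3",  "E1", "E2"]
coin     = ["h",  "t",  "h",    "t",    "h",   "t",   "h",  "t"]
print(fraction_of_information(dept, building))   # prints 1.0
print(fraction_of_information(coin, building))   # small but > 0: the bias
```

The second score is nonzero even though the coin is independent of the building, which is precisely the spurious-dependence problem that motivates a bias-correcting score.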
Export
BibTeX
@online{DBLP:journals/corr/MandrosBV17, TITLE = {Discovering Reliable Approximate Functional Dependencies}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.09391}, EPRINT = {1705.09391}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.}, }
Endnote
%0 Report %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Approximate Functional Dependencies : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90F8-D %U http://arxiv.org/abs/1705.09391 %D 2017 %X Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT
[192]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Approximate Functional Dependencies,” in KDD’17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 2017.
Export
BibTeX
@inproceedings{MandrosKDD2017, TITLE = {Discovering Reliable Approximate Functional Dependencies}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-4887-4}, DOI = {10.1145/3097983.3098062}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {KDD'17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, PAGES = {355--363}, ADDRESS = {Halifax, NS, Canada}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Approximate Functional Dependencies : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-065F-5 %R 10.1145/3097983.3098062 %D 2017 %B 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %Z date of event: 2017-08-13 - 2017-08-17 %C Halifax, NS, Canada %B KDD'17 %P 355 - 363 %I ACM %@ 978-1-4503-4887-4
[193]
A. Marx and J. Vreeken, “Telling Cause from Effect using MDL-based Local and Global Regression,” 2017. [Online]. Available: http://arxiv.org/abs/1709.08915. (arXiv: 1709.08915)
Abstract
We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables $X$ and $Y$ from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$ than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that, as we show through thorough empirical evaluation on both synthetic and real-world data, outperforms the state of the art by a wide margin.
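The inference rule in the abstract (prefer the direction with the shorter description) can be sketched with a toy two-part MDL score. The quadratic regression class and the flat 32-bit charge per parameter below are illustrative assumptions; this is not the paper's Slope algorithm, which uses local and global regression functions.

```python
import math

def polyfit2(xs, ys):
    """Least-squares fit ys ~ c0 + c1*x + c2*x^2 via the normal equations."""
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    for col in range(3):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                       # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef

def dl(xs, ys):
    """Two-part MDL cost of explaining ys from xs: parameter bits + residual bits."""
    c0, c1, c2 = polyfit2(xs, ys)
    n = len(xs)
    sse = sum((y - (c0 + c1 * x + c2 * x * x)) ** 2 for x, y in zip(xs, ys))
    resid_var = max(sse / n, 1e-30)
    return 3 * 32 + 0.5 * n * math.log2(2 * math.pi * math.e * resid_var)

def infer_direction(xs, ys):
    # Standardise both variables so the residual coding costs are comparable,
    # then prefer the direction with the shorter total description.
    def z(vs):
        m = sum(vs) / len(vs)
        s = math.sqrt(sum((v - m) ** 2 for v in vs) / len(vs))
        return [(v - m) / s for v in vs]
    zx, zy = z(xs), z(ys)
    return "X->Y" if dl(zx, zy) < dl(zy, zx) else "Y->X"

xs = [0.5 + 0.01 * i for i in range(300)]     # cause
ys = [x * x for x in xs]                      # effect: an exact quadratic of x
print(infer_direction(xs, ys))                # prints X->Y
```

In the true direction the quadratic class fits the data exactly, so the residual code is near zero; in the reverse direction the square-root relation cannot be captured by the same class, so its description is longer and the rule picks X->Y.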
Export
BibTeX
@online{Marx_arXiv1709.08915, TITLE = {Telling Cause from Effect using {MDL}-based Local and Global Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, URL = {http://arxiv.org/abs/1709.08915}, DOI = {10.1109/ICDM.2017.40}, EPRINT = {1709.08915}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables $X$ and $Y$ from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$ than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that through thorough empirical evaluation on both synthetic and real world data we show outperforms the state of the art by a wide margin.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Telling Cause from Effect using MDL-based Local and Global Regression : %U http://hdl.handle.net/21.11116/0000-0002-9F18-1 %R 10.1109/ICDM.2017.40 %U http://arxiv.org/abs/1709.08915 %D 2017 %X We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables $X$ and $Y$ from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer $X$ causes $Y$ in case it is shorter to describe $Y$ as a function of $X$ than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that through thorough empirical evaluation on both synthetic and real world data we show outperforms the state of the art by a wide margin. %K Statistics, Machine Learning, stat.ML
[194]
A. Marx and J. Vreeken, “Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06385. (arXiv: 1702.06385)
Abstract
Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data.
Export
BibTeX
@online{DBLP:journals/corr/MarxV17, TITLE = {Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06385}, EPRINT = {1702.06385}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90EF-3 %U http://arxiv.org/abs/1702.06385 %D 2017 %X Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
[195]
A. Marx and J. Vreeken, “Telling Cause from Effect Using MDL-Based Local and Global Regression,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{MarxICDM2017, TITLE = {Telling Cause from Effect Using {MDL}-Based Local and Global Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.40}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {307--316}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Telling Cause from Effect Using MDL-Based Local and Global Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0000-63C4-3 %R 10.1109/ICDM.2017.40 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 307 - 316 %I IEEE %@ 978-1-5386-3835-4
[196]
F. Meawad, M. H. Gad-Elrab, and E. Hemayed, “Designing Mobile Augmented Reality Experiences Using Friendly Markers,” in 4th International Conference on User Science and Engineering (i-USEr 2016), Melaka, Malaysia, 2017.
Export
BibTeX
@inproceedings{Meawad2017, TITLE = {Designing Mobile Augmented Reality Experiences Using Friendly Markers}, AUTHOR = {Meawad, Fatma and Gad-Elrab, Mohamed H. and Hemayed, Elsayed}, LANGUAGE = {eng}, ISBN = {978-1-5090-263-9}, DOI = {10.1109/IUSER.2016.7857937}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {4th International Conference on User Science and Engineering (i-USEr 2016)}, PAGES = {75--80}, ADDRESS = {Melaka, Malaysia}, }
Endnote
%0 Conference Proceedings %A Meawad, Fatma %A Gad-Elrab, Mohamed H. %A Hemayed, Elsayed %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Designing Mobile Augmented Reality Experiences Using Friendly Markers : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-CF28-A %R 10.1109/IUSER.2016.7857937 %D 2017 %B 4th International Conference on User Science and Engineering %Z date of event: 2016-08-23 - 2016-08-25 %C Melaka, Malaysia %B 4th International Conference on User Science and Engineering %P 75 - 80 %I IEEE %@ 978-1-5090-263-9
[197]
S. Metzger, R. Schenkel, and M. Sydow, “QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation,” Journal of Intelligent Information Systems, vol. 49, no. 3, 2017.
Export
BibTeX
@article{Metzger2017, TITLE = {{QBEES}: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation}, AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin}, LANGUAGE = {eng}, DOI = {10.1007/s10844-017-0443-x}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {Journal of Intelligent Information Systems}, VOLUME = {49}, NUMBER = {3}, PAGES = {333--366}, }
Endnote
%0 Journal Article %A Metzger, Steffen %A Schenkel, Ralf %A Sydow, Marcin %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-557B-8 %R 10.1007/s10844-017-0443-x %7 2017 %D 2017 %J Journal of Intelligent Information Systems %V 49 %N 3 %& 333 %P 333 - 366
[198]
S. Metzler, S. Günnemann, and P. Miettinen, “Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{metzler16hyperbolae, TITLE = {Hyperbolae Are No Hyperbole: {Modelling} Communities That Are Not Cliques}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0044}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, PAGES = {330--339}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Metzler, Saskia %A Günnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-225F-F %R 10.1109/ICDM.2016.0044 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %P 330 - 339 %I IEEE %@ 978-1-5090-5473-2
[199]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Cardinal Virtues: Extracting Relation Cardinalities from Text,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.
Export
BibTeX
@inproceedings{MirzaACL2017, TITLE = {Cardinal Virtues: {E}xtracting Relation Cardinalities from Text}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-76-0}, DOI = {10.18653/v1/P17-2055}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)}, PAGES = {347--351}, ADDRESS = {Vancouver, Canada}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cardinal Virtues: Extracting Relation Cardinalities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F9F8-7 %R 10.18653/v1/P17-2055 %D 2017 %B The 55th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2017-07-30 - 2017-08-04 %C Vancouver, Canada %B The 55th Annual Meeting of the Association for Computational Linguistics %P 347 - 351 %I ACL %@ 978-1-945626-76-0 %U http://aclweb.org/anthology/P17-2055
[200]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Cardinal Virtues: Extracting Relation Cardinalities from Text,” 2017. [Online]. Available: http://arxiv.org/abs/1704.04455. (arXiv: 1704.04455)
Abstract
Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations.
Export
BibTeX
@online{Mirza2017, TITLE = {Cardinal Virtues: Extracting Relation Cardinalities from Text}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.04455}, EPRINT = {1704.04455}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discusses the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations.}, }
Endnote
%0 Report %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cardinal Virtues: Extracting Relation Cardinalities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8128-9 %U http://arxiv.org/abs/1704.04455 %D 2017 %X Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discusses the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations. %K Computer Science, Computation and Language, cs.CL
[201]
A. Mishra and K. Berberich, “How do Order and Proximity Impact the Readability of Event Summaries?,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{DBLP:conf/ecir/MishraB17, TITLE = {How do Order and Proximity Impact the Readability of Event Summaries?}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_17}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {212--225}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T How do Order and Proximity Impact the Readability of Event Summaries? : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-20D9-B %R 10.1007/978-3-319-56608-5_17 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 212 - 225 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[202]
P. Mrazovic, B. Eravci, J. L. Larriba-Pey, H. Ferhatosmanoglu, and M. Matskin, “Understanding and Predicting Trends in Urban Freight Transport,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{MrazovicMDM2017, TITLE = {Understanding and Predicting Trends in Urban Freight Transport}, AUTHOR = {Mrazovic, Petar and Eravci, Bahaeddin and Larriba-Pey, Josep L. and Ferhatosmanoglu, Hakan and Matskin, Mihhail}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.26}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {124--133}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Mrazovic, Petar %A Eravci, Bahaeddin %A Larriba-Pey, Josep L. %A Ferhatosmanoglu, Hakan %A Matskin, Mihhail %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Understanding and Predicting Trends in Urban Freight Transport : %G eng %U http://hdl.handle.net/21.11116/0000-0000-DB41-0 %R 10.1109/MDM.2017.26 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 124 - 133 %I IEEE %@ 978-1-5386-3932-0
[203]
S. Mukherjee, S. Dutta, and G. Weikum, “Credible Review Detection with Limited Information using Consistency Analysis,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02668. (arXiv: 1705.02668)
Abstract
Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.
Export
BibTeX
@online{Mukherjee2017b, TITLE = {Credible Review Detection with Limited Information using Consistency Analysis}, AUTHOR = {Mukherjee, Subhabrata and Dutta, Sourav and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02668}, EPRINT = {1705.02668}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Dutta, Sourav %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Credible Review Detection with Limited Information using Consistency Analysis : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80C1-A %U http://arxiv.org/abs/1705.02668 %D 2017 %X Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[204]
S. Mukherjee, K. Popat, and G. Weikum, “Exploring Latent Semantic Factors to Find Useful Product Reviews,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02518. (arXiv: 1705.02518)
Abstract
Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.
Export
BibTeX
@online{Mukjherjee2017e, TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews}, AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02518}, EPRINT = {1705.02518}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Popat, Kashyap %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploring Latent Semantic Factors to Find Useful Product Reviews : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-811C-5 %U http://arxiv.org/abs/1705.02518 %D 2017 %X Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[205]
S. Mukherjee, K. Popat, and G. Weikum, “Exploring Latent Semantic Factors to Find Useful Product Reviews,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{MukherjeeSDM2017, TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews}, AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-61197-497-3}, DOI = {10.1137/1.9781611974973.54}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, PAGES = {480--488}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Mukherjee, Subhabrata %A Popat, Kashyap %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploring Latent Semantic Factors to Find Useful Product Reviews : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4CD4-6 %R 10.1137/1.9781611974973.54 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %P 480 - 488 %I SIAM %@ 978-1-61197-497-3
[206]
S. Mukherjee, H. Lamba, and G. Weikum, “Item Recommendation with Evolving User Preferences and Experience,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02519. (arXiv: 1705.02519)
Abstract
Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.
Export
BibTeX
@online{Mukherjee2017d, TITLE = {Item Recommendation with Evolving User Preferences and Experience}, AUTHOR = {Mukherjee, Subhabrata and Lamba, Hemank and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02519}, DOI = {10.1109/ICDM.2015.111}, EPRINT = {1705.02519}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Lamba, Hemank %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Item Recommendation with Evolving User Preferences and Experience : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8103-C %R 10.1109/ICDM.2015.111 %U http://arxiv.org/abs/1705.02519 %D 2017 %X Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[207]
S. Mukherjee, G. Weikum, and C. Danescu-Niculescu-Mizil, “People on Drugs: Credibility of User Statements in Health Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02522. (arXiv: 1705.02522)
Abstract
Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.
Export
BibTeX
@online{Mukherjee_arXiv2017, TITLE = {People on Drugs: Credibility of User Statements in Health Communities}, AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard and Danescu-Niculescu-Mizil, Cristian}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02522}, EPRINT = {1705.02522}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Weikum, Gerhard %A Danescu-Niculescu-Mizil, Cristian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T People on Drugs: Credibility of User Statements in Health Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80FE-2 %U http://arxiv.org/abs/1705.02522 %D 2017 %X Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[208]
S. Mukherjee and G. Weikum, “People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02667. (arXiv: 1705.02667)
Abstract
Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.
Export
BibTeX
@online{Mukerjee_arXiv1705.02667, TITLE = {People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities}, AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02667}, EPRINT = {1705.02667}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80F7-0 %U http://arxiv.org/abs/1705.02667 %D 2017 %X Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[209]
S. Mukherjee, S. Guennemann, and G. Weikum, “Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02669. (arXiv: 1705.02669)
Abstract
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.
Export
BibTeX
@online{Mukherjee2017, TITLE = {Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion}, AUTHOR = {Mukherjee, Subhabrata and Guennemann, Stephan and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02669}, DOI = {10.1145/2939672.2939780}, EPRINT = {1705.02669}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Guennemann, Stephan %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80BE-3 %R 10.1145/2939672.2939780 %U http://arxiv.org/abs/1705.02669 %D 2017 %X Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[210]
S. Mukherjee, “Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities (user interactions, community dynamics, and textual content) to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.
Export
BibTeX
@phdthesis{Mukherjeephd17, TITLE = {Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities}, AUTHOR = {Mukherjee, Subhabrata}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-69269}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. 
This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.}, }
Endnote
%0 Thesis %A Mukherjee, Subhabrata %Y Weikum, Gerhard %A referee: Han, Jiawei %A referee: G&#252;nnemann, Stephan %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-A648-0 %U urn:nbn:de:bsz:291-scidok-69269 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P 166 p. %V phd %9 phd %X One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. 
Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information. %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6926/http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[211]
S. Neumann and P. Miettinen, “Reductions for Frequency-Based Data Mining Problems,” 2017. [Online]. Available: http://arxiv.org/abs/1709.00900. (arXiv: 1709.00900)
Abstract
Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.
Export
BibTeX
@online{Neumann_arXiv2017, TITLE = {Reductions for Frequency-Based Data Mining Problems}, AUTHOR = {Neumann, Stefan and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.00900}, EPRINT = {1709.00900}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Studying the computational complexity of problems is one of the -- if not the -- fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.}, }
Endnote
%0 Report %A Neumann, Stefan %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Reductions for Frequency-Based Data Mining Problems : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0654-C %U http://arxiv.org/abs/1709.00900 %D 2017 %X Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems. %K Computer Science, Computational Complexity, cs.CC
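The object the entry above reasons about, maximal frequent pattern mining, is concrete even though the paper itself studies complexity reductions. A brute-force sketch of the itemset case (illustrative only; exponential in the number of items, and not one of the paper's constructions):

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Enumerate all maximal frequent itemsets by brute force.

    An itemset is frequent if it is contained in at least
    `min_support` transactions, and maximal if no frequent
    strict superset of it exists.
    """
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = set(cand)
            # count transactions containing the candidate
            if sum(1 for t in transactions if s <= t) >= min_support:
                frequent.append(s)
    # keep only itemsets with no frequent strict superset
    return [s for s in frequent
            if not any(s < t for t in frequent)]

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
maximal = maximal_frequent_itemsets(db, min_support=2)
# every pair occurs twice, {"a","b","c"} only once,
# so the maximal frequent itemsets are the three pairs
```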
[212]
S. Neumann, R. Gemulla, and P. Miettinen, “What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{neumann16what, TITLE = {What You Will Gain By Rounding: {Theory} and Algorithms for Rounding Rank}, AUTHOR = {Neumann, Stefan and Gemulla, Rainer and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1109/ICDM.2016.147}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {380--389}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Neumann, Stefan %A Gemulla, Rainer %A Miettinen, Pauli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2265-0 %R 10.1109/ICDM.2016.147 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 380 - 389 %I IEEE
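The rounding rank studied in the entry above is the smallest k such that a binary matrix equals the element-wise rounding of a rank-k real product at some threshold. A small pure-Python sketch (all values chosen for exposition, not taken from the paper) showing what one gains by rounding: an upper-triangular pattern has full real rank, yet it is the rounding of a rank-1 product.

```python
def rounding_of(L, R, tau):
    """Round the real matrix product L @ R at threshold tau
    to a 0/1 matrix; L is n x k and R is k x m."""
    n, k, m = len(L), len(R), len(R[0])
    return [[1 if sum(L[i][p] * R[p][j] for p in range(k)) >= tau else 0
             for j in range(m)] for i in range(n)]

n = 4
L = [[2.0 ** -i] for i in range(n)]   # n x 1 factor
R = [[2.0 ** j for j in range(n)]]    # 1 x n factor
B = rounding_of(L, R, tau=1.0)
# entry (i, j) of L @ R is 2**(j - i), which is >= 1 exactly when
# j >= i, so B is upper triangular with ones on the diagonal: it has
# full real rank n, but rounding rank 1 by construction.
```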
[213]
S. Neumann and P. Miettinen, “Reductions for Frequency-Based Data Mining Problems,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{neumann17reductions, TITLE = {Reductions for Frequency-Based Data Mining Problems}, AUTHOR = {Neumann, Stefan and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.128}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {997--1002}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Neumann, Stefan %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Reductions for Frequency-Based Data Mining Problems : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90CE-F %R 10.1109/ICDM.2017.128 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 997 - 1002 %I IEEE %@ 978-1-5386-3835-4
[214]
D. B. Nguyen, M. Theobald, and G. Weikum, “J-REED: Joint Relation Extraction and Entity Disambiguation,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Nguyen_CIKM2017, TITLE = {J-{REED}: {Joint Relation Extraction and Entity Disambiguation}}, AUTHOR = {Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133090}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2227--2230}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Nguyen, Dat Ba %A Theobald, Martin %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T J-REED: Joint Relation Extraction and Entity Disambiguation : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B9D-E %R 10.1145/3132847.3133090 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2227 - 2230 %I ACM %@ 978-1-4503-4918-5
[215]
D. B. Nguyen, “Joint Models for Information and Knowledge Extraction,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved, such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging, as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter suffers in quality from non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.
Export
BibTeX
@phdthesis{Nguyenphd2017, TITLE = {Joint Models for Information and Knowledge Extraction}, AUTHOR = {Nguyen, Dat Ba}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269433}, DOI = {10.22028/D291-26943}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: \mbox{$\bullet$} High NERD Quality. Named entity recognition and disambiguation, NERD for short, are preformed first in the extraction pipeline. Their results may affect other downstream tasks. \mbox{$\bullet$} Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. \mbox{$\bullet$} On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. 
The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipediastyle text. The third contribution is an end-to-end system for constructing querydriven, on-the-fly knowledge bases.}, }
Endnote
%0 Thesis %A Nguyen, Dat Ba %Y Weikum, Gerhard %A referee: Theobald, Martin %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Models for Information and Knowledge Extraction : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-890F-9 %U urn:nbn:de:bsz:291-scidok-ds-269433 %R 10.22028/D291-26943 %I Universit&#228;t des Saarlandes %C Saarbr&#252;cken %D 2017 %P 89 p. %V phd %9 phd %X Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: &#8226; High NERD Quality. Named entity recognition and disambiguation, NERD for short, are preformed first in the extraction pipeline. Their results may affect other downstream tasks. &#8226; Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. &#8226; On-the-fly Knowledge Acquisition. 
Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipediastyle text. The third contribution is an end-to-end system for constructing querydriven, on-the-fly knowledge bases. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26895
[216]
D. B. Nguyen, A. Abujabal, N. K. Tran, M. Theobald, and G. Weikum, “Query-Driven On-The-Fly Knowledge Base Construction,” Proceedings of the VLDB Endowment (Proc. VLDB 2018), vol. 11, no. 1, 2017.
Export
BibTeX
@article{Nguyen2017_PVLDB, TITLE = {Query-Driven On-The-Fly Knowledge Base Construction}, AUTHOR = {Nguyen, Dat Ba and Abujabal, Abdalghani and Tran, Nam Khanh and Theobald, Martin and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.14778/3136610.31366}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)}, VOLUME = {11}, NUMBER = {1}, PAGES = {66--79}, BOOKTITLE = {Proceedings of the 44th International Conference on Very Large Data Bases (VLDB 2018)}, EDITOR = {Bhowmick, Sourav and Torres, Ricardo}, }
Endnote
%0 Journal Article %A Nguyen, Dat Ba %A Abujabal, Abdalghani %A Tran, Nam Khanh %A Theobald, Martin %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Query-Driven On-The-Fly Knowledge Base Construction : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B51-3 %R 10.14778/3136610.31366 %7 2017 %D 2017 %J Proceedings of the VLDB Endowment %O PVLDB %V 11 %N 1 %& 66 %P 66 - 79 %I ACM %C New York, NY %B Proceedings of the 44th International Conference on Very Large Data Bases %O VLDB 2018 Rio de Janeiro, Brazil, August 27-31, 2018
[217]
A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti, “ACCES: Offline Accuracy Estimation for Fingerprint-based Localization,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{mdm17-spate-demo, TITLE = {{ACCES}: Offline Accuracy Estimation for Fingerprint-based Localization}, AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.61}, PUBLISHER = {IEEE Computer Society}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {358--359}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Nikitin, Artyom %A Laoudias, Christos %A Chatzimilioudis, Georgios %A Karras, Panagiotis %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ACCES: Offline Accuracy Estimation for Fingerprint-based Localization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-082D-3 %R 10.1109/MDM.2017.61 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 358 - 359 %I IEEE Computer Society %@ 978-1-5386-3932-0
[218]
A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti, “Indoor Localization Accuracy Estimation from Fingerprint Data,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{mdm17-spate, TITLE = {Indoor Localization Accuracy Estimation from Fingerprint Data}, AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.34}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {196--205}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Nikitin, Artyom %A Laoudias, Christos %A Chatzimilioudis, Georgios %A Karras, Panagiotis %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Indoor Localization Accuracy Estimation from Fingerprint Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-0832-6 %R 10.1109/MDM.2017.34 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 196 - 205 %I IEEE %@ 978-1-5386-3932-0
[219]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” in Rules and Reasoning (RuleML+RR 2017), London, UK, 2017, Lecture Notes in Computer Science, vol. 10364.
Export
BibTeX
@inproceedings{StepanovaRR2017, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-319-61251-5}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Rules and Reasoning (RuleML+RR 2017)}, PAGES = {199--214}, BOOKTITLE = {Lecture Notes in Computer Science}, VOLUME = {10364}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8450-8 %D 2017 %B International Joint Conference on Rules and Reasoning %Z date of event: 2017-07-12 - 2017-07-15 %C London, UK %B Rules and Reasoning %P 199 - 214 %I Springer %@ 978-3-319-61251-5 %B Lecture Notes in Computer Science %V 10364
[220]
T. Pellissier Tanon, D. Stepanova, S. Razniewski, P. Mirza, and G. Weikum, “Completeness-Aware Rule Learning from Knowledge Graphs,” in The Semantic Web -- ISWC 2017, Vienna, Austria, 2017.
Export
BibTeX
@inproceedings{StepanovaISWC2017, TITLE = {Completeness-Aware Rule Learning from Knowledge Graphs}, AUTHOR = {Pellissier Tanon, Thomas and Stepanova, Daria and Razniewski, Simon and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-68287-7}, DOI = {10.1007/978-3-319-68288-4_30}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Semantic Web -- ISWC 2017}, EDITOR = {d'Amato, Claudia and Fernandez, Miriam and Tamma, Valentina and Lecue, Freddy and Cudr{\'e}-Mauroux, Philippe and Sequeda, Juan and Lange, Christoph and Heflin, Jeff}, PAGES = {507--525}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10587}, ADDRESS = {Vienna, Austria}, }
Endnote
%0 Conference Proceedings %A Pellissier Tanon, Thomas %A Stepanova, Daria %A Razniewski, Simon %A Mirza, Paramita %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Completeness-Aware Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-55D9-3 %R 10.1007/978-3-319-68288-4_30 %D 2017 %B 16th International Semantic Web Conference %Z date of event: 2017-10-21 - 2017-10-25 %C Vienna, Austria %B The Semantic Web -- ISWC 2017 %E d'Amato, Claudia; Fernandez, Miriam; Tamma, Valentina; Lecue, Freddy; Cudr&#233;-Mauroux, Philippe; Sequeda, Juan; Lange, Christoph; Heflin, Jeff %P 507 - 525 %I Springer %@ 978-3-319-68287-7 %B Lecture Notes in Computer Science %N 10587 %U https://iswc2017.ai.wu.ac.at/wp-content/uploads/papers/MainProceedings/324.pdf
[221]
R. Pienta, M. Kahng, Z. Lin, J. Vreeken, P. Talukdar, J. Abello, G. Parameswaran, and D. H. Chau, “FACETS: Adaptive Local Exploration of Large Graphs,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{pienta:17:facets, TITLE = {{FACETS}: {A}daptive Local Exploration of Large Graphs}, AUTHOR = {Pienta, Robert and Kahng, Minsuk and Lin, Zhang and Vreeken, Jilles and Talukdar, Partha and Abello, James and Parameswaran, Ganesh and Chau, Duen Horng}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.67}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh and Wang, Wei}, PAGES = {597--605}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Pienta, Robert %A Kahng, Minsuk %A Lin, Zhang %A Vreeken, Jilles %A Talukdar, Partha %A Abello, James %A Parameswaran, Ganesh %A Chau, Duen Horng %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations %T FACETS: Adaptive Local Exploration of Large Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BEA-D %R 10.1137/1.9781611974973.67 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %E Chawla, Nitesh; Wang, Wei %P 597 - 605 %I SIAM %@ 978-1-611974-87-4
[222]
E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum, “On Measuring Bias in Online Information,” 2017. [Online]. Available: http://arxiv.org/abs/1704.05730. (arXiv: 1704.05730)
Abstract
Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.
Export
BibTeX
@online{Pitoura2017, TITLE = {On Measuring Bias in Online Information}, AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.05730}, EPRINT = {1704.05730}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.}, }
[223]
E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum, “On Measuring Bias in Online Information,” ACM SIGMOD Record, vol. 46, no. 4, 2017.
Abstract
Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.
Export
BibTeX
@article{Pitoura2017a, TITLE = {On Measuring Bias in Online Information}, AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1145/3186549.3186553}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.}, JOURNAL = {ACM SIGMOD Record}, VOLUME = {46}, NUMBER = {4}, PAGES = {16--21}, }
[224]
K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{PopatWWW2017a, TITLE = {Where the Truth Lies: {E}xplaining the Credibility of Emerging Claims on the {W}eb and Social Media}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3055133}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {1003--1012}, ADDRESS = {Perth, Australia}, }
[225]
K. Popat, “Assessing the Credibility of Claims on the Web,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{PopatWWW2017b, TITLE = {Assessing the Credibility of Claims on the {Web}}, AUTHOR = {Popat, Kashyap}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3053379}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {735--739}, ADDRESS = {Perth, Australia}, }
[226]
Y. Ran, B. He, K. Hui, J. Xu, and L. Sun, “A Document-Based Neural Relevance Model for Effective Clinical Decision Support,” in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017), Kansas City, MO, USA, 2017.
Export
BibTeX
@inproceedings{RanBIBM2017, TITLE = {A Document-Based Neural Relevance Model for Effective Clinical Decision Support}, AUTHOR = {Ran, Yanhua and He, Ben and Hui, Kai and Xu, Jungang and Sun, Le}, LANGUAGE = {eng}, ISBN = {978-1-5090-3050-7}, DOI = {10.1109/BIBM.2017.8217757}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)}, EDITOR = {Hu, Xiaohua and Shyu, Chi-Ren and Bromberg, Yana and Gao, Jean and Gong, Yang and Korkin, Dmitry and Yoo, Illhoi and Zheng, Jane Huiru}, PAGES = {798--804}, ADDRESS = {Kansas City, MO, USA}, }
[227]
S. Razniewski, V. Balaraman, and W. Nutt, “Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties,” in Advanced Data Mining and Applications (ADMA 2017), Singapore, 2017.
Export
BibTeX
@inproceedings{Razniewski_ADMA2017, TITLE = {Doctoral Advisor or Medical Condition: {T}owards Entity-Specific Rankings of Knowledge Base Properties}, AUTHOR = {Razniewski, Simon and Balaraman, Vevake and Nutt, Werner}, LANGUAGE = {eng}, ISBN = {978-3-319-69178-7}, DOI = {10.1007/978-3-319-69179-4_37}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advanced Data Mining and Applications (ADMA 2017)}, EDITOR = {Cong, Gao and Peng, Wen-Chin and Zhang, Wei Emma and Li, Chengliang and Sun, Aixin}, PAGES = {526--540}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10604}, ADDRESS = {Singapore}, }
[228]
N. Reiter, E. Gius, J. Strötgen, and M. Willand, “A Shared Task for a Shared Goal: Systematic Annotation of Literary Texts,” in Digital Humanities 2017 (DH 2017), Montréal, Canada, 2017.
Export
BibTeX
@inproceedings{StroetgenDH2017, TITLE = {A Shared Task for a Shared Goal: {S}ystematic Annotation of Literary Texts}, AUTHOR = {Reiter, Nils and Gius, Evelyn and Str{\"o}tgen, Jannik and Willand, Marcus}, LANGUAGE = {eng}, URL = {https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Digital Humanities 2017 (DH 2017)}, EDITOR = {Lewis, Rihan}, EID = {192}, ADDRESS = {Montr{\'e}al, Canada}, }
[229]
M. Ringsquandl, E. Kharlamov, D. Stepanova, S. Lamparter, R. Lepratti, I. Horrocks, and P. Kroeger, “On Event-driven Knowledge Graph Completion in Digital Factories,” in IEEE International Conference on Big Data, Boston, MA, USA, 2017.
Export
BibTeX
@inproceedings{RingsquandlBD2018, TITLE = {On Event-driven Knowledge Graph Completion in Digital Factories}, AUTHOR = {Ringsquandl, Martin and Kharlamov, Evgeny and Stepanova, Daria and Lamparter, Steffen and Lepratti, Raffaello and Horrocks, Ian and Kroeger, Peer}, LANGUAGE = {eng}, ISBN = {978-1-5386-2715-0}, DOI = {10.1109/BigData.2017.8258105}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {IEEE International Conference on Big Data}, EDITOR = {Nie, Jian-Yun and Obradovic, Zoran and Suzumura, Toyotaro and Ghosh, Rumi and Nambiar, Raghunath and Wang, Chonggang and Zang, Hui and Baeza-Yates, Ricardo and Hu, Xiaohua and Kepner, Jeremy and Cuzzocrea, Alfredo and Tang, Jian and Toyoda, Masashi}, PAGES = {1676--1681}, ADDRESS = {Boston, MA, USA}, }
[230]
R. Bertens, J. Vreeken, and A. Siebes, “Efficiently Discovering Unexpected Pattern-Co-Occurrences,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{RoelSDM2017, TITLE = {Efficiently Discovering Unexpected Pattern-Co-Occurrences}, AUTHOR = {Bertens, Roel and Vreeken, Jilles and Siebes, Arno}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.15}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh and Wang, Wei}, PAGES = {126--134}, ADDRESS = {Houston, TX, USA}, }
[231]
A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. Pal, H. Larochelle, A. Courville, and B. Schiele, “Movie Description,” International Journal of Computer Vision, vol. 123, no. 1, 2017.
Abstract
Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.
Export
BibTeX
@article{RohrbachMovie, TITLE = {Movie Description}, AUTHOR = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Christopher and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt}, LANGUAGE = {eng}, DOI = {10.1007/s11263-016-0987-1}, PUBLISHER = {Springer}, ADDRESS = {London}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.}, JOURNAL = {International Journal of Computer Vision}, VOLUME = {123}, NUMBER = {1}, PAGES = {94--120}, }
[232]
R. Saha Roy, A. Singh, P. Chawla, S. Saxena, and A. R. Sinha, “Automatic Assignment of Topical Icons to Documents for Faster File Navigation,” in 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Kyoto, Japan, 2017.
Export
BibTeX
@inproceedings{Roy_ICDAR2017, TITLE = {Automatic Assignment of Topical Icons to Documents for Faster File Navigation}, AUTHOR = {Saha Roy, Rishiraj and Singh, Abhijeet and Chawla, Prashant and Saxena, Shubham and Sinha, Atanu R.}, LANGUAGE = {eng}, ISSN = {2379-2140}, DOI = {10.1109/ICDAR.2017.220}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017)}, PAGES = {1338--1345}, ADDRESS = {Kyoto, Japan}, }
[233]
V. Setty, A. Anand, A. Mishra, and A. Anand, “Modeling Event Importance for Ranking Daily News Events,” in WSDM’17, 10th ACM International Conference on Web Search and Data Mining, Cambridge, UK, 2017.
Export
BibTeX
@inproceedings{Setii2017, TITLE = {Modeling Event Importance for Ranking Daily News Events}, AUTHOR = {Setty, Vinay and Anand, Abhijit and Mishra, Arunav and Anand, Avishek}, LANGUAGE = {eng}, ISBN = {978-1-4503-4675-7}, DOI = {10.1145/3018661.3018728}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WSDM'17, 10th ACM International Conference on Web Search and Data Mining}, PAGES = {231--240}, ADDRESS = {Cambridge, UK}, }
[234]
D. Seyler, M. Yahya, and K. Berberich, “Knowledge Questions from Knowledge Graphs,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{SeylerICTIR2017, TITLE = {Knowledge Questions from Knowledge Graphs}, AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121073}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {11--18}, ADDRESS = {Amsterdam, The Netherlands}, }
[235]
D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum, “KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition,” 2017. [Online]. Available: http://arxiv.org/abs/1709.03544. (arXiv: 1709.03544)
Abstract
KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish) performing amongst state-of-the art systems in all of them.
Export
BibTeX
@online{Seyler_arXiv2017, TITLE = {{KnowNER}: Incremental Multilingual {Knowledge} in {Named Entity Recognition}}, AUTHOR = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.03544}, EPRINT = {1709.03544}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish) performing amongst state-of-the art systems in all of them.}, }
[236]
L. Soldaini, A. Yates, and N. Goharian, “Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Soldaini_CIKM2017, TITLE = {Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model}, AUTHOR = {Soldaini, Luca and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133149}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2307--2310}, ADDRESS = {Singapore, Singapore}, }
[237]
L. Soldaini, A. Yates, and N. Goharian, “Learning to Reformulate Long Queries for Clinical Decision Support,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.
Export
BibTeX
@article{Soldaini2017, TITLE = {Learning to Reformulate Long Queries for Clinical Decision Support}, AUTHOR = {Soldaini, Luca and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, ISSN = {2330-1635}, DOI = {10.1002/asi.23924}, PUBLISHER = {Wiley}, ADDRESS = {Chichester, UK}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Journal of the Association for Information Science and Technology}, VOLUME = {68}, NUMBER = {11}, PAGES = {2602--2619}, }
[238]
J. Stoyanovich, B. Howe, S. Abiteboul, G. Miklau, A. Sahuguet, and G. Weikum, “Fides: Towards a Platform for Responsible Data Science,” in 29th International Conference on Scientific and Statistical Database Management (SSDBM 2017), Chicago, IL, USA, 2017.
Export
BibTeX
@inproceedings{StoyanovichSSDBM2017, TITLE = {Fides: {T}owards a Platform for Responsible Data Science}, AUTHOR = {Stoyanovich, Julia and Howe, Bill and Abiteboul, Serge and Miklau, Gerome and Sahuguet, Arnaud and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5282-6}, DOI = {10.1145/3085504.3085530}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {29th International Conference on Scientific and Statistical Database Management (SSDBM 2017)}, EID = {26}, ADDRESS = {Chicago, IL, USA}, }
Endnote
%0 Conference Proceedings %A Stoyanovich, Julia %A Howe, Bill %A Abiteboul, Serge %A Miklau, Gerome %A Sahuguet, Arnaud %A Weikum, Gerhard %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Fides: Towards a Platform for Responsible Data Science : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80BA-B %R 10.1145/3085504.3085530 %D 2017 %B 29th International Conference on Scientific and Statistical Database Management %Z date of event: 2017-06-27 - 2017-06-29 %C Chicago, IL, USA %B 29th International Conference on Scientific and Statistical Database Management %Z sequence number: 26 %I ACM %@ 978-1-4503-5282-6
[239]
N. Tandon, G. de Melo, and G. Weikum, “WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.
Export
BibTeX
@inproceedings{TandonACL2017, TITLE = {{WebChild} 2.0: {F}ine-Grained Commonsense Knowledge Distillation}, AUTHOR = {Tandon, Niket and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-76-0}, DOI = {10.18653/v1/P17-4020}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)}, PAGES = {115--120}, ADDRESS = {Vancouver, Canada}, }
Endnote
%0 Conference Proceedings %A Tandon, Niket %A de Melo, Gerard %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-FAC3-A %R 10.18653/v1/P17-4020 %D 2017 %B The 55th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2017-07-30 - 2017-08-04 %C Vancouver, Canada %B The 55th Annual Meeting of the Association for Computational Linguistics %P 115 - 120 %I ACL %@ 978-1-945626-76-0
[240]
C. Teflioudi and R. Gemulla, “Exact and Approximate Maximum Inner Product Search with LEMP,” ACM Transactions on Database Systems, vol. 42, no. 1, 2017.
Export
BibTeX
@article{Teflioudi:2016:EAM:3015779.2996452, TITLE = {Exact and Approximate Maximum Inner Product Search with {LEMP}}, AUTHOR = {Teflioudi, Christina and Gemulla, Rainer}, LANGUAGE = {eng}, ISSN = {0362-5915}, DOI = {10.1145/2996452}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {ACM Transactions on Database Systems}, VOLUME = {42}, NUMBER = {1}, EID = {5}, }
Endnote
%0 Journal Article %A Teflioudi, Christina %A Gemulla, Rainer %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Exact and Approximate Maximum Inner Product Search with LEMP : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-349C-B %R 10.1145/2996452 %7 2016 %D 2017 %J ACM Transactions on Database Systems %O TODS %V 42 %N 1 %Z sequence number: 5 %I ACM %C New York, NY %@ false
[241]
E. N. Toosi, “A New Efficient and Scalable Algorithm for Boolean Matrix Factorization,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{ToosiMsc2017, TITLE = {A New Efficient and Scalable Algorithm for {Boolean} Matrix Factorization}, AUTHOR = {Toosi, Ehsan Nadjaran}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Toosi, Ehsan Nadjaran %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T A New Efficient and Scalable Algorithm for Boolean Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D5-E %I Universität des Saarlandes %C Saarbrücken %D 2017 %P X, 70 p. %V master %9 master
[242]
H. D. Tran, D. Stepanova, M. Gad-Elrab, F. A. Lisi, and G. Weikum, “Towards Nonmonotonic Relational Learning from Knowledge Graphs,” in Inductive Logic Programming (ILP 2016), London, UK, 2017.
Export
BibTeX
@inproceedings{TranILP2016, TITLE = {Towards Nonmonotonic Relational Learning from Knowledge Graphs}, AUTHOR = {Tran, Hai Dang and Stepanova, Daria and Gad-Elrab, Mohamed and Lisi, Francesca A. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-63341-1}, DOI = {10.1007/978-3-319-63342-8_8}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Inductive Logic Programming (ILP 2016)}, EDITOR = {Cussens, James and Russo, Alessandra}, PAGES = {94--107}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10326}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Tran, Hai Dang %A Stepanova, Daria %A Gad-Elrab, Mohamed %A Lisi, Francesca A. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Towards Nonmonotonic Relational Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2DB1-E %R 10.1007/978-3-319-63342-8_8 %D 2017 %B 26th International Conference on Inductive Logic Programming %Z date of event: 2016-09-04 - 2016-09-06 %C London, UK %B Inductive Logic Programming %E Cussens, James; Russo, Alessandra %P 94 - 107 %I Springer %@ 978-3-319-63341-1 %B Lecture Notes in Artificial Intelligence %N 10326
[243]
H. D. Tran, “An Approach to Nonmonotonic Relational Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{TranMSc2017, TITLE = {An Approach to Nonmonotonic Relational Learning from Knowledge Graphs}, AUTHOR = {Tran, Hai Dang}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Tran, Hai Dang %Y Stepanova, Daria %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Approach to Nonmonotonic Relational Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-845A-3 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P XV, 48 p. %V master %9 master
[244]
G. Weikum, “What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{WeikumWWW2017, TITLE = {What Computers Should Know, Shouldn{\textquoteright}t Know, and Shouldn{\textquoteright}t Believe}, AUTHOR = {Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3051120}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {1559--1560}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-7DA0-5 %R 10.1145/3041021.3051120 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 1559 - 1560 %I ACM %@ 978-1-4503-4914-7
[245]
A. Yates and K. Hui, “DE-PACRR: Exploring Layers Inside the PACRR Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.08746. (arXiv: 1706.08746)
Abstract
Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.
Export
BibTeX
@online{Yates_arXiv2017, TITLE = {{DE}-{PACRR}: Exploring Layers Inside the {PACRR} Model}, AUTHOR = {Yates, Andrew and Hui, Kai}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1706.08746}, EPRINT = {1706.08746}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.}, }
Endnote
%0 Report %A Yates, Andrew %A Hui, Kai %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DE-PACRR: Exploring Layers Inside the PACRR Model : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06BE-D %U http://arxiv.org/abs/1706.08746 %D 2017 %X Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[246]
A. Yates, A. Cohan, and N. Goharian, “Depression and Self-Harm Risk Assessment in Online Forums,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{YatesENMLP2017, TITLE = {Depression and Self-Harm Risk Assessment in Online Forums}, AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {https://aclanthology.info/pdf/D/D17/D17-1321.pdf}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {2958--2968}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Depression and Self-Harm Risk Assessment in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06A0-D %U https://aclanthology.info/pdf/D/D17/D17-1321.pdf %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 2958 - 2968 %I ACL %@ 978-1-945626-83-8
[247]
A. Yates, A. Cohan, and N. Goharian, “Depression and Self-Harm Risk Assessment in Online Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1709.01848. (arXiv: 1709.01848)
Abstract
Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.
Export
BibTeX
@online{Yates_arXiv2017b, TITLE = {Depression and Self-Harm Risk Assessment in Online Forums}, AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.01848}, EPRINT = {1709.01848}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.}, }
Endnote
%0 Report %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Depression and Self-Harm Risk Assessment in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06C8-6 %U http://arxiv.org/abs/1709.01848 %D 2017 %X Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset. %K Computer Science, Computation and Language, cs.CL
[248]
D. Zeinalipour-Yazti and C. Laoudias, “The Anatomy of the Anyplace Indoor Navigation Service,” SIGSPATIAL Special, vol. 9, no. 2, 2017.
Export
BibTeX
@article{Zeinalipour-Yazti:2017:AAI:3151123.3151125, TITLE = {The Anatomy of the Anyplace Indoor Navigation Service}, AUTHOR = {Zeinalipour-Yazti, Demetrios and Laoudias, Christos}, LANGUAGE = {eng}, ISSN = {1946-7729}, DOI = {10.1145/3151123.3151125}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {SIGSPATIAL Special}, VOLUME = {9}, NUMBER = {2}, PAGES = {3--10}, }
Endnote
%0 Journal Article %A Zeinalipour-Yazti, Demetrios %A Laoudias, Christos %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T The Anatomy of the Anyplace Indoor Navigation Service : %G eng %U http://hdl.handle.net/21.11116/0000-0002-CA02-8 %R 10.1145/3151123.3151125 %7 2017 %D 2017 %J SIGSPATIAL Special %V 9 %N 2 %& 3 %P 3 - 10 %I ACM %C New York, NY %@ false
[249]
Y. Zhang, M. Humbert, B. Surma, P. Manoharan, J. Vreeken, and M. Backes, “CTRL+Z: Recovering Anonymized Social Graphs,” 2017. [Online]. Available: http://arxiv.org/abs/1711.05441. (arXiv: 1711.05441)
Abstract
Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.
Export
BibTeX
@online{Zhang1711.05441, TITLE = {{CTRL}+Z: Recovering Anonymized Social Graphs}, AUTHOR = {Zhang, Yang and Humbert, Mathias and Surma, Bartlomiej and Manoharan, Praveen and Vreeken, Jilles and Backes, Michael}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1711.05441}, EPRINT = {1711.05441}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.}, }
Endnote
%0 Report %A Zhang, Yang %A Humbert, Mathias %A Surma, Bartlomiej %A Manoharan, Praveen %A Vreeken, Jilles %A Backes, Michael %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T CTRL+Z: Recovering Anonymized Social Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6463-0 %U http://arxiv.org/abs/1711.05441 %D 2017 %X Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods. %K Computer Science, Cryptography and Security, cs.CR,cs.SI
[250]
D. Ziegler, “Answer Type Prediction for Question Answering over Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{ZieglerMSc2017, TITLE = {Answer Type Prediction for Question Answering over Knowledge Bases}, AUTHOR = {Ziegler, David}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Ziegler, David %Y Abujabal, Abdalghani %A referee: Saha Roy, Rishiraj %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answer Type Prediction for Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8F38-A %I Universität des Saarlandes %C Saarbrücken %D 2017 %P X, 48 p. %V master %9 master
[251]
D. Ziegler, A. Abujabal, R. Saha Roy, and G. Weikum, “Efficiency-aware Answering of Compositional Questions using Answer Type Prediction,” in The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, 2017.
Export
BibTeX
@inproceedings{ZieglerIJCNLP2017, TITLE = {Efficiency-aware Answering of Compositional Questions using Answer Type Prediction}, AUTHOR = {Ziegler, David and Abujabal, Abdalghani and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-01-8}, URL = {http://aclweb.org/anthology/I17-2038}, PUBLISHER = {Asian Federation of Natural Language Processing}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)}, PAGES = {222--227}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Ziegler, David %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiency-aware Answering of Compositional Questions using Answer Type Prediction : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B5F-5 %U http://aclweb.org/anthology/I17-2038 %D 2017 %B 8th International Joint Conference on Natural Language Processing %Z date of event: 2017-11-27 - 2017-12-01 %C Taipei, Taiwan %B The 8th International Joint Conference on Natural Language Processing %P 222 - 227 %I Asian Federation of Natural Language Processing %@ 978-1-948087-01-8
2016
[252]
S. Abiteboul, G. Miklau, J. Stoyanovich, and G. Weikum, Eds., Data, Responsibly (Dagstuhl Seminar 16291), Dagstuhl Reports, vol. 6, no. 7. Wadern: Schloss Dagstuhl, 2016.
Export
BibTeX
@proceedings{AbiteboulDagstuhl2016, TITLE = {Data, Responsibly (Dagstuhl Seminar 16291)}, EDITOR = {Abiteboul, Serge and Miklau, Gerome and Stoyanovich, Julia and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2192-5283}, URL = {urn:nbn:de:0030-drops-67644}, DOI = {10.4230/DagRep.6.7.42}, PUBLISHER = {Schloss Dagstuhl}, YEAR = {2016}, PAGES = {30 p.}, SERIES = {Dagstuhl Reports}, VOLUME = {6}, ISSUE = {7}, ADDRESS = {Wadern, Germany}, }
Endnote
%0 Conference Proceedings %E Abiteboul, Serge %E Miklau, Gerome %E Stoyanovich, Julia %E Weikum, Gerhard %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Data, Responsibly : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-500A-2 %R 10.4230/DagRep.6.7.42 %U urn:nbn:de:0030-drops-67644 %I Schloss Dagstuhl %D 2016 %B Dagstuhl Seminar 16291 "Data, Responsibly" %Z date of event: 2016-07-17 - 2016-07-22 %D 2016 %C Wadern, Germany %P 30 p. %K Data responsibly, Big data, Machine bias, Data analysis, Data management, Data mining, Fairness, Diversity, Accountability, Transparency, Personal %S Dagstuhl Reports %V 6 %P 42 - 71 %@ false %U http://drops.dagstuhl.de/opus/volltexte/2016/6764/
[253]
K. Athukorala, D. Głowacka, G. Jacucci, A. Oulasvirta, and J. Vreeken, “Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks,” Journal of the Association for Information Science and Technology, vol. 67, no. 11, 2016.