2019
[1]
M. Abouhamra, “AligNarr: Aligning Narratives of Different Length for Movie Summarization,” Master’s thesis, Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research on automatic summarization revolves around summarizing news articles or scientific papers, which are relatively short texts with a simple and clear structure. The larger the difference in size between the summary and the original text, the harder the problem becomes, since important information is sparser and more difficult to identify. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we develop an algorithm that can automatically create a dataset for abstractive summarization of larger narrative texts such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm that uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity of their meanings. Solving this alignment problem can provide important insights into how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method that uses different similarity scoring techniques (embedding similarity, word inclusion, and entity inclusion) to align script scenes and summary sentences, achieving an F1 score of 0.39. Analyzing our results showed that the larger the difference in the number of text units being aligned, the more difficult the alignment problem becomes. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.
[2]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2019.
[3]
A. Abujabal and K. Berberich, “Question Answering over Knowledge Bases with Continuous Learning,” PhD dissertation, Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Answering complex natural language questions with crisp answers is crucial to satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries such as SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied by a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale, manually curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties, such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving performance comparable to word-level models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
[4]
M. Alikhani, S. Nag Chowdhury, G. de Melo, and M. Stone, “CITE: A Corpus Of Text-Image Discourse Relations,” in Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Minneapolis, MN, USA, 2019.
[5]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” in Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (ECIR 2019) (AMIR 2019), Cologne, Germany, 2019.
[6]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” 2019. [Online]. Available: http://arxiv.org/abs/1904.05737. (arXiv: 1904.05737)
Abstract
We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
[7]
A. Chakraborty, N. Mota, A. J. Biega, K. P. Gummadi, and H. Heidari, “On the Impact of Choice Architectures on Inequality in Online Donation Platforms,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
[8]
F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli, “On the Distortion of Locality Sensitive Hashing,” SIAM Journal on Computing, vol. 48, no. 2, 2019.
[9]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
[10]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains [Extended version],” 2019. [Online]. Available: http://arxiv.org/abs/1901.10263. (arXiv: 1901.10263)
Abstract
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
[11]
I. Dikeoulias, J. Strötgen, and S. Razniewski, “Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties,” in Companion of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
[12]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “Tracy: Tracing Facts over Knowledge Graphs and Text,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
[13]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
[14]
A. Ghazimatin, R. Saha Roy, and G. Weikum, “FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
[15]
A. Guimarães, O. Balalau, E. Terolli, and G. Weikum, “Analyzing the Traits and Anomalies of Political Discussions on Reddit,” in Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 2019.
[16]
D. Gupta and K. Berberich, “Structured Search in Annotated Document Collections,” in WSDM’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
[17]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” in The Semantic Web (ESWC 2019), Portorož, Slovenia, 2019.
[18]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” 2019. [Online]. Available: http://arxiv.org/abs/1904.01451. (arXiv: 1904.01451)
Abstract
Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
[19]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” in Proceedings of the 13th International Conference on Computational Semantics - Long Papers (IWCS 2019), Gothenburg, Sweden, 2019.
[20]
Y. Ibrahim, M. Riedewald, G. Weikum, and D. Zeinalipour-Yazti, “Bridging Quantities in Tables and Text,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
[21]
Y. Ibrahim and G. Weikum, “ExQuisiTe: Explaining Quantities in Text,” in Proceedings of the World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
[22]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded by MDL,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
[23]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded using MDL,” 2019. [Online]. Available: http://arxiv.org/abs/1901.06950. (arXiv: 1901.06950)
Abstract
Given data over variables $(X_1,...,X_m, Y)$, we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the model where $X$ causes $Y$ and the model where there exists a latent variable $Z$ confounding both $X$ and $Y$, and show that our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
[24]
S. Karaev and P. Miettinen, “Algorithms for Approximate Subtropical Matrix Factorization,” Data Mining and Knowledge Discovery, vol. 33, no. 2, 2019.
Export
BibTeX
@article{Karaev_DMKD2018, TITLE = {Algorithms for Approximate Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1007/s10618-018-0599-1}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Data Mining and Knowledge Discovery}, VOLUME = {33}, NUMBER = {2}, PAGES = {526--576}, }
Endnote
%0 Journal Article %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Approximate Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9FD5-B %R 10.1007/s10618-018-0599-1 %7 2018 %D 2019 %J Data Mining and Knowledge Discovery %O DMKD %V 33 %N 2 %& 526 %P 526 - 576 %I Springer %C New York, NY
[25]
A. Konstantinidis, P. Irakleous, Z. Georgiou, D. Zeinalipour-Yazti, and P. K. Chrysanthis, “IoT Data Prefetching in Indoor Navigation SOAs,” ACM Transactions on Internet Technology, vol. 19, no. 1, 2019.
Export
BibTeX
@article{Konstantinidis:2018:IDP:3283809.3177777, TITLE = {{IoT} Data Prefetching in Indoor Navigation {SOAs}}, AUTHOR = {Konstantinidis, Andreas and Irakleous, Panagiotis and Georgiou, Zacharias and Zeinalipour-Yazti, Demetrios and Chrysanthis, Panos K.}, LANGUAGE = {eng}, ISSN = {1533-5399}, DOI = {10.1145/3177777}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {ACM Transactions on Internet Technology}, VOLUME = {19}, NUMBER = {1}, EID = {10}, }
Endnote
%0 Journal Article %A Konstantinidis, Andreas %A Irakleous, Panagiotis %A Georgiou, Zacharias %A Zeinalipour-Yazti, Demetrios %A Chrysanthis, Panos K. %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T IoT Data Prefetching in Indoor Navigation SOAs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-CA09-1 %R 10.1145/3177777 %7 2019 %D 2019 %J ACM Transactions on Internet Technology %O TOIT %V 19 %N 1 %Z sequence number: 10 %I ACM %C New York, NY %@ false
[26]
P. Lahoti, K. Gummadi, and G. Weikum, “iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
Export
BibTeX
@inproceedings{Lahoti_ICDE2019, TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-7474-1}, DOI = {10.1109/ICDE.2019.00121}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering}, PAGES = {1334--1345}, ADDRESS = {Macau, China}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Gummadi, Krishna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : %G eng %U http://hdl.handle.net/21.11116/0000-0003-F395-2 %R 10.1109/ICDE.2019.00121 %D 2019 %B 35th IEEE International Conference on Data Engineering %Z date of event: 2019-04-08 - 2019-04-12 %C Macau, China %B ICDE 2019 %P 1334 - 1345 %I IEEE %@ 978-1-5386-7474-1
[27]
P. Lahoti, K. P. Gummadi, and G. Weikum, “Operationalizing Individual Fairness with Pairwise Fair Representations,” 2019. [Online]. Available: http://arxiv.org/abs/1907.01439. (arXiv: 1907.01439)
Abstract
We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.
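A minimal spectral sketch of the underlying idea follows: learn a projection whose representations keep fairness-graph neighbours ("equally deserving" pairs) close while retaining data variance. This is an illustrative stand-in, not the paper's PFR model; the trade-off parameter lam, the dimensions, and the synthetic fairness pairs are assumptions.

    # Spectral sketch: balance a fairness-graph Laplacian term against retained variance.
    import numpy as np

    def pairwise_fair_projection(X, fair_pairs, dim=2, lam=1.0):
        n, d = X.shape
        Xc = X - X.mean(axis=0)
        # Graph Laplacian of the fairness graph over the n individuals.
        A = np.zeros((n, n))
        for i, j in fair_pairs:
            A[i, j] = A[j, i] = 1.0
        L = np.diag(A.sum(axis=1)) - A
        # Minimize tr(W' X'LX W) - lam * tr(W' X'X W) subject to W'W = I:
        # keep fair pairs close in the representation while preserving variance.
        M = Xc.T @ L @ Xc - lam * (Xc.T @ Xc) / n
        eigvals, eigvecs = np.linalg.eigh(M)
        return eigvecs[:, :dim]                 # eigenvectors with smallest eigenvalues

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    fair_pairs = [(i, i + 100) for i in range(100)]   # hypothetical equally-deserving pairs
    W = pairwise_fair_projection(X, fair_pairs, dim=2, lam=0.5)
    Z = (X - X.mean(axis=0)) @ W
    gaps = [np.linalg.norm(Z[i] - Z[j]) for i, j in fair_pairs]
    print("mean representation gap across fair pairs:", np.mean(gaps))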
Export
BibTeX
@online{Lahoti_arXiv1907.01439, TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1907.01439}, EPRINT = {1907.01439}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation(PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including humans judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.}, }
Endnote
%0 Report %A Lahoti, Preethi %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Operationalizing Individual Fairness with Pairwise Fair Representations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FF17-5 %U http://arxiv.org/abs/1907.01439 %D 2019 %X We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation(PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including humans judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable. %K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML
[28]
X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum, “Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France. (Accepted/in press)
Export
BibTeX
@inproceedings{lu19answering, TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs}, AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A Lu, Xiaolu %A Pramanik, Soumajit %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Wang, Yafang %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-7085-8 %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %I ACM
[29]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Overcoming Low-Utility Facets for Complex Answer Retrieval,” Information Retrieval Journal, vol. 22, no. 3–4, 2019.
Export
BibTeX
@article{MacAvaney2019, TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, ISSN = {1386-4564}, DOI = {10.1007/s10791-018-9343-0}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Information Retrieval Journal}, VOLUME = {22}, NUMBER = {3-4}, PAGES = {395--418}, }
Endnote
%0 Journal Article %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Overcoming Low-Utility Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0003-C4A1-9 %R 10.1007/s10791-018-9343-0 %7 2019 %D 2019 %J Information Retrieval Journal %V 22 %N 3-4 %& 395 %P 395 - 418 %I Springer %C New York, NY %@ false
[30]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1904.07094. (arXiv: 1904.07094)
Abstract
Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
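The general recipe described above can be illustrated with the short PyTorch sketch below: run the query-document pair through BERT, take the [CLS] classification vector, and concatenate it with the score of an existing ranking component before a final scoring layer. This is a schematic stand-in, not the authors' CEDR architectures; the trivially simple "existing ranker" and layer sizes are assumptions.

    # Schematic joint model: BERT [CLS] vector concatenated with another ranker's score.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class JointCLSRanker(nn.Module):
        def __init__(self, model_name="bert-base-uncased"):
            super().__init__()
            self.bert = AutoModel.from_pretrained(model_name)
            hidden = self.bert.config.hidden_size
            self.existing_ranker = nn.Linear(hidden, 1)   # stand-in for an existing neural ranker
            self.combine = nn.Linear(hidden + 1, 1)       # joins [CLS] with that score

        def forward(self, enc):
            out = self.bert(**enc).last_hidden_state
            cls = out[:, 0]                               # contextualized [CLS] vector
            term_signal = out[:, 1:].mean(dim=1)          # crude term-level signal
            base_score = self.existing_ranker(term_signal)
            return self.combine(torch.cat([cls, base_score], dim=-1)).squeeze(-1)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    ranker = JointCLSRanker()
    enc = tok(["what is mdl"], ["MDL is the minimum description length principle."],
              return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        print("relevance score:", ranker(enc).item())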
Export
BibTeX
@online{MacAvaney_arXiv1904.07094, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.07094}, EPRINT = {1904.07094}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language modes (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02C7-9 %U http://arxiv.org/abs/1904.07094 %D 2019 %X Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language modes (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[31]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France. (Accepted/in press)
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2019, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02D3-B %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %I ACM
[32]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), Naha, Okinawa, Japan, 2019.
Export
BibTeX
@inproceedings{Marx_AISTATS2019, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, PUBLISHER = {PMLR}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)}, EDITOR = {Chaudhuri, Kamalika and Sugiyama, Masashi}, PAGES = {496--505}, SERIES = {Proceedings of the Machine Learning Research}, VOLUME = {89}, ADDRESS = {Naha, Okinawa, Japan}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D3C-D %D 2019 %B 22nd International Conference on Artificial Intelligence and Statistics %Z date of event: 2019-04-16 - 2019-04-18 %C Naha, Okinawa, Japan %B Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics %E Chaudhuri, Kamalika; Sugiyama, Masashi %P 496 - 505 %I PMLR %B Proceedings of the Machine Learning Research %N 89 %U http://proceedings.mlr.press/v89/marx19a/marx19a.pdf
[33]
A. Marx and J. Vreeken, “Approximating Algorithmic Conditional Independence for Discrete Data,” in Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI, Stanford, CA, USA. (Accepted/in press)
Export
BibTeX
@inproceedings{Marx_AAAISpringSymp2019, TITLE = {Approximating Algorithmic Conditional Independence for Discrete Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI}, ADDRESS = {Stanford, CA, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Approximating Algorithmic Conditional Independence for Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D4C-B %D 2019 %B First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI %Z date of event: 2019-05-25 - 2019-05-27 %C Stanford, CA, USA %B Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI
[34]
A. Marx and J. Vreeken, “Causal Inference on Multivariate and Mixed-Type Data,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018), Dublin, Ireland, 2019.
Export
BibTeX
@inproceedings{marx:18:crack, TITLE = {Causal Inference on Multivariate and Mixed-Type Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-030-10927-1}, DOI = {10.1007/978-3-030-10928-8_39}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018)}, EDITOR = {Berlingerio, Michele and Bonchi, Francesco and G{\"a}rtner, Thomas and Hurley, Neil and Ifrim, Georgiana}, PAGES = {655--671}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {11052}, ADDRESS = {Dublin, Ireland}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Multivariate and Mixed-Type Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9E86-5 %R 10.1007/978-3-030-10928-8_39 %D 2019 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2018-09-10 - 2018-09-14 %C Dublin, Ireland %B Machine Learning and Knowledge Discovery in Databases %E Berlingerio, Michele; Bonchi, Francesco; Gärtner, Thomas; Hurley, Neil; Ifrim, Georgiana %P 655 - 671 %I Springer %@ 978-3-030-10927-1 %B Lecture Notes in Artificial Intelligence %N 11052
[35]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” 2019. [Online]. Available: http://arxiv.org/abs/1903.04829. (arXiv: 1903.04829)
Abstract
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
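As a rough stand-in for the test described above, the sketch below estimates conditional mutual information with plug-in counts and compares it against a BIC-style threshold. The actual SCI test uses stochastic complexity (NML code lengths) rather than BIC; the synthetic data are made up for illustration.

    # Plug-in CMI test with a BIC-style threshold (illustrative stand-in, not SCI).
    import numpy as np
    from collections import Counter

    def cmi(x, y, z):
        """Plug-in estimate of I(X; Y | Z) in nats for discrete 1-D arrays."""
        n = len(x)
        cxyz, cxz = Counter(zip(x, y, z)), Counter(zip(x, z))
        cyz, cz = Counter(zip(y, z)), Counter(z)
        total = 0.0
        for (xi, yi, zi), cnt in cxyz.items():
            total += cnt / n * np.log(cnt * cz[zi] / (cxz[(xi, zi)] * cyz[(yi, zi)]))
        return total

    def dependent(x, y, z):
        """Declare dependence when the likelihood gain 2n*CMI exceeds a BIC-style cost."""
        n = len(x)
        df = (len(set(x)) - 1) * (len(set(y)) - 1) * len(set(z))
        return 2 * n * cmi(x, y, z) > df * np.log(n)

    rng = np.random.default_rng(2)
    z = rng.integers(0, 3, size=5000)
    x = (z + rng.integers(0, 2, size=5000)) % 3    # X depends only on Z
    y = (z + rng.integers(0, 2, size=5000)) % 3    # Y depends only on Z
    print("X dep Y | Z ?", dependent(x, y, z))     # expected: False
    print("X dep Z | Y ?", dependent(x, z, y))     # expected: True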
Export
BibTeX
@online{Marx_arXiv1903.04829, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1903.04829}, EPRINT = {1903.04829}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0004-027A-1 %U http://arxiv.org/abs/1903.04829 %D 2019 %X Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
[36]
S. Metzler, S. Günnemann, and P. Miettinen, “Stability and Dynamics of Communities on Online Question-Answer Sites,” Social Networks, vol. 58, 2019.
Export
BibTeX
@article{Metzler2019, TITLE = {Stability and Dynamics of Communities on Online Question-Answer Sites}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0378-8733}, DOI = {10.1016/j.socnet.2018.12.004}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Social Networks}, VOLUME = {58}, PAGES = {50--58}, }
Endnote
%0 Journal Article %A Metzler, Saskia %A Günnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Stability and Dynamics of Communities on Online Question-Answer Sites : %G eng %U http://hdl.handle.net/21.11116/0000-0002-BCC1-0 %R 10.1016/j.socnet.2018.12.004 %7 2019 %D 2019 %J Social Networks %V 58 %& 50 %P 50 - 58 %I Elsevier %C Amsterdam %@ false
[37]
M. Mohanty, M. Ramanath, M. Yahya, and G. Weikum, “Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs,” in Advances in Database Technology (EDBT 2019), Lisbon, Portugal, 2019.
Export
BibTeX
@inproceedings{Mohanty:EDBT2019, TITLE = {{Spec-QP}: {S}peculative Query Planning for Joins over Knowledge Graphs}, AUTHOR = {Mohanty, Madhulika and Ramanath, Maya and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-89318-081-3}, DOI = {10.5441/002/edbt.2019.07}, PUBLISHER = {OpenProceedings.org}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Advances in Database Technology (EDBT 2019)}, EDITOR = {Herschel, Melanie and Galhardas, Helena and Reinwald, Berthold and Fundlaki, Irini and Binning, Carsten and Kaoudi, Zoi}, PAGES = {61--72}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Mohanty, Madhulika %A Ramanath, Maya %A Yahya, Mohamed %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3A7D-1 %R 10.5441/002/edbt.2019.07 %D 2019 %B 22nd International Conference on Extending Database Technology %Z date of event: 2019-03-26 - 2019-03-29 %C Lisbon, Portugal %B Advances in Database Technology %E Herschel, Melanie; Galhardas, Helena; Reinwald, Berthold; Fundlaki, Irini; Binning, Carsten; Kaoudi, Zoi %P 61 - 72 %I OpenProceedings.org %@ 978-3-89318-081-3
[38]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” Theory and Practice of Logic Programming, vol. 19, no. 4, 2019.
Export
BibTeX
@article{ParamonovTPLP, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {1471-0684}, DOI = {10.1017/S1471068418000467}, PUBLISHER = {Cambridge University Press}, ADDRESS = {Cambridge}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Theory and Practice of Logic Programming}, VOLUME = {19}, NUMBER = {4}, PAGES = {505--535}, }
Endnote
%0 Journal Article %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0CC4-3 %R 10.1017/S1471068418000467 %7 2019 %D 2019 %J Theory and Practice of Logic Programming %O TPLP %V 19 %N 4 %& 505 %P 505 - 535 %I Cambridge University Press %C Cambridge %@ false
[39]
J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum, “Commonsense Properties from Query Logs and Question Answering Forums,” 2019. [Online]. Available: http://arxiv.org/abs/1905.10989. (arXiv: 1905.10989)
Abstract
Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
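The corroboration step can be pictured with the toy sketch below, which re-scores candidate (subject, property) assertions by combining normalized frequency cues from several sources with per-source trust weights. The sources, weights, and candidates are all invented for illustration; this is not Quasimodo's actual scoring model.

    # Toy corroboration: weighted combination of normalized evidence counts per source.
    from collections import defaultdict

    candidates = {                        # (subject, property) -> raw cues per source
        ("elephant", "has trunk"):   {"query_logs": 120, "qa_forums": 45, "books": 300},
        ("elephant", "is grey"):     {"query_logs": 80,  "qa_forums": 10, "books": 150},
        ("elephant", "plays piano"): {"query_logs": 2,   "qa_forums": 1,  "books": 0},
    }
    weights = {"query_logs": 0.5, "qa_forums": 0.2, "books": 0.3}   # assumed source trust

    # Normalize each source's counts to [0, 1], then combine with the trust weights.
    max_per_source = defaultdict(float)
    for cues in candidates.values():
        for src, cnt in cues.items():
            max_per_source[src] = max(max_per_source[src], cnt)

    def corroboration(cues):
        return sum(weights[s] * cnt / max_per_source[s] for s, cnt in cues.items() if max_per_source[s])

    for (subj, prop), cues in sorted(candidates.items(), key=lambda kv: -corroboration(kv[1])):
        print(f"{subj:10s} {prop:12s} score={corroboration(cues):.2f}")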
Export
BibTeX
@online{Romero_arXiv1905.10989, TITLE = {Commonsense Properties from Query Logs and Question Answering Forums}, AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1905.10989}, EPRINT = {1905.10989}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.}, }
Endnote
%0 Report %A Romero, Julien %A Razniewski, Simon %A Pal, Koninika %A Pan, Jeff Z. %A Sakhadeo, Archit %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense Properties from Query Logs and Question Answering Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FEEE-4 %U http://arxiv.org/abs/1905.10989 %D 2019 %X Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[40]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” 2019. [Online]. Available: http://arxiv.org/abs/1901.05797. (arXiv: 1901.05797)
Abstract
Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard, but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best rank-1 factorization. Since even obtaining a rank-1 factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to the cyclic ones property and to symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
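The "fix one side, find the other, repeat" idea can be sketched as plain greedy Boolean matrix factorization, shown below without the consecutive-ones constraint and pq-tree machinery of the paper; the gain function, seeding heuristic, and synthetic data are illustrative assumptions only.

    # Greedy alternating rank-1 Boolean factorization (plain BMF sketch, no C1P constraint).
    import numpy as np

    def rank1_step(R, iters=10):
        """Alternate between row set a and column pattern b to maximize cover gain on R.

        R has +1 for uncovered ones, -1 for zeros, 0 for already-covered ones."""
        b = (R[np.argmax((R > 0).sum(axis=1))] > 0).astype(int)   # seed: densest row
        for _ in range(iters):
            a = ((R @ b) > 0).astype(int)   # rows whose net gain under pattern b is positive
            b = ((a @ R) > 0).astype(int)   # columns whose net gain under rows a is positive
        return a, b

    def greedy_bmf(A, k):
        A = A.astype(int)
        covered = np.zeros_like(A)
        factors = []
        for _ in range(k):
            R = np.where((A == 1) & (covered == 0), 1, -1)   # reward uncovered 1s, punish 0s
            R[(A == 1) & (covered == 1)] = 0                 # already-covered 1s are neutral
            a, b = rank1_step(R)
            if a.sum() == 0 or b.sum() == 0:
                break
            factors.append((a, b))
            covered |= np.outer(a, b)
        return factors, covered

    rng = np.random.default_rng(3)
    blocks = np.outer(rng.integers(0, 2, 30), rng.integers(0, 2, 20)) | \
             np.outer(rng.integers(0, 2, 30), rng.integers(0, 2, 20))
    factors, covered = greedy_bmf(blocks, k=2)
    print(len(factors), "patterns; reconstruction error:", int(np.abs(blocks - covered).sum()))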
Export
BibTeX
@online{Tatti_arXiv1901.05797, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1901.05797}, EPRINT = {1901.05797}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.}, }
Endnote
%0 Report %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %U http://hdl.handle.net/21.11116/0000-0004-02F0-A %U http://arxiv.org/abs/1901.05797 %D 2019 %X Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbon. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only this problem is NP-hard but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well. %K Computer Science, Data Structures and Algorithms, cs.DS,Computer Science, Discrete Mathematics, cs.DM,Computer Science, Learning, cs.LG
[41]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
Export
BibTeX
@inproceedings{Tatti_SDM2019, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-567-3}, DOI = {10.1137/1.9781611975673.82}, PUBLISHER = {SIAM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)}, EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh}, PAGES = {729--737}, ADDRESS = {Calgary, Canada}, }
Endnote
%0 Conference Proceedings %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %G eng %U http://hdl.handle.net/21.11116/0000-0004-030A-E %R 10.1137/1.9781611975673.82 %D 2019 %B SIAM International Conference on Data Mining %Z date of event: 2019-05-02 - 2019-05-04 %C Calgary, Canada %B Proceedings of the 2019 SIAM International Conference on Data Mining %E Berger-Wolf, Tanya; Chawla, Nitesh %P 729 - 737 %I SIAM %@ 978-1-61197-567-3
[42]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{tigunova2019listening, TITLE = {Listening between the Lines: {L}earning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313498}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {1818--1828}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-1460-A %R 10.1145/3308558.3313498 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 1818 - 1828 %I ACM %@ 978-1-4503-6674-8
[43]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” 2019. [Online]. Available: http://arxiv.org/abs/1904.10887. (arXiv: 1904.10887)
Abstract
Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
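A skeletal PyTorch stand-in for this kind of model is sketched below: embed a speaker's words, pool them with a learned attention vector, and output a ranking score per candidate object value for one predicate (e.g., professions). The vocabulary size, dimensions, and toy input are assumptions; this is not the paper's exact Hidden Attribute Model.

    # Attention-pooled scorer over a speaker's terms, one score per candidate attribute value.
    import torch
    import torch.nn as nn

    class AttributeRanker(nn.Module):
        def __init__(self, vocab_size, n_values, dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
            self.attn = nn.Linear(dim, 1)            # scores each term's importance
            self.out = nn.Linear(dim, n_values)      # one score per candidate value

        def forward(self, term_ids):                 # term_ids: (batch, n_terms)
            emb = self.embed(term_ids)               # (batch, n_terms, dim)
            weights = torch.softmax(self.attn(emb).squeeze(-1), dim=-1)
            pooled = (weights.unsqueeze(-1) * emb).sum(dim=1)     # attention pooling
            return self.out(pooled)                  # ranking scores over values

    model = AttributeRanker(vocab_size=5000, n_values=10)
    utterance = torch.randint(1, 5000, (1, 12))      # toy: 12 word ids from one speaker
    scores = model(utterance)
    print("top-ranked value index:", scores.argmax(dim=-1).item())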
Export
BibTeX
@online{Tigunova_arXiv1904.10887, TITLE = {Listening between the Lines: Learning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.10887}, EPRINT = {1904.10887}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.}, }
Endnote
%0 Report %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FE7F-2 %U http://arxiv.org/abs/1904.10887 %D 2019 %X Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines. %K Computer Science, Computation and Language, cs.CL
[44]
M. Unterkalmsteiner and A. Yates, “Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation,” in Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019), Essen, Germany, 2019.
Export
BibTeX
@inproceedings{Unterkalmsteiner_NLP4RE2019, TITLE = {Expert-sourcing Domain-specific Knowledge: {The} Case of Synonym Validation}, AUTHOR = {Unterkalmsteiner, Michael and Yates, Andrew}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-2376-8}, PUBLISHER = {CEUR-WS}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019)}, EDITOR = {Dalpiaz, Fabiano and Ferrari, Alessio and Franch, Xavier and Gregory, Sarah and Houdek, Frank and Palomares, Cristina}, EID = {8}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2376}, ADDRESS = {Essen, Germany}, }
Endnote
%0 Conference Proceedings %A Unterkalmsteiner, Michael %A Yates, Andrew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02AE-6 %D 2019 %B 2nd Workshop on Natural Language Processing for Requirements Engineering and NLP Tool Showcase %Z date of event: 2019-03-18 - 2019-03-18 %C Essen, Germany %B Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) %E Dalpiaz, Fabiano; Ferrari, Alessio; Franch, Xavier; Gregory, Sarah; Houdek, Frank; Palomares, Cristina %Z sequence number: 8 %I CEUR-WS %B CEUR Workshop Proceedings %N 2376 %@ false %U http://ceur-ws.org/Vol-2376/NLP4RE19_paper08.pdf
[45]
M. van Leeuwen, P. Chau, J. Vreeken, D. Shahaf, and C. Faloutsos, “Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors,” ACM Transactions on Knowledge Discovery from Data, vol. 13, no. 1, 2019.
Export
BibTeX
@article{vanLeeuwen2019, TITLE = {Addendum to the Special Issue on Interactive Data Exploration and Analytics ({TKDD}, Vol. 12, Iss. 1): Introduction by the Guest Editors}, AUTHOR = {van Leeuwen, Matthijs and Chau, Polo and Vreeken, Jilles and Shahaf, Dafna and Faloutsos, Christos}, LANGUAGE = {eng}, ISSN = {1556-4681}, DOI = {10.1145/3298786}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {13}, NUMBER = {1}, EID = {13}, }
Endnote
%0 Journal Article %A van Leeuwen, Matthijs %A Chau, Polo %A Vreeken, Jilles %A Shahaf, Dafna %A Faloutsos, Christos %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FFD5-E %R 10.1145/3298786 %7 2019 %D 2019 %J ACM Transactions on Knowledge Discovery from Data %V 13 %N 1 %Z sequence number: 13 %I ACM %C New York, NY %@ false
[46]
A. Yates and M. Unterkalmsteiner, “Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain,” in Advances in Information Retrieval (ECIR 2019), Cologne, Germany, 2019.
Export
BibTeX
@inproceedings{Yates_ECIR2019, TITLE = {Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain}, AUTHOR = {Yates, Andrew and Unterkalmsteiner, Michael}, LANGUAGE = {eng}, ISBN = {978-3-030-15711-1}, DOI = {10.1007/978-3-030-15712-8_28}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2019)}, EDITOR = {Azzopardi, Leif and Stein, Benno and Fuhr, Norbert and Mayr, Philipp and Hauff, Claudia and Hiemstra, Djoerd}, PAGES = {429--442}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11437}, ADDRESS = {Cologne, Germany}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Unterkalmsteiner, Michael %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain : %G eng %U http://hdl.handle.net/21.11116/0000-0004-029B-B %R 10.1007/978-3-030-15712-8_28 %D 2019 %B 41st European Conference on IR Research %Z date of event: 2019-04-14 - 2019-04-18 %C Cologne, Germany %B Advances in Information Retrieval %E Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd %P 429 - 442 %I Springer %@ 978-3-030-15711-1 %B Lecture Notes in Computer Science %N 11437