Publications

2021
[1]
J. Ali, P. Lahoti, and K. P. Gummadi, “Accounting for Model Uncertainty in Algorithmic Discrimination,” in Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics and Society, Virtual Conference. (Accepted/in press)
@inproceedings{Ali_AIES2021,
  TITLE = {Accounting for Model Uncertainty in Algorithmic Discrimination},
  AUTHOR = {Ali, Junaid and Lahoti, Preethi and Gummadi, Krishna P.},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Fourth AAAI/ACM Conference on Artificial Intelligence, Ethics and Society},
  ADDRESS = {Virtual Conference},
}
[2]
K. Budhathoki, M. Boley, and J. Vreeken, “Rule Discovery for Exploratory Causal Reasoning,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference. (Accepted/in press)
@inproceedings{budhathoki:21:dice,
  TITLE = {Rule Discovery for Exploratory Causal Reasoning},
  AUTHOR = {Budhathoki, Kailash and Boley, Mario and Vreeken, Jilles},
  LANGUAGE = {eng},
  PUBLISHER = {SIAM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
  ADDRESS = {Virtual Conference},
}
[3]
E. Chang, X. Shen, D. Zhu, V. Demberg, and H. Su, “Neural Data-to-Text Generation with LM-based Text Augmentation,” in EACL 2021, 16th Conference of the European Chapter of the Association for Computational Linguistics, Online. (Accepted/in press)
@inproceedings{chang2021neural,
  TITLE = {Neural Data-to-Text Generation with {LM}-based Text Augmentation},
  AUTHOR = {Chang, Ernie and Shen, Xiaoyu and Zhu, Dawei and Demberg, Vera and Su, Hui},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {EACL 2021, 16th Conference of the European Chapter of the Association for Computational Linguistics},
  ADDRESS = {Online},
}
[4]
J. Fischer, F. B. Ardakani, K. Kattler, J. Walter, and M. H. Schulz, “CpG Content-dependent Associations between Transcription Factors and Histone Modifications,” PLoS One, vol. 16, no. 4, 2021.
@article{fischer:21:cpgtfhm,
  TITLE = {{CpG} content-dependent associations between transcription factors and histone modifications},
  AUTHOR = {Fischer, Jonas and Ardakani, Fatemeh Behjati and Kattler, Kathrin and Walter, J{\"o}rn and Schulz, Marcel Holger},
  LANGUAGE = {eng},
  ISSN = {1932-6203},
  DOI = {10.1371/journal.pone.0249985},
  PUBLISHER = {Public Library of Science},
  ADDRESS = {San Francisco, CA},
  YEAR = {2021},
  MARGINALMARK = {$\bullet$},
  JOURNAL = {PLoS One},
  VOLUME = {16},
  NUMBER = {4},
  EID = {0249985},
}
[5]
A. Ghazimatin, S. Pramanik, R. Saha Roy, and G. Weikum, “ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models,” 2021. [Online]. Available: https://arxiv.org/abs/2102.09388. (arXiv: 2102.09388)
Abstract
System-provided explanations for recommendations are an important component towards transparent and trustworthy AI. In state-of-the-art research, this is a one-way signal, though, to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of generated recommendations themselves. We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real user study show significant improvements in movie and book recommendations over item-level feedback.
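The abstract above instantiates ELIXIR with graph recommendation via Random Walk with Restart (RWR). As background, here is a minimal pure-Python sketch of RWR scoring on a toy user-item graph; the graph, node names, and the restart probability `alpha` are illustrative assumptions, not taken from the paper:

```python
def random_walk_with_restart(adj, restart, alpha=0.15, iters=200):
    """Visiting probabilities of a walk that follows a random out-edge
    with probability 1 - alpha and jumps back to the restart
    distribution with probability alpha."""
    nodes = list(adj)
    r = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - alpha) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        r = nxt
    return r

# Toy graph: user "u" interacted with items "a" and "b"; "c" is unseen.
graph = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a"]}
scores = random_walk_with_restart(graph, {"u": 1.0})
# Nodes close to the restart node "u" score higher than the distant "c".
```

In ELIXIR the restart distribution would encode learned user preferences rather than a single user node; this sketch only shows the underlying scoring primitive.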
@online{Ghazimatin_2102.09388,
  TITLE = {{ELIXIR}: {L}earning from User Feedback on Explanations to Improve Recommender Models},
  AUTHOR = {Ghazimatin, Azin and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
  LANGUAGE = {eng},
  URL = {https://arxiv.org/abs/2102.09388},
  EPRINT = {2102.09388},
  EPRINTTYPE = {arXiv},
  YEAR = {2021},
  MARGINALMARK = {$\bullet$},
  ABSTRACT = {System-provided explanations for recommendations are an important component towards transparent and trustworthy AI. In state-of-the-art research, this is a one-way signal, though, to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of generated recommendations themselves. We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real user study show significant improvements in movie and book recommendations over item-level feedback.},
}
[6]
A. Ghazimatin, S. Pramanik, R. Saha Roy, and G. Weikum, “ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models,” in Proceedings of The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia. (Accepted/in press)
@inproceedings{Ghazimatin_WWW21,
  TITLE = {{ELIXIR}: {L}earning from User Feedback on Explanations to Improve Recommender Models},
  AUTHOR = {Ghazimatin, Azin and Pramanik, Soumajit and Saha Roy, Rishiraj and Weikum, Gerhard},
  LANGUAGE = {eng},
  DOI = {10.1145/3442381.3449848},
  PUBLISHER = {ACM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of The Web Conference 2021 (WWW 2021)},
  ADDRESS = {Ljubljana, Slovenia},
}
[7]
A. Guimarães and G. Weikum, “X-Posts Explained: Analyzing and Predicting Controversial Contributions in Thematically Diverse Reddit Forums,” in Proceedings of the Fifteenth International Conference on Web and Social Media (ICWSM 2021), Atlanta, GA, USA. (Accepted/in press)
@inproceedings{Guimaraes_ICWSM2021,
  TITLE = {X-Posts Explained: {A}nalyzing and Predicting Controversial Contributions in Thematically Diverse {R}eddit Forums},
  AUTHOR = {Guimar{\~a}es, Anna and Weikum, Gerhard},
  LANGUAGE = {eng},
  PUBLISHER = {AAAI},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of the Fifteenth International Conference on Web and Social Media (ICWSM 2021)},
  ADDRESS = {Atlanta, GA, USA},
}
[8]
E. Heiter, J. Fischer, and J. Vreeken, “Factoring Out Prior Knowledge from Low-dimensional Embeddings,” 2021. [Online]. Available: https://arxiv.org/abs/2103.01828. (arXiv: 2103.01828)
Abstract
Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.
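The abstract above says JEDI adapts the tSNE objective using the Jensen-Shannon divergence. For reference, this is the standard definition of that divergence for discrete distributions, not code from the paper:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Symmetric, zero iff p == q, and bounded above by ln 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

jensen_shannon([0.5, 0.5], [0.5, 0.5])  # identical distributions -> 0.0
jensen_shannon([1.0, 0.0], [0.0, 1.0])  # disjoint support -> log(2)
```

Unlike the KL divergence used in plain tSNE, Jensen-Shannon is symmetric and bounded, which is what makes it amenable to combining two competing similarity structures.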
@online{heiter:21:factoring,
  TITLE = {Factoring Out Prior Knowledge from Low-dimensional Embeddings},
  AUTHOR = {Heiter, Edith and Fischer, Jonas and Vreeken, Jilles},
  LANGUAGE = {eng},
  URL = {https://arxiv.org/abs/2103.01828},
  EPRINT = {2103.01828},
  EPRINTTYPE = {arXiv},
  YEAR = {2021},
  MARGINALMARK = {$\bullet$},
  ABSTRACT = {Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and therewith facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI that adapts the tSNE objective in a principled way using Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real world data show that both methods work well, providing embeddings that exhibit meaningful structure that would otherwise remain hidden.},
}
[9]
V. T. Ho, K. Pal, and G. Weikum, “QuTE: Answering Quantity Queries from Web Tables,” in SIGMOD 2021, Xi’an, Shaanxi, China. (Accepted/in press)
@inproceedings{Thinh_SIG21,
  TITLE = {Qu{TE}: Answering Quantity Queries from Web Tables},
  AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Weikum, Gerhard},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {SIGMOD 2021},
  ADDRESS = {Xi'an, Shaanxi, China},
}
[10]
V. T. Ho, K. Pal, S. Razniewski, K. Berberich, and G. Weikum, “Extracting Contextualized Quantity Facts from Web Tables,” in Proceedings of The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia. (Accepted/in press)
@inproceedings{Thinh_WWW21,
  TITLE = {Extracting Contextualized Quantity Facts from Web Tables},
  AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Razniewski, Simon and Berberich, Klaus and Weikum, Gerhard},
  LANGUAGE = {eng},
  PUBLISHER = {ACM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of The Web Conference 2021 (WWW 2021)},
  ADDRESS = {Ljubljana, Slovenia},
}
[11]
M. Kaiser, R. Saha Roy, and G. Weikum, “Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs,” in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Online. (Accepted/in press)
@inproceedings{kaiser2021reinforcement,
  TITLE = {Reinforcement Learning from Reformulations in~Conversational Question Answering over Knowledge Graphs},
  AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard},
  LANGUAGE = {eng},
  PUBLISHER = {ACM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  ADDRESS = {Online},
}
[12]
J. Kalofolias, P. Welke, and J. Vreeken, “SUSAN: The Structural Similarity Random Walk Kernel,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference. (Accepted/in press)
@inproceedings{kalofolias:21:susan,
  TITLE = {{SUSAN}: The Structural Similarity Random Walk Kernel},
  AUTHOR = {Kalofolias, Janis and Welke, Pascal and Vreeken, Jilles},
  LANGUAGE = {eng},
  PUBLISHER = {SIAM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
  ADDRESS = {Virtual Conference},
}
[13]
A. Marx, L. Yang, and M. van Leeuwen, “Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multidimensional Adaptive Histograms,” in Proceedings of the SIAM International Conference on Data Mining (SDM 2021), Virtual Conference. (Accepted/in press)
@inproceedings{marx:20:myl,
  TITLE = {Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multidimensional Adaptive Histograms},
  AUTHOR = {Marx, Alexander and Yang, Lincen and van Leeuwen, Matthijs},
  LANGUAGE = {eng},
  PUBLISHER = {SIAM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of the SIAM International Conference on Data Mining (SDM 2021)},
  ADDRESS = {Virtual Conference},
}
[14]
A. Marx, A. Gretton, and J. M. Mooij, “A Weaker Faithfulness Assumption based on Triple Interactions,” 2021. [Online]. Available: https://arxiv.org/abs/2010.14265. (arXiv: 2010.14265)
Abstract
One of the core assumptions in causal discovery is the faithfulness assumption---i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.
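The abstract above lists xor connections among the ways faithfulness can be violated. The classic example is Z = X xor Y for fair coins X and Y: X and Z are marginally independent even though both are parents of Z in the causal graph, yet become perfectly dependent once Y is given. A small numeric check of this (illustrative code, not from the paper):

```python
from collections import Counter
from itertools import product

# Joint distribution of two fair coins X, Y and Z = X xor Y.
joint = Counter()
for x, y in product([0, 1], repeat=2):
    joint[(x, y, x ^ y)] += 0.25

def marginal(idx):
    """Marginal distribution over the coordinates listed in idx."""
    m = Counter()
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

# Marginally, P(x, z) factorises: X and Z look independent.
pxz, px, pz = marginal([0, 2]), marginal([0]), marginal([2])
marg_indep = all(abs(pxz[(x, z)] - px[(x,)] * pz[(z,)]) < 1e-12
                 for x in (0, 1) for z in (0, 1))

# Conditioned on Y = 0, Z equals X, so X and Z are dependent.
p_y0 = marginal([1])[(0,)]                 # P(Y=0) = 0.5
p_xz_y0 = joint[(0, 0, 0)] / p_y0          # P(X=0, Z=0 | Y=0) = 0.5
p_x_y0 = marginal([0, 1])[(0, 0)] / p_y0   # P(X=0 | Y=0) = 0.5
p_z_y0 = marginal([1, 2])[(0, 0)] / p_y0   # P(Z=0 | Y=0) = 0.5
cond_dep = abs(p_xz_y0 - p_x_y0 * p_z_y0) > 1e-12
```

Here the marginal independence of X and Z hides a true edge, which is exactly the situation the paper's weaker 2-adjacency faithfulness assumption is designed to tolerate.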
@online{Marxarxiv21,
  TITLE = {A Weaker Faithfulness Assumption based on Triple Interactions},
  AUTHOR = {Marx, Alexander and Gretton, Arthur and Mooij, Joris M.},
  LANGUAGE = {eng},
  URL = {https://arxiv.org/abs/2010.14265},
  EPRINT = {2010.14265},
  EPRINTTYPE = {arXiv},
  YEAR = {2021},
  MARGINALMARK = {$\bullet$},
  ABSTRACT = {One of the core assumptions in causal discovery is the faithfulness assumption---i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.},
}
[15]
O. Mian, A. Marx, and J. Vreeken, “Discovering Fully Oriented Causal Networks,” in Thirty-Fifth AAAI Conference on Artificial Intelligence, Vancouver, Canada. (Accepted/in press)
@inproceedings{mian:20:globe,
  TITLE = {Discovering Fully Oriented Causal Networks},
  AUTHOR = {Mian, Osman and Marx, Alexander and Vreeken, Jilles},
  LANGUAGE = {eng},
  PUBLISHER = {AAAI},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Thirty-Fifth AAAI Conference on Artificial Intelligence},
  ADDRESS = {Vancouver, Canada},
}
[16]
S. Nag Chowdhury, S. Razniewski, and G. Weikum, “SANDI: Story-and-Images Alignment,” in EACL 2021, 16th Conference of the European Chapter of the Association for Computational Linguistics, Online. (Accepted/in press)
@inproceedings{Thinh_EACL21,
  TITLE = {{SANDI}: Story-and-Images Alignment},
  AUTHOR = {Nag Chowdhury, Sreyasi and Razniewski, Simon and Weikum, Gerhard},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {EACL 2021, 16th Conference of the European Chapter of the Association for Computational Linguistics},
  ADDRESS = {Online},
}
[17]
S. Nag Chowdhury, “Exploiting Image-Text Synergy for Contextual Image Captioning,” in LANTERN 2021, The First Workshop Beyond Vision and LANguage: inTEgrating Real-world kNowledge, Virtual. (Accepted/in press)
@inproceedings{Chod_ECAL2021,
  TITLE = {Exploiting Image-Text Synergy for Contextual Image Captioning},
  AUTHOR = {Nag Chowdhury, Sreyasi},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {LANTERN 2021, The First Workshop Beyond Vision and LANguage: inTEgrating Real-world kNowledge},
  ADDRESS = {Virtual},
}
[18]
T.-P. Nguyen, S. Razniewski, and G. Weikum, “Advanced Semantics for Commonsense Knowledge Extraction,” in Proceedings of The Web Conference 2021 (WWW 2021), Ljubljana, Slovenia. (Accepted/in press)
@inproceedings{Nguyen_WWW21,
  TITLE = {Advanced Semantics for Commonsense Knowledge Extraction},
  AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard},
  LANGUAGE = {eng},
  PUBLISHER = {ACM},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {Proceedings of The Web Conference 2021 (WWW 2021)},
  ADDRESS = {Ljubljana, Slovenia},
}
[19]
J. Romero, “Pyformlang: An Educational Library for Formal Language Manipulation,” in SIGCSE ’21, The 52nd ACM Technical Symposium on Computer Science Education, Virtual Event, USA. (Accepted/in press)
@inproceedings{Romero_SIGCSE21,
  TITLE = {Pyformlang: {An} Educational Library for Formal Language Manipulation},
  AUTHOR = {Romero, Julien},
  LANGUAGE = {eng},
  YEAR = {2021},
  PUBLREMARK = {Accepted},
  MARGINALMARK = {$\bullet$},
  BOOKTITLE = {SIGCSE '21, The 52nd ACM Technical Symposium on Computer Science Education},
  ADDRESS = {Virtual Event, USA},
}
Endnote
%0 Conference Proceedings %A Romero, Julien %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Pyformlang: An Educational Library for Formal Language Manipulation : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F836-5 %D 2021 %B The 52nd ACM Technical Symposium on Computer Science Education %Z date of event: 2021-03-13 - 2021-03-20 %C Virtual Event, USA %B SIGCSE '21
[20]
A. Tigunova, P. Mirza, A. Yates, and G. Weikum, “Exploring Personal Knowledge Extraction from Conversations with CHARM,” in WSDM ’21, 14th International Conference on Web Search and Data Mining, Jerusalem, Israel (Online). (Accepted/in press)
Export
BibTeX
@inproceedings{Tigunova_WSDM21, TITLE = {Exploring Personal Knowledge Extraction from Conversations with {CHARM}}, AUTHOR = {Tigunova, Anna and Mirza, Paramita and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '21, 14th International Conference on Web Search and Data Mining}, ADDRESS = {Jerusalem, Israel (Online)}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Mirza, Paramita %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploring Personal Knowledge Extraction from Conversations with CHARM : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F850-7 %D 2021 %B 14th International Conference on Web Search and Data Mining %Z date of event: 2021-03-08 - 2021-03-12 %C Jerusalem, Israel (Online) %B WSDM '21 %I ACM
[21]
G. H. Torbati, A. Yates, and G. Weikum, “You Get What You Chat: Using Conversations to Personalize Search-based Recommendations,” in Advances in Information Retrieval (ECIR 2021), Lucca, Italy (Online Event), 2021.
Export
BibTeX
@inproceedings{Torbati_ECIR2021, TITLE = {You Get What You Chat: {U}sing Conversations to Personalize Search-based Recommendations}, AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-72112-1}, DOI = {10.1007/978-3-030-72113-8_14}, PUBLISHER = {Springer}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, DATE = {2021}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2021)}, EDITOR = {Hiemstra, Djoerd and Moens, Marie-Francine and Mothe, Josiane and Perego, Raffaele and Potthast, Martin and Sebastiani, Fabrizio}, PAGES = {207--223}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12656}, ADDRESS = {Lucca, Italy (Online Event)}, }
Endnote
%0 Conference Proceedings %A Torbati, Ghazaleh Haratinezhad %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T You Get What You Chat: Using Conversations to Personalize Search-based Recommendations : %G eng %U http://hdl.handle.net/21.11116/0000-0007-ECA2-8 %R 10.1007/978-3-030-72113-8_14 %D 2021 %B 43rd European Conference on IR Research %Z date of event: 2021-03-28 - 2021-04-01 %C Lucca, Italy (Online Event) %B Advances in Information Retrieval %E Hiemstra, Djoerd; Moens, Marie-Francine; Mothe, Josiane; Perego, Raffaele; Potthast, Martin; Sebastiani, Fabrizio %P 207 - 223 %I Springer %@ 978-3-030-72112-1 %B Lecture Notes in Computer Science %N 12656
[22]
K. H. Tran, A. Ghazimatin, and R. Saha Roy, “Counterfactual Explanations for Neural Recommenders,” in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Online. (Accepted/in press)
Export
BibTeX
@inproceedings{tran2021counterfactual, TITLE = {Counterfactual Explanations for Neural Recommenders}, AUTHOR = {Tran, Khanh Hiep and Ghazimatin, Azin and Saha Roy, Rishiraj}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2021}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval}, ADDRESS = {Online}, }
Endnote
%0 Conference Proceedings %A Tran, Khanh Hiep %A Ghazimatin, Azin %A Saha Roy, Rishiraj %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Counterfactual Explanations for Neural Recommenders : %G eng %U http://hdl.handle.net/21.11116/0000-0008-5140-4 %D 2021 %B 44th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2021-07-11 - 2021-07-15 %C Online %B Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval %I ACM
2020
[23]
H. Arnaout, S. Razniewski, and G. Weikum, “Negative Statements Considered Useful,” 2020. [Online]. Available: http://arxiv.org/abs/2001.04425. (arXiv: 2001.04425)
Abstract
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
Export
BibTeX
@online{Arnaout_arXiv2001.04425, TITLE = {Negative Statements Considered Useful}, AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/2001.04425}, EPRINT = {2001.04425}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.}, }
Endnote
%0 Report %A Arnaout, Hiba %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Negative Statements Considered Useful : %G eng %U http://hdl.handle.net/21.11116/0000-0005-821F-6 %U http://arxiv.org/abs/2001.04425 %D 2020 %X Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Databases, cs.DB
[24]
H. Arnaout, S. Razniewski, and G. Weikum, “Enriching Knowledge Bases with Interesting Negative Statements,” in Automated Knowledge Base Construction (AKBC 2020), Virtual Conference, 2020.
Export
BibTeX
@inproceedings{Arnaout_AKBC2020, TITLE = {Enriching Knowledge Bases with Interesting Negative Statements}, AUTHOR = {Arnaout, Hiba and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.24432/C5101K}, PUBLISHER = {OpenReview}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Automated Knowledge Base Construction (AKBC 2020)}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Arnaout, Hiba %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Enriching Knowledge Bases with Interesting Negative Statements : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EBC9-E %R 10.24432/C5101K %D 2020 %B 2nd Conference on Automated Knowledge Base Construction %Z date of event: 2020-06-22 - 2020-06-24 %C Virtual Conference %B Automated Knowledge Base Construction %I OpenReview %U https://openreview.net/forum?id=pSLmyZKaS
[25]
K. Balog, V. Setty, C. Lioma, Y. Liu, M. Zhang, and K. Berberich, Eds., ICTIR ’20. ACM, 2020.
Export
BibTeX
@proceedings{Balog_ICTIR20, TITLE = {ICTIR '20, ACM SIGIR International Conference on Theory of Information Retrieval}, EDITOR = {Balog, Krisztian and Setty, Vinay and Lioma, Christina and Liu, Yiqun and Zhang, Min and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-8067-6}, DOI = {10.1145/3409256}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ADDRESS = {Virtual Event, Norway}, }
Endnote
%0 Conference Proceedings %E Balog, Krisztian %E Setty, Vinay %E Lioma, Christina %E Liu, Yiqun %E Zhang, Min %E Berberich, Klaus %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ICTIR '20 : Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval %G eng %U http://hdl.handle.net/21.11116/0000-0008-041D-4 %R 10.1145/3409256 %@ 978-1-4503-8067-6 %I ACM %D 2020 %B ACM SIGIR International Conference on Theory of Information Retrieval %Z date of event: 2020-09-14 - 2020-09-17 %D 2020 %C Virtual Event, Norway
[26]
C. Belth, X. Zheng, J. Vreeken, and D. Koutra, “What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization,” in Proceedings of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.
Export
BibTeX
@inproceedings{belth:20:kgist, TITLE = {What is Normal, What is Strange, and What is Missing in a Knowledge Graph: {U}nified Characterization via Inductive Summarization}, AUTHOR = {Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai}, LANGUAGE = {eng}, ISBN = {978-1-4503-7023-3}, DOI = {10.1145/3366423.3380189}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2020)}, EDITOR = {Huang, Yennun and King, Irwin and Liu, Tie-Yan and van Steen, Maarten}, PAGES = {1115--1126}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Belth, Caleb %A Zheng, Xinyi %A Vreeken, Jilles %A Koutra, Danai %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0008-253F-9 %R 10.1145/3366423.3380189 %D 2020 %B The World Wide Web Conference %Z date of event: 2020-04-20 - 2020-04-24 %C Taipei, Taiwan %B Proceedings of The World Wide Web Conference %E Huang, Yennun; King, Irwin; Liu, Tie-Yan; van Steen, Maarten %P 1115 - 1126 %I ACM %@ 978-1-4503-7023-3
[27]
J. J. Benjamin, C. Müller-Birn, and S. Razniewski, “Examining the Impact of Algorithm Awareness on Wikidata’s Recommender System Recoin,” 2020. [Online]. Available: https://arxiv.org/abs/2009.09049. (arXiv: 2009.09049)
Abstract
The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, that reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of a set of three different designs of Recoin's user interface, each of them exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not conclusive yet, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.
Export
BibTeX
@online{Benjamin2009.09049, TITLE = {Examining the Impact of Algorithm Awareness on {W}ikidata's Recommender System Recoin}, AUTHOR = {Benjamin, Jesse Josua and M{\"u}ller-Birn, Claudia and Razniewski, Simon}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2009.09049}, EPRINT = {2009.09049}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, that reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of a set of three different designs of Recoin's user interface, each of them exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not conclusive yet, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.}, }
Endnote
%0 Report %A Benjamin, Jesse Josua %A Müller-Birn, Claudia %A Razniewski, Simon %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Examining the Impact of Algorithm Awareness on Wikidata's Recommender System Recoin : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0661-4 %U https://arxiv.org/abs/2009.09049 %D 2020 %X The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populated the Web. Typical representatives of these algorithmic systems are recommender systems that influence our society both on a scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, that reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants using MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of a set of three different designs of Recoin's user interface, each of them exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not conclusive yet, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm. %K Computer Science, Human-Computer Interaction, cs.HC,Computer Science, Computers and Society, cs.CY,Computer Science, Digital Libraries, cs.DL
[28]
A. Bhattacharya, S. Natarajan, and R. Saha Roy, Eds., Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. ACM, 2020.
Export
BibTeX
@proceedings{SahaRoy_CoDSCOMAD20, TITLE = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)}, EDITOR = {Bhattacharya, Arnab and Natarajan, Sriraam and Saha Roy, Rishiraj}, LANGUAGE = {eng}, ISBN = {978-1-4503-7738-6}, DOI = {10.1145/3371158}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ADDRESS = {Hyderabad, India}, }
Endnote
%0 Conference Proceedings %E Bhattacharya, Arnab %E Natarajan, Sriraam %E Saha Roy, Rishiraj %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Proceedings of the 7th ACM IKDD CoDS and 25th COMAD : %G eng %U http://hdl.handle.net/21.11116/0000-0008-09CF-6 %R 10.1145/3371158 %@ 978-1-4503-7738-6 %I ACM %D 2020 %B ACM India Joint International Conference on Data Science and Management of Data %Z date of event: 2020-01-05 - 2020-01-07 %D 2020 %C Hyderabad, India
[29]
A. J. Biega, J. Schmidt, and R. Saha Roy, “Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions,” in Advances in Information Retrieval (ECIR 2020), Lisbon, Portugal, 2020.
Export
BibTeX
@inproceedings{Biega_ECIR2020, TITLE = {Towards Query Logs for Privacy Studies: {O}n Deriving Search Queries from Questions}, AUTHOR = {Biega, Asia J. and Schmidt, Jana and Saha Roy, Rishiraj}, LANGUAGE = {eng}, ISBN = {978-3-030-45441-8}, DOI = {10.1007/978-3-030-45442-5_14}, PUBLISHER = {Springer}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2020)}, EDITOR = {Jose, Joemon M. and Yilmaz, Emine and Magalh{\~a}es, Jo{\~a}o and Castells, Pablo and Ferro, Nicola and Silva, M{\'a}rio J. and Martins, Fl{\'a}vio}, PAGES = {110--117}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12036}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Schmidt, Jana %A Saha Roy, Rishiraj %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions : %G eng %U http://hdl.handle.net/21.11116/0000-0008-02FD-9 %R 10.1007/978-3-030-45442-5_14 %D 2020 %B 42nd European Conference on IR Research %Z date of event: 2020-04-14 - 2020-04-17 %C Lisbon, Portugal %B Advances in Information Retrieval %E Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J.; Martins, Flávio %P 110 - 117 %I Springer %@ 978-3-030-45441-8 %B Lecture Notes in Computer Science %N 12036
[30]
A. J. Biega, J. Schmidt, and R. Saha Roy, “Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions,” 2020. [Online]. Available: https://arxiv.org/abs/2004.02023. (arXiv: 2004.02023)
Abstract
Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.
Export
BibTeX
@online{Biega2004.02023, TITLE = {Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions}, AUTHOR = {Biega, Asia J. and Schmidt, Jana and Saha Roy, Rishiraj}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2004.02023}, EPRINT = {2004.02023}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.}, }
Endnote
%0 Report %A Biega, Asia J. %A Schmidt, Jana %A Saha Roy, Rishiraj %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions : %G eng %U http://hdl.handle.net/21.11116/0000-0008-09C7-E %U https://arxiv.org/abs/2004.02023 %D 2020 %X Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding. %K Computer Science, Information Retrieval, cs.IR
[31]
K. Budhathoki, “Causal Inference on Discrete Data,” Universität des Saarlandes, Saarbrücken, 2020.
Export
BibTeX
@phdthesis{BudDiss_2020, TITLE = {Causal Inference on Discrete Data}, AUTHOR = {Budhathoki, Kailash}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-329528}, DOI = {10.22028/D291-32952}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, }
Endnote
%0 Thesis %A Budhathoki, Kailash %Y Vreeken, Jilles %A referee: Weikum, Gerhard %A referee: Heskes, Tom %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Causal Inference on Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FE73-A %R 10.22028/D291-32952 %U urn:nbn:de:bsz:291--ds-329528 %I Universität des Saarlandes %C Saarbrücken %D 2020 %P 171 p. %V phd %9 phd %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/30501
[32]
D. Calvanese, J. Corman, D. Lanti, and S. Razniewski, “Counting Query Answers over a DL-Lite Knowledge Base (extended version),” 2020. [Online]. Available: https://arxiv.org/abs/2005.05886. (arXiv: 2005.05886)
Abstract
Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing a PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting.
Export
BibTeX
@online{Razniewskiarxiv2020, TITLE = {Counting Query Answers over a {DL}-Lite Knowledge Base (extended version)}, AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2005.05886}, EPRINT = {2005.05886}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing a PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting.}, }
Endnote
%0 Report %A Calvanese, Diego %A Corman, Julien %A Lanti, Davide %A Razniewski, Simon %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Counting Query Answers over a DL-Lite Knowledge Base (extended version) : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF5A-6 %U https://arxiv.org/abs/2005.05886 %D 2020 %X Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing a PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI
[33]
D. Calvanese, J. Corman, D. Lanti, and S. Razniewski, “Counting Query Answers over a DL-Lite Knowledge Base,” in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan (Virtual), 2020.
Abstract
Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting.
Export
BibTeX
@inproceedings{RazniewskiIJCAI2020, TITLE = {Counting Query Answers over a {DL}-Lite Knowledge Base}, AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-0-9992411-6-5}, DOI = {10.24963/ijcai.2020/230}, PUBLISHER = {IJCAI}, YEAR = {2021}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting.}, BOOKTITLE = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020)}, EDITOR = {Bessiere, Christian}, PAGES = {1658--1666}, ADDRESS = {Yokohama, Japan (Virtual)}, }
Endnote
%0 Conference Proceedings %A Calvanese, Diego %A Corman, Julien %A Lanti, Davide %A Razniewski, Simon %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Counting Query Answers over a DL-Lite Knowledge Base : %G eng %U http://hdl.handle.net/21.11116/0000-0008-009E-6 %R 10.24963/ijcai.2020/230 %D 2020 %B Twenty-Ninth International Joint Conference on Artificial Intelligence %Z date of event: 2021-01-07 - 2021-01-15 %C Yokohama, Japan (Virtual) %X Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with counting. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI %B Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence %E Bessiere, Christian %P 1658 - 1666 %I IJCAI %@ 978-0-9992411-6-5
[34]
D. Calvanese, J. Corman, D. Lanti, and S. Razniewski, “Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions,” in Proceedings of the 33rd International Workshop on Description Logics (DL 2020), Rhodes, Greece (Virtual Event), 2020.
Export
BibTeX
@inproceedings{Calvanese_DL2020, TITLE = {Rewriting Count Queries over {DL}-Lite {TBoxes} with Number Restrictions}, AUTHOR = {Calvanese, Diego and Corman, Julien and Lanti, Davide and Razniewski, Simon}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2663/paper-7.pdf; urn:nbn:de:0074-2663-4}, PUBLISHER = {ceur-ws.org}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 33rd International Workshop on Description Logics (DL 2020)}, EDITOR = {Borgwardt, Stefan and Meyer, Thomas}, EID = {7}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2663}, ADDRESS = {Rhodes, Greece (Virtual Event)}, }
Endnote
%0 Conference Proceedings %A Calvanese, Diego %A Corman, Julien %A Lanti, Davide %A Razniewski, Simon %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0606-B %U http://ceur-ws.org/Vol-2663/paper-7.pdf %D 2020 %B 33rd International Workshop on Description Logics %Z date of event: 2020-09-12 - 2020-09-14 %C Rhodes, Greece (Virtual Event) %B Proceedings of the 33rd International Workshop on Description Logics %E Borgwardt, Stefan; Meyer, Thomas %Z sequence number: 7 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2663 %@ false
[35]
Y. Chalier, S. Razniewski, and G. Weikum, “Joint Reasoning for Multi-Faceted Commonsense Knowledge,” 2020. [Online]. Available: http://arxiv.org/abs/2001.04170. (arXiv: 2001.04170)
Abstract
Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints, to couple the inference over concepts that are related in a taxonomic hierarchy. The reasoning is cast into an integer linear program (ILP), and we leverage the theory of reduction costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de.
Export
BibTeX
@online{Chalier_arXiv2001.04170, TITLE = {Joint Reasoning for Multi-Faceted Commonsense Knowledge}, AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/2001.04170}, EPRINT = {2001.04170}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints, to couple the inference over concepts that are related in a taxonomic hierarchy. The reasoning is cast into an integer linear program (ILP), and we leverage the theory of reduction costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de.}, }
Endnote
%0 Report %A Chalier, Yohan %A Razniewski, Simon %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Reasoning for Multi-Faceted Commonsense Knowledge : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8226-D %U http://arxiv.org/abs/2001.04170 %D 2020 %X Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints, to couple the inference over concepts that are related in a taxonomic hierarchy. The reasoning is cast into an integer linear program (ILP), and we leverage the theory of reduction costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR
[36]
Y. Chalier, S. Razniewski, and G. Weikum, “Joint Reasoning for Multi-Faceted Commonsense Knowledge,” in Automated Knowledge Base Construction (AKBC 2020), Virtual Conference, 2020.
Export
BibTeX
@inproceedings{Chalier_AKBC2020, TITLE = {Joint Reasoning for Multi-Faceted Commonsense Knowledge}, AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.24432/C58G6G}, PUBLISHER = {OpenReview}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Automated Knowledge Base Construction (AKBC 2020)}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Chalier, Yohan %A Razniewski, Simon %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Reasoning for Multi-Faceted Commonsense Knowledge : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EBCF-8 %R 10.24432/C58G6G %D 2020 %B 2nd Conference on Automated Knowledge Base Construction %Z date of event: 2020-06-22 - 2020-06-24 %C Virtual Conference %B Automated Knowledge Base Construction %I OpenReview %U https://openreview.net/forum?id=QnPV72SZVt
[37]
Y. Chalier, S. Razniewski, and G. Weikum, “Dice: A Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge,” in ISWC 2020 Posters, Demos, and Industry Tracks, Globally Online, 2020.
Export
BibTeX
@inproceedings{Chalier_ISCW20, TITLE = {Dice: {A} Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge}, AUTHOR = {Chalier, Yohan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2721/paper482.pdf; urn:nbn:de:0074-2721-6}, PUBLISHER = {ceur-ws.org}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ISWC 2020 Posters, Demos, and Industry Tracks}, EDITOR = {Taylor, Kerry and Goncalves, Rafael and Lecue, Freddy and Yan, Jun}, PAGES = {16--20}, EID = {482}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2721}, ADDRESS = {Globally Online}, }
Endnote
%0 Conference Proceedings %A Chalier, Yohan %A Razniewski, Simon %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Dice: A Joint Reasoning Framework for Multi-Faceted Commonsense Knowledge : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F132-0 %U http://ceur-ws.org/Vol-2721/paper482.pdf %D 2020 %B 19th International Semantic Web Conference %Z date of event: 2020-11-01 - 2020-11-06 %C Globally Online %B ISWC 2020 Posters, Demos, and Industry Tracks %E Taylor, Kerry; Goncalves, Rafael; Lecue, Freddy; Yan, Jun %P 16 - 20 %Z sequence number: 482 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2721 %@ false
[38]
E. Chang, J. Caplinger, A. Marin, X. Shen, and V. Demberg, “DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool,” in The 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), 2020.
Export
BibTeX
@inproceedings{chang2020dart, TITLE = {{DART}: {A} Lightweight Quality-Suggestive Data-to-Text Annotation Tool}, AUTHOR = {Chang, Ernie and Caplinger, Jeriah and Marin, Alex and Shen, Xiaoyu and Demberg, Vera}, LANGUAGE = {eng}, ISBN = {978-1-952148-28-6}, URL = {https://www.aclweb.org/anthology/2020.coling-demos.3}, DOI = {10.18653/v1/2020.coling-demos.3}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 28th International Conference on Computational Linguistics (COLING 2020)}, EDITOR = {Ptaszynski, Michal and Ziolko, Bartosz}, PAGES = {12--17}, ADDRESS = {Barcelona, Spain (Online)}, }
Endnote
%0 Conference Proceedings %A Chang, Ernie %A Caplinger, Jeriah %A Marin, Alex %A Shen, Xiaoyu %A Demberg, Vera %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool : %G eng %U http://hdl.handle.net/21.11116/0000-0008-149C-2 %U https://www.aclweb.org/anthology/2020.coling-demos.3 %R 10.18653/v1/2020.coling-demos.3 %D 2020 %B The 28th International Conference on Computational Linguistics %Z date of event: 2020-12-08 - 2020-12-13 %C Barcelona, Spain (Online) %B The 28th International Conference on Computational Linguistics %E Ptaszynski, Michal; Ziolko, Bartosz %P 12 - 17 %I ACL %@ 978-1-952148-28-6
[39]
C. X. Chu, S. Razniewski, and G. Weikum, “ENTYFI: Entity Typing in Fictional Texts,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.
Export
BibTeX
@inproceedings{ChuWSDM2020, TITLE = {{ENTYFI}: {E}ntity Typing in Fictional Texts}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450368223}, DOI = {10.1145/3336191.3371808}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining}, EDITOR = {Caverlee, James and Hu, Xia Ben}, PAGES = {124--132}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T ENTYFI: Entity Typing in Fictional Texts : %G eng %U http://hdl.handle.net/21.11116/0000-0006-A27E-6 %R 10.1145/3336191.3371808 %D 2020 %B 13th International Conference on Web Search and Data Mining %Z date of event: 2020-02-03 - 2020-02-07 %C Houston, TX, USA %B WSDM '20 %E Caverlee, James; Hu, Xia Ben %P 124 - 132 %I ACM %@ 9781450368223
[40]
C. X. Chu, S. Razniewski, and G. Weikum, “ENTYFI: A System for Fine-grained Entity Typing in Fictional Texts,” in The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 2020.
Export
BibTeX
@inproceedings{Chu_EMNLP20, TITLE = {{ENTYFI}: {A} System for Fine-grained Entity Typing in Fictional Texts}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-952148-62-0}, URL = {https://www.aclweb.org/anthology/2020.emnlp-demos.14/}, DOI = {10.18653/v1/2020.emnlp-demos.14}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)}, EDITOR = {Liu, Qun and Schlangen, David}, PAGES = {100--106}, ADDRESS = {Online}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T ENTYFI: A System for Fine-grained Entity Typing in Fictional Texts : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EED5-D %U https://www.aclweb.org/anthology/2020.emnlp-demos.14/ %R 10.18653/v1/2020.emnlp-demos.14 %D 2020 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2020-11-16 - 2020-11-20 %C Online %B The 2020 Conference on Empirical Methods in Natural Language Processing %E Liu, Qun; Schlangen, David %P 100 - 106 %I ACL %@ 978-1-952148-62-0 %U https://www.aclweb.org/anthology/2020.emnlp-demos.14.pdf
[41]
S. Dalleiger and J. Vreeken, “The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery,” in 20th IEEE International Conference on Data Mining (ICDM 2020), Virtual Conference, 2020.
Export
BibTeX
@inproceedings{dalleiger:20:reaper, TITLE = {The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery}, AUTHOR = {Dalleiger, Sebastian and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-7281-8316-9}, DOI = {10.1109/ICDM50108.2020.00112}, PUBLISHER = {IEEE}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {20th IEEE International Conference on Data Mining (ICDM 2020)}, EDITOR = {Plant, Claudia and Wang, Haixun and Cuzzocrea, Alfredo and Zaniolo, Carlo and Wu, Xindong}, PAGES = {978--983}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Dalleiger, Sebastian %A Vreeken, Jilles %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery : %G eng %U http://hdl.handle.net/21.11116/0000-0008-254E-8 %R 10.1109/ICDM50108.2020.00112 %D 2020 %B 20th IEEE International Conference on Data Mining %Z date of event: 2020-11-17 - 2020-11-20 %C Virtual Conference %B 20th IEEE International Conference on Data Mining %E Plant, Claudia; Wang, Haixun; Cuzzocrea, Alfredo; Zaniolo, Carlo; Wu, Xindong %P 978 - 983 %I IEEE %@ 978-1-7281-8316-9
[42]
S. Dalleiger and J. Vreeken, “Explainable Data Decompositions,” in AAAI Technical Track: Machine Learning, New York, NY, USA, 2020.
Export
BibTeX
@inproceedings{dalleiger:20:disc, TITLE = {Explainable Data Decompositions}, AUTHOR = {Dalleiger, Sebastian and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-57735-835-0}, DOI = {10.1609/aaai.v34i04.5780}, PUBLISHER = {AAAI}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {AAAI Technical Track: Machine Learning}, PAGES = {3709--3716}, ADDRESS = {New York, NY, USA}, }
Endnote
%0 Conference Proceedings %A Dalleiger, Sebastian %A Vreeken, Jilles %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Explainable Data Decompositions : %G eng %U http://hdl.handle.net/21.11116/0000-0008-2559-B %R 10.1609/aaai.v34i04.5780 %D 2020 %B Thirty-Fourth AAAI Conference on Artificial Intelligence %Z date of event: 2020-02-07 - 2020-02-12 %C New York, NY, USA %B AAAI Technical Track: Machine Learning %P 3709 - 3716 %I AAAI %@ 978-1-57735-835-0
[43]
F. Darari, W. Nutt, S. Razniewski, and S. Rudolph, “Completeness and soundness guarantees for conjunctive SPARQL queries over RDF data sources with completeness statements,” Semantic Web, vol. 11, no. 1, 2020.
Export
BibTeX
@article{Darari2020, TITLE = {Completeness and soundness guarantees for conjunctive {SPARQL} queries over {RDF} data sources with completeness statements}, AUTHOR = {Darari, Fariza and Nutt, Werner and Razniewski, Simon and Rudolph, Sebastian}, LANGUAGE = {eng}, ISSN = {1570-0844}, DOI = {10.3233/SW-190344}, PUBLISHER = {IOS Press}, ADDRESS = {Amsterdam}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, JOURNAL = {Semantic Web}, VOLUME = {11}, NUMBER = {1}, PAGES = {441--482}, }
Endnote
%0 Journal Article %A Darari, Fariza %A Nutt, Werner %A Razniewski, Simon %A Rudolph, Sebastian %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Completeness and soundness guarantees for conjunctive SPARQL queries over RDF data sources with completeness statements : %G eng %U http://hdl.handle.net/21.11116/0000-0006-9A06-6 %R 10.3233/SW-190344 %7 2020 %D 2020 %J Semantic Web %V 11 %N 1 %& 441 %P 441 - 482 %I IOS Press %C Amsterdam %@ false
[44]
J. Fischer and J. Vreeken, “Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity,” in KDD ’20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, USA, 2020.
Export
BibTeX
@inproceedings{fischer:20:mexican, TITLE = {Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity}, AUTHOR = {Fischer, Jonas and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-7998-4}, DOI = {10.1145/3394486.3403124}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {KDD '20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, EDITOR = {Gupta, Rajesh and Liu, Yan and Tang, Jiliang and Prakash, B. Aditya}, PAGES = {813--823}, ADDRESS = {Virtual Event, USA}, }
Endnote
%0 Conference Proceedings %A Fischer, Jonas %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEA5-1 %R 10.1145/3394486.3403124 %D 2020 %B 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining %Z date of event: 2020-08-23 - 2020-08-27 %C Virtual Event, USA %B KDD '20 %E Gupta, Rajesh; Liu, Yan; Tang, Jiliang; Prakash, B. Aditya %P 813 - 823 %I ACM %@ 978-1-4503-7998-4
[45]
J. Fischer and J. Vreeken, “Sets of Robust Rules, and How to Find Them,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019), Würzburg, Germany, 2020.
Export
BibTeX
@inproceedings{fischer:19:grab, TITLE = {Sets of Robust Rules, and How to Find Them}, AUTHOR = {Fischer, Jonas and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-030-46150-8}, DOI = {10.1007/978-3-030-46150-8_3}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)}, PAGES = {38--54}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {11906}, ADDRESS = {W{\"u}rzburg, Germany}, }
Endnote
%0 Conference Proceedings %A Fischer, Jonas %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Sets of Robust Rules, and How to Find Them : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEAE-8 %R 10.1007/978-3-030-46150-8_3 %D 2020 %B European Conference on Machine Learning and Knowledge Discovery in Databases %Z date of event: 2019-09-16 - 2019-09-20 %C Würzburg, Germany %B Machine Learning and Knowledge Discovery in Databases %P 38 - 54 %I Springer %@ 978-3-030-46150-8 %B Lecture Notes in Artificial Intelligence %N 11906
[46]
M. H. Gad-Elrab, D. Stepanova, T.-K. Tran, H. Adel, and G. Weikum, “ExCut: Explainable Embedding-Based Clustering over Knowledge Graphs,” in The Semantic Web -- ISWC 2020, Athens, Greece (Virtual Conference), 2020.
Export
BibTeX
@inproceedings{Gad_Elrab_ISWC2020, TITLE = {{ExCut}: {E}xplainable Embedding-Based Clustering over Knowledge Graphs}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Tran, Trung-Kien and Adel, Heike and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-62418-7}, DOI = {10.1007/978-3-030-62419-4_13}, PUBLISHER = {Springer}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {The Semantic Web -- ISWC 2020}, EDITOR = {Pan, Jeff Z. and Tamma, Valentina and D'Amato, Claudia and Janowicz, Krzysztof and Fu, Bo and Polleres, Axel and Seneviratne, Oshani and Kagal, Lalana}, PAGES = {218--237}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12506}, ADDRESS = {Athens, Greece (Virtual Conference)}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Tran, Trung-Kien %A Adel, Heike %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ExCut: Explainable Embedding-Based Clustering over Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0007-830F-5 %R 10.1007/978-3-030-62419-4_13 %D 2020 %B 19th International Semantic Web Conference %Z date of event: 2020-11-02 - 2020-11-06 %C Athens, Greece (Virtual Conference) %B The Semantic Web -- ISWC 2020 %E Pan, Jeff Z.; Tamma, Valentina; D'Amato, Claudia; Janowicz, Krzysztof; Fu, Bo; Polleres, Axel; Seneviratne, Oshani; Kagal, Lalana %P 218 - 237 %I Springer %@ 978-3-030-62418-7 %B Lecture Notes in Computer Science %N 12506
[47]
M. H. Gad-Elrab, V. T. Ho, E. Levinkov, T.-K. Tran, and D. Stepanova, “Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering,” in ISWC 2020 Posters, Demos, and Industry Tracks, Globally Online, 2020.
Export
BibTeX
@inproceedings{Gad-Elrab_ISCW20, TITLE = {Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Ho, Vinh Thinh and Levinkov, Evgeny and Tran, Trung-Kien and Stepanova, Daria}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2721/paper572.pdf; urn:nbn:de:0074-2721-6}, PUBLISHER = {ceur-ws.org}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ISWC 2020 Posters, Demos, and Industry Tracks}, EDITOR = {Taylor, Kerry and Goncalves, Rafael and Lecue, Freddy and Yan, Jun}, PAGES = {281--286}, EID = {572}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2721}, ADDRESS = {Globally Online}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Ho, Vinh Thinh %A Levinkov, Evgeny %A Tran, Trung-Kien %A Stepanova, Daria %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Towards Utilizing Knowledge Graph Embedding Models for Conceptual Clustering : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F86B-A %U http://ceur-ws.org/Vol-2721/paper572.pdf %D 2020 %B 19th International Semantic Web Conference %Z date of event: 2020-11-01 - 2020-11-06 %C Globally Online %B ISWC 2020 Posters, Demos, and Industry Tracks %E Taylor, Kerry; Goncalves, Rafael; Lecue, Freddy; Yan, Jun %P 281 - 286 %Z sequence number: 572 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2721 %@ false
[48]
A. Ghazimatin, O. Balalau, R. Saha Roy, and G. Weikum, “PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.
Export
BibTeX
@inproceedings{GhazimatinWSDM2020, TITLE = {{PRINCE}: {P}rovider-side Interpretability with Counterfactual Explanations in Recommender Systems}, AUTHOR = {Ghazimatin, Azin and Balalau, Oana and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6822-3}, DOI = {10.1145/3336191.3371824}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining}, EDITOR = {Caverlee, James and Hu, Xia Ben}, PAGES = {196--204}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Ghazimatin, Azin %A Balalau, Oana %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F173-7 %R 10.1145/3336191.3371824 %D 2020 %B 13th International Conference on Web Search and Data Mining %Z date of event: 2020-02-03 - 2020-02-07 %C Houston, TX, USA %B WSDM '20 %E Caverlee, James; Hu, Xia Ben %P 196 - 204 %I ACM %@ 978-1-4503-6822-3
[49]
S. Ghosh, S. Razniewski, and G. Weikum, “Uncovering Hidden Semantics of Set Information in Knowledge Bases,” 2020. [Online]. Available: http://arxiv.org/abs/2003.03155. (arXiv: 2003.03155)
Abstract
Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, that store individual set memberships. Both formats are typically complementary: unlike enumerating predicates, counting predicates do not give away individuals, but are more likely informative towards the true set size, thus this coexistence could enable interesting applications in question answering and KB curation. In this paper we aim at uncovering this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates from a given KB predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves up to 0.55 F1 score in set predicate identification versus 0.40 F1 score of a random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.
Export
BibTeX
@online{Ghosh_arXiv2003.03155, TITLE = {Uncovering Hidden Semantics of Set Information in Knowledge Bases}, AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/2003.03155}, EPRINT = {2003.03155}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, that store individual set memberships. Both formats are typically complementary: unlike enumerating predicates, counting predicates do not give away individuals, but are more likely informative towards the true set size, thus this coexistence could enable interesting applications in question answering and KB curation. In this paper we aim at uncovering this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates from a given KB's predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves up to 0.55 F1 score in set predicate identification versus 0.40 F1 score of a random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.}, }
Endnote
%0 Report %A Ghosh, Shrestha %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Uncovering Hidden Semantics of Set Information in Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0007-0662-4 %U http://arxiv.org/abs/2003.03155 %D 2020 %X Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, that store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, that store individual set memberships. Both formats are typically complementary: unlike enumerating predicates, counting predicates do not give away individuals, but are more likely informative towards the true set size, thus this coexistence could enable interesting applications in question answering and KB curation. In this paper we aim at uncovering this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates from a given KB's predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyze the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves up to 0.55 F1 score in set predicate identification versus 0.40 F1 score of a random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. 
Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo. %K Computer Science, Databases, cs.DB,Computer Science, Information Retrieval, cs.IR
[50]
S. Ghosh, S. Razniewski, and G. Weikum, “Uncovering Hidden Semantics of Set Information in Knowledge Bases,” Journal of Web Semantics, vol. 64, 2020.
Export
BibTeX
@article{Ghosh_2020, TITLE = {Uncovering Hidden Semantics of Set Information in Knowledge Bases}, AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1570-8268}, DOI = {10.1016/j.websem.2020.100588}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, JOURNAL = {Journal of Web Semantics}, VOLUME = {64}, EID = {100588}, }
Endnote
%0 Journal Article %A Ghosh, Shrestha %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Uncovering Hidden Semantics of Set Information in Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0007-066D-9 %R 10.1016/j.websem.2020.100588 %7 2020 %D 2020 %J Journal of Web Semantics %V 64 %Z sequence number: 100588 %I Elsevier %C Amsterdam %@ false
[51]
S. Ghosh, S. Razniewski, and G. Weikum, “CounQER: A System for Discovering and Linking Count Information in Knowledge Bases,” in The Semantic Web: ESWC 2020 Satellite Events, Heraklion, Greece, 2020.
Export
BibTeX
@inproceedings{Ghosh_ESWC20, TITLE = {{CounQER}: {A} System for Discovering and Linking Count Information in Knowledge Bases}, AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-62326-5}, DOI = {10.1007/978-3-030-62327-2_15}, PUBLISHER = {Springer}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {The Semantic Web: ESWC 2020 Satellite Events}, EDITOR = {Harth, Andreas and Presutti, Valentina and Troncy, Rapha{\"e}l and Acosta, Maribel and Polleres, Axel and Fern{\'a}ndez, Javier D. and Xavier Parreira, Josiane and Hartig, Olaf and Hose, Katja and Cochez, Michael}, PAGES = {84--90}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12124}, ADDRESS = {Heraklion, Greece}, }
Endnote
%0 Conference Proceedings %A Ghosh, Shrestha %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CounQER: A System for Discovering and Linking Count Information in Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EFB9-C %R 10.1007/978-3-030-62327-2_15 %D 2020 %B 17th Extended Semantic Web Conference %Z date of event: 2020-05-31 - 2020-06-04 %C Heraklion, Greece %B The Semantic Web: ESWC 2020 Satellite Events %E Harth, Andreas; Presutti, Valentina; Troncy, Raphaël; Acosta, Maribel; Polleres, Axel; Fernández, Javier D.; Xavier Parreira, Josiane; Hartig, Olaf; Hose, Katja; Cochez, Michael %P 84 - 90 %I Springer %@ 978-3-030-62326-5 %B Lecture Notes in Computer Science %N 12124
[52]
S. Ghosh, S. Razniewski, and G. Weikum, “CounQER: A System for Discovering and Linking Count Information in Knowledge Bases,” 2020. [Online]. Available: https://arxiv.org/abs/2005.03529. (arXiv: 2005.03529)
Abstract
Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignments, and will learn about the importance of these alignments for KB question answering and curation. CounQER is available at https://counqer.mpi-inf.mpg.de/spo.
Export
BibTeX
@online{Ghosh_2005.03529, TITLE = {{CounQER}: {A} System for Discovering and Linking Count Information in Knowledge Bases}, AUTHOR = {Ghosh, Shrestha and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2005.03529}, EPRINT = {2005.03529}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignments, and will learn about the importance of these alignments for KB question answering and curation. CounQER is available at https://counqer.mpi-inf.mpg.de/spo.}, }
Endnote
%0 Report %A Ghosh, Shrestha %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CounQER: A System for Discovering and Linking Count Information in Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F187-0 %U https://arxiv.org/abs/2005.03529 %D 2020 %X Predicate constraints of general-purpose knowledge bases (KBs) like Wikidata, DBpedia and Freebase are often limited to subproperty, domain and range constraints. In this demo we showcase CounQER, a system that illustrates the alignment of counting predicates, like staffSize, and enumerating predicates, like workInstitution^{-1}. In the demonstration session, attendees can inspect these alignments, and will learn about the importance of these alignments for KB question answering and curation. CounQER is available at https://counqer.mpi-inf.mpg.de/spo. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[53]
D. Gupta and K. Berberich, “Weaving Text into Tables,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.
Export
BibTeX
@inproceedings{DBLP:conf/cikm/0001B20, TITLE = {Weaving Text into Tables}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-6859-9}, DOI = {10.1145/3340531.3417442}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management}, EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan}, PAGES = {3401--3404}, ADDRESS = {Virtual Event, Ireland}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Weaving Text into Tables : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0313-F %R 10.1145/3340531.3417442 %D 2020 %B 29th ACM International Conference on Information & Knowledge Management %Z date of event: 2020-10-19 - 2020-10-23 %C Virtual Event, Ireland %B CIKM '20 %E d’Aquin, Mathieu; Dietze, Stefan %P 3401 - 3404 %I ACM %@ 978-1-4503-6859-9
[54]
D. Gupta and K. Berberich, “Optimizing Hyper-Phrase Queries,” in ICTIR ’20, ACM SIGIR International Conference on Theory of Information Retrieval, Virtual Event, Norway, 2020.
Export
BibTeX
@inproceedings{DBLP:conf/ictir/0002B20, TITLE = {Optimizing Hyper-Phrase Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-8067-6}, DOI = {10.1145/3409256.3409827}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICTIR '20, ACM SIGIR International Conference on Theory of Information Retrieval}, EDITOR = {Balog, Krisztian and Setty, Vinay and Lioma, Christina and Liu, Yiqun and Zhang, Min and Berberich, Klaus}, PAGES = {41--48}, ADDRESS = {Virtual Event, Norway}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Optimizing Hyper-Phrase Queries : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0335-9 %R 10.1145/3409256.3409827 %D 2020 %B ACM SIGIR International Conference on Theory of Information Retrieval %Z date of event: 2020-09-14 - 2020-09-17 %C Virtual Event, Norway %B ICTIR '20 %E Balog, Krisztian; Setty, Vinay; Lioma, Christina; Liu, Yiqun; Zhang, Min; Berberich, Klaus %P 41 - 48 %I ACM %@ 978-1-4503-8067-6
[55]
E. Heiter, “Factoring Out Prior Knowledge from Low-dimensional Embeddings,” Universität des Saarlandes, Saarbrücken, 2020.
Export
BibTeX
@mastersthesis{heiter:20:confetti, TITLE = {Factoring Out Prior Knowledge from Low-dimensional Embeddings}, AUTHOR = {Heiter, Edith}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, }
Endnote
%0 Thesis %A Heiter, Edith %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Factoring Out Prior Knowledge from Low-dimensional Embeddings : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEF8-4 %I Universität des Saarlandes %C Saarbrücken %D 2020 %V master %9 master
[56]
V. T. Ho, K. Pal, N. Kleer, K. Berberich, and G. Weikum, “Entities with Quantities: Extraction, Search, and Ranking,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.
Export
BibTeX
@inproceedings{HoWSDM2020, TITLE = {Entities with Quantities: {E}xtraction, Search, and Ranking}, AUTHOR = {Ho, Vinh Thinh and Pal, Koninika and Kleer, Niko and Berberich, Klaus and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450368223}, DOI = {10.1145/3336191.3371860}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining}, EDITOR = {Caverlee, James and Hu, Xia Ben}, PAGES = {833--836}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Pal, Koninika %A Kleer, Niko %A Berberich, Klaus %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Entities with Quantities: Extraction, Search, and Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0006-A284-D %R 10.1145/3336191.3371860 %D 2020 %B 13th International Conference on Web Search and Data Mining %Z date of event: 2020-02-03 - 2020-02-07 %C Houston, TX, USA %B WSDM '20 %E Caverlee, James; Hu, Xia Ben %P 833 - 836 %I ACM %@ 9781450368223
[57]
M. Kaiser, R. Saha Roy, and G. Weikum, “Conversational Question Answering over Passages by Leveraging Word Proximity Networks,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.
Export
BibTeX
@inproceedings{Kaiser_SIGIR20, TITLE = {Conversational Question Answering over Passages by Leveraging Word Proximity Networks}, AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450380164}, DOI = {10.1145/3397271.3401399}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {2129--2132}, ADDRESS = {Virtual Event, China}, }
Endnote
%0 Conference Proceedings %A Kaiser, Magdalena %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Conversational Question Answering over Passages by Leveraging Word Proximity Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F152-C %R 10.1145/3397271.3401399 %D 2020 %B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2020-07-25 - 2020-07-30 %C Virtual Event, China %B SIGIR '20 %P 2129 - 2132 %I ACM %@ 9781450380164
[58]
M. Kaiser, R. Saha Roy, and G. Weikum, “Conversational Question Answering over Passages by Leveraging Word Proximity Networks,” 2020. [Online]. Available: https://arxiv.org/abs/2004.13117. (arXiv: 2004.13117)
Abstract
Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces two key research challenges: understanding the context left implicit by the user in follow-up questions, and dealing with ad hoc question formulations. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
Export
BibTeX
@online{Kaiser_2004.13117, TITLE = {Conversational Question Answering over Passages by Leveraging Word Proximity Networks}, AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2004.13117}, EPRINT = {2004.13117}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces two key research challenges: understanding the context left implicit by the user in follow-up questions, and dealing with ad hoc question formulations. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.}, }
Endnote
%0 Report %A Kaiser, Magdalena %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Conversational Question Answering over Passages by Leveraging Word Proximity Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F17D-D %U https://arxiv.org/abs/2004.13117 %D 2020 %X Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces two key research challenges: understanding the context left implicit by the user in follow-up questions, and dealing with ad hoc question formulations. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[59]
M. Kaiser, “Incorporating User Feedback in Conversational Question Answering over Heterogeneous Web Sources,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.
Export
BibTeX
@inproceedings{Kaiser_SIGIR20b, TITLE = {Incorporating User Feedback in Conversational Question Answering over Heterogeneous {Web} Sources}, AUTHOR = {Kaiser, Magdalena}, LANGUAGE = {eng}, ISBN = {9781450380164}, DOI = {10.1145/3397271.3401454}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {2482}, ADDRESS = {Virtual Event, China}, }
Endnote
%0 Conference Proceedings %A Kaiser, Magdalena %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Incorporating User Feedback in Conversational Question Answering over Heterogeneous Web Sources : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FCDA-8 %R 10.1145/3397271.3401454 %D 2020 %B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2020-07-25 - 2020-07-30 %C Virtual Event, China %B SIGIR '20 %P 2482 %I ACM %@ 9781450380164
[60]
P. Lahoti, A. Beutel, J. Chen, K. Lee, F. Prost, N. Thain, X. Wang, and E. Chi, “Fairness without Demographics through Adversarially Reweighted Learning,” in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Virtual Event, 2020.
Export
BibTeX
@inproceedings{DBLP:conf/nips/LahotiBCLPT0C20, TITLE = {Fairness without Demographics through Adversarially Reweighted Learning}, AUTHOR = {Lahoti, Preethi and Beutel, Alex and Chen, Jilin and Lee, Kang and Prost, Flavien and Thain, Nithum and Wang, Xuezhi and Chi, Ed}, LANGUAGE = {eng}, PUBLISHER = {Curran Associates, Inc.}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Advances in Neural Information Processing Systems 33 (NeurIPS 2020)}, EDITOR = {Larochelle, Hugo and Ranzato, Marc Aurelio and Hadsell, Raia and Balcan, Maria-Florina and Lin, Hsuan-Tien}, ADDRESS = {Virtual Event}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Beutel, Alex %A Chen, Jilin %A Lee, Kang %A Prost, Flavien %A Thain, Nithum %A Wang, Xuezhi %A Chi, Ed %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations %T Fairness without Demographics through Adversarially Reweighted Learning : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FCC2-2 %D 2020 %B 34th Conference on Neural Information Processing Systems %Z date of event: 2020-12-06 - 2020-12-12 %C Virtual Event %B Advances in Neural Information Processing Systems 33 %E Larochelle, Hugo; Ranzato, Marc Aurelio; Hadsell, Raia; Balcan, Maria-Florina; Lin, Hsuan-Tien %I Curran Associates, Inc. %U https://proceedings.neurips.cc/paper/2020/hash/07fc15c9d169ee48573edd749d25945d-Abstract.html
[61]
C. Li, A. Yates, S. MacAvaney, B. He, and Y. Sun, “PARADE: Passage Representation Aggregation for Document Reranking,” 2020. [Online]. Available: https://arxiv.org/abs/2008.09093. (arXiv: 2008.09093)
Abstract
We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE.
Export
BibTeX
@online{Li2008.09093, TITLE = {{PARADE}: Passage Representation Aggregation for Document Reranking}, AUTHOR = {Li, Canjia and Yates, Andrew and MacAvaney, Sean and He, Ben and Sun, Yingfei}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2008.09093}, EPRINT = {2008.09093}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72\% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at \url{https://github.com/canjiali/PARADE}.}, }
Endnote
%0 Report %A Li, Canjia %A Yates, Andrew %A MacAvaney, Sean %A He, Ben %A Sun, Yingfei %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T PARADE: Passage Representation Aggregation for Document Reranking : %G eng %U http://hdl.handle.net/21.11116/0000-0008-06CF-9 %U https://arxiv.org/abs/2008.09093 %D 2020 %X We present PARADE, an end-to-end Transformer-based model that considers document-level context for document reranking. PARADE leverages passage-level relevance representations to predict a document relevance score, overcoming the limitations of previous approaches that perform inference on passages independently. Experiments on two ad-hoc retrieval benchmarks demonstrate PARADE's effectiveness over such methods. We conduct extensive analyses on PARADE's efficiency, highlighting several strategies for improving it. When combined with knowledge distillation, a PARADE model with 72% fewer parameters achieves effectiveness competitive with previous approaches using BERT-Base. Our code is available at https://github.com/canjiali/PARADE. %K Computer Science, Information Retrieval, cs.IR
[62]
J. Lin, R. Nogueira, and A. Yates, “Pretrained Transformers for Text Ranking: BERT and Beyond,” 2020. [Online]. Available: https://arxiv.org/abs/2010.06467. (arXiv: 2010.06467)
Abstract
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
Export
BibTeX
@online{Lin2010.06467, TITLE = {Pretrained Transformers for Text Ranking: {BERT} and Beyond}, AUTHOR = {Lin, Jimmy and Nogueira, Rodrigo and Yates, Andrew}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2010.06467}, EPRINT = {2010.06467}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.}, }
Endnote
%0 Report %A Lin, Jimmy %A Nogueira, Rodrigo %A Yates, Andrew %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Pretrained Transformers for Text Ranking: BERT and Beyond : %G eng %U http://hdl.handle.net/21.11116/0000-0008-06DA-C %U https://arxiv.org/abs/2010.06467 %D 2020 %X The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[63]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Dependencies with Reliable Mutual Information,” Knowledge and Information Systems, vol. 62, 2020.
Export
BibTeX
@article{Mandros2020, TITLE = {Discovering Dependencies with Reliable Mutual Information}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {0219-3116}, DOI = {10.1007/s10115-020-01494-9}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {62}, PAGES = {4223--4253}, }
Endnote
%0 Journal Article %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Dependencies with Reliable Mutual Information : %G eng %U http://hdl.handle.net/21.11116/0000-0006-DC90-F %R 10.1007/s10115-020-01494-9 %7 2020 %D 2020 %J Knowledge and Information Systems %V 62 %& 4223 %P 4223 - 4253 %I Springer %C New York, NY %@ false
[64]
S. Nag Chowdhury, W. Cheng, G. de Melo, S. Razniewski, and G. Weikum, “Illustrate Your Story: Enriching Text with Images,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.
Export
BibTeX
@inproceedings{NagWSDM2020, TITLE = {Illustrate Your Story: {Enriching} Text with Images}, AUTHOR = {Nag Chowdhury, Sreyasi and Cheng, William and de Melo, Gerard and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450368223}, DOI = {10.1145/3336191.3371866}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining}, EDITOR = {Caverlee, James and Hu, Xia Ben}, PAGES = {849--852}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Nag Chowdhury, Sreyasi %A Cheng, William %A de Melo, Gerard %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Illustrate Your Story: Enriching Text with Images : %G eng %U http://hdl.handle.net/21.11116/0000-0006-A27C-8 %R 10.1145/3336191.3371866 %D 2020 %B 13th International Conference on Web Search and Data Mining %Z date of event: 2020-02-03 - 2020-02-07 %C Houston, TX, USA %B WSDM '20 %E Caverlee, James; Hu, Xia Ben %P 849 - 852 %I ACM %@ 9781450368223
[65]
T.-P. Nguyen, “Advanced Semantics for Commonsense Knowledge Extraction,” Universität des Saarlandes, Saarbrücken, 2020.
Abstract
Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
Export
BibTeX
@mastersthesis{NguyenMSc2020, TITLE = {Advanced Semantics for Commonsense Knowledge Extraction}, AUTHOR = {Nguyen, Tuan-Phong}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.}, }
Endnote
%0 Thesis %A Nguyen, Tuan-Phong %Y Razniewski, Simon %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Advanced Semantics for Commonsense Knowledge Extraction : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FED0-0 %I Universität des Saarlandes %C Saarbrücken %D 2020 %P 67 p. %V master %9 master %X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This thesis presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
[66]
T.-P. Nguyen, S. Razniewski, and G. Weikum, “Advanced Semantics for Commonsense Knowledge Extraction,” WWW 2021, 2020. [Online]. Available: https://arxiv.org/abs/2011.00905. (arXiv: 2011.00905)
Abstract
Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.
Export
BibTeX
@online{Nguyen_2011.00905, TITLE = {Advanced Semantics for Commonsense Knowledge Extraction}, AUTHOR = {Nguyen, Tuan-Phong and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2011.00905}, EPRINT = {2011.00905}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent.}, JOURNAL = {WWW 2021}, }
Endnote
%0 Report %A Nguyen, Tuan-Phong %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Advanced Semantics for Commonsense Knowledge Extraction : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEDA-6 %U https://arxiv.org/abs/2011.00905 %D 2020 %X Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL %J WWW 2021
[67]
A. Oláh, “What’s in the Box? Explaining Neural Networks with Robust Rules,” Universität des Saarlandes, Saarbrücken, 2020.
Export
BibTeX
@mastersthesis{olah:20:explainn, TITLE = {What's in the Box? Explaining Neural Networks with Robust Rules}, AUTHOR = {Ol{\'a}h, Anna}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, }
Endnote
%0 Thesis %A Oláh, Anna %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T What's in the Box? Explaining Neural Networks with Robust Rules : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEFA-2 %I Universität des Saarlandes %C Saarbrücken %D 2020 %V master %9 master
[68]
K. Pal, V. T. Ho, and G. Weikum, “Co-Clustering Triples from Open Information Extraction,” in Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020), Hyderabad, India, 2020.
Export
BibTeX
@inproceedings{Pal_CoDS2020, TITLE = {Co-Clustering Triples from Open Information Extraction}, AUTHOR = {Pal, Koninika and Ho, Vinh Thinh and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450377386}, DOI = {10.1145/3371158.3371183}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)}, EDITOR = {Bhattacharya, Arnab and Natarajan, Sriraam and Saha Roy, Rishiraj}, PAGES = {190--194}, ADDRESS = {Hyderabad, India}, }
Endnote
%0 Conference Proceedings %A Pal, Koninika %A Ho, Vinh Thinh %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Co-Clustering Triples from Open Information Extraction : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EBFC-5 %R 10.1145/3371158.3371183 %D 2020 %B ACM India Joint International Conferenceon Data Science and Management of Data %Z date of event: 2020-01-05 - 2020-01-07 %C Hyderabad, India %B Proceedings of the 7th ACM IKDD CoDS and 25th COMAD %E Bhattacharya, Arnab; Natarajan, Sriraam; Saha Roy, Rishiraj %P 190 - 194 %I ACM %@ 9781450377386
[69]
T. Pellissier Tanon, G. Weikum, and F. Suchanek, “YAGO 4: A Reason-able Knowledge Base,” in The Semantic Web (ESWC 2020), Heraklion, Greece, 2020.
Export
BibTeX
@inproceedings{Pellissier_ESCW2020, TITLE = {{YAGO 4}: {A} Reason-able Knowledge Base}, AUTHOR = {Pellissier Tanon, Thomas and Weikum, Gerhard and Suchanek, Fabian}, LANGUAGE = {eng}, ISBN = {978-3-030-49460-5}, DOI = {10.1007/978-3-030-49461-2_34}, PUBLISHER = {Springer}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {The Semantic Web (ESWC 2020)}, EDITOR = {Harth, Andreas and Kirrane, Sabrina and Ngonga Ngomo, Axel-Cyrille and Paulheim, Heiko and Rula, Anisa and Gentile, Anna Lisa and Haase, Peter and Cochez, Michael}, PAGES = {583--596}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12123}, ADDRESS = {Heraklion, Greece}, }
Endnote
%0 Conference Proceedings %A Pellissier Tanon, Thomas %A Weikum, Gerhard %A Suchanek, Fabian %+ External Organizations %Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T YAGO 4: A Reason-able Knowledge Base : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EFC8-B %R 10.1007/978-3-030-49461-2_34 %D 2020 %B 17th Extended Semantic Web Conference %Z date of event: 2020-05-31 - 2020-06-04 %C Heraklion, Greece %B The Semantic Web %E Harth, Andreas; Kirrane, Sabrina; Ngonga Ngomo, Axel-Cyrille; Paulheim, Heiko; Rula, Anisa; Gentile, Anna Lisa; Haase, Peter; Cochez, Michael %P 583 - 596 %I Springer %@ 978-3-030-49460-5 %B Lecture Notes in Computer Science %N 12123
[70]
F. Pennerath, P. Mandros, and J. Vreeken, “Discovering Approximate Functional Dependencies using Smoothed Mutual Information,” in KDD ’20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, USA, 2020.
Export
BibTeX
@inproceedings{penerath:20:smooth, TITLE = {Discovering Approximate Functional Dependencies using Smoothed Mutual Information}, AUTHOR = {Pennerath, Fr{\'e}d{\'e}ric and Mandros, Panagiotis and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-7998-4}, DOI = {10.1145/3394486.3403178}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {KDD '20, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, EDITOR = {Gupta, Rajesh and Liu, Yan and Tang, Jiliang and Prakash, B. Aditya}, PAGES = {1254--1264}, ADDRESS = {Virtual Event, USA}, }
Endnote
%0 Conference Proceedings %A Pennerath, Frédéric %A Mandros, Panagiotis %A Vreeken, Jilles %+ External Organizations %Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Discovering Approximate Functional Dependencies using Smoothed Mutual Information : %G eng %U http://hdl.handle.net/21.11116/0000-0008-2560-2 %R 10.1145/3394486.3403178 %D 2020 %B 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining %Z date of event: 2020-08-23 - 2020-08-27 %C Virtual Event, USA %B KDD '20 %E Gupta, Rajesh; Liu, Yan; Tang, Jiliang; Prakash, B. Aditya %P 1254 - 1264 %I ACM %@ 978-1-4503-7998-4
[71]
S. Qiu, B. Xu, J. Zhang, Y. Wang, X. Shen, G. de Melo, C. Long, and X. Li, “EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.
Export
BibTeX
@inproceedings{qiu2020easyaug, TITLE = {{EasyAug}: {An} Automatic Textual Data Augmentation Platform for Classification Tasks}, AUTHOR = {Qiu, Siyuan and Xu, Binxia and Zhang, Jie and Wang, Yafang and Shen, Xiaoyu and de Melo, Gerard and Long, Chong and Li, Xiaolong}, LANGUAGE = {eng}, ISBN = {978-1-4503-7024-0}, DOI = {10.1145/3366424.3383552}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)}, EDITOR = {El Fallah, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten}, PAGES = {249--252}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Qiu, Siyuan %A Xu, Binxia %A Zhang, Jie %A Wang, Yafang %A Shen, Xiaoyu %A de Melo, Gerard %A Long, Chong %A Li, Xiaolong %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks : %G eng %U http://hdl.handle.net/21.11116/0000-0008-143B-0 %R 10.1145/3366424.3383552 %D 2020 %B The World Wide Web Conference %Z date of event: 2020-04-20 - 2020-04-24 %C Taipei, Taiwan %B Companion of The World Wide Web Conference %E El Fallah, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten %P 249 - 252 %I ACM %@ 978-1-4503-7024-0
[72]
N. H. Ramadhana, F. Darari, P. O. H. Putra, W. Nutt, S. Razniewski, and R. I. Akbar, “User-Centered Design for Knowledge Imbalance Analysis: A Case Study of ProWD,” in VOILA!2020, Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data, Virtual Conference, 2020.
Export
BibTeX
@inproceedings{Ramadhana_VOILA2020, TITLE = {User-Centered Design for Knowledge Imbalance Analysis: {A} Case Study of {ProWD}}, AUTHOR = {Ramadhana, Nadyah Hani and Darari, Fariz and Putra, Panca O. Hadi and Nutt, Werner and Razniewski, Simon and Akbar, Refo Ilmiya}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2778/paper2.pdf; urn:nbn:de:0074-2778-8}, PUBLISHER = {ceur-ws.org}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {VOILA!2020, Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data}, EDITOR = {Ivanova, Valentina and Lambrix, Patrick and Pesquita, Catia and Wiens, Vitalis}, PAGES = {14--27}, EID = {2}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2778}, ADDRESS = {Virtual Conference}, }
Endnote
%0 Conference Proceedings %A Ramadhana, Nadyah Hani %A Darari, Fariz %A Putra, Panca O. Hadi %A Nutt, Werner %A Razniewski, Simon %A Akbar, Refo Ilmiya %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T User-Centered Design for Knowledge Imbalance Analysis: A Case Study of ProWD : %G eng %U http://hdl.handle.net/21.11116/0000-0008-063B-0 %U http://ceur-ws.org/Vol-2778/paper2.pdf %D 2020 %B Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data %Z date of event: 2020-11-02 - 2020-11-02 %C Virtual Conference %B VOILA!2020 %E Ivanova, Valentina; Lambrix, Patrick; Pesquita, Catia; Wiens, Vitalis %P 14 - 27 %Z sequence number: 2 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2778 %@ false
[73]
S. Razniewski and P. Das, “Structured Knowledge: Have We Made Progress? An Extrinsic Study of KB Coverage over 19 Years,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.
Abstract
Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.
Export
BibTeX
@inproceedings{razniewski2020structured, TITLE = {Structured Knowledge: {H}ave We Made Progress? {A}n Extrinsic Study of {KB} Coverage over 19 Years}, AUTHOR = {Razniewski, Simon and Das, Priyanka}, LANGUAGE = {eng}, ISBN = {978-1-4503-6859-9}, DOI = {10.1145/3340531.3417447}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, ABSTRACT = {Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off.}, BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management}, EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan}, PAGES = {3317--3320}, ADDRESS = {Virtual Event, Ireland}, }
Endnote
%0 Conference Proceedings %A Razniewski, Simon %A Das, Priyanka %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Structured Knowledge: Have We Made Progress? An Extrinsic Study of KB Coverage over 19 Years : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF42-0 %R 10.1145/3340531.3417447 %D 2020 %B 29th ACM International Conference on Information & Knowledge Management %Z date of event: 2020-10-19 - 2020-10-23 %C Virtual Event, Ireland %X Structured world knowledge is at the foundation of knowledge-centric AI applications. Despite considerable research on knowledge base construction, beyond mere statement counts, little is known about the progress of KBs, in particular concerning their coverage, and one may wonder whether there is constant progress, or diminishing returns. In this paper we employ question answering and entity summarization as extrinsic use cases for a longitudinal study of the progress of KB coverage. Our analysis shows a near-continuous improvement of two popular KBs, DBpedia and Wikidata, over the last 19 years, with little signs of flattening out or leveling off. %B CIKM '20 %E d’Aquin, Mathieu; Dietze, Stefan %P 3317 - 3320 %I ACM %@ 978-1-4503-6859-9
[74]
J. Romero and S. Razniewski, “Inside Quasimodo: Exploring Construction and Usage of Commonsense Knowledge,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.
Export
BibTeX
@inproceedings{Romero_CIKM2020, TITLE = {Inside {Quasimodo}: {E}xploring Construction and Usage of Commonsense Knowledge}, AUTHOR = {Romero, Julien and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-4503-6859-9}, DOI = {10.1145/3340531.3417416}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management}, EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan}, PAGES = {3445--3448}, ADDRESS = {Virtual Event, Ireland}, }
Endnote
%0 Conference Proceedings %A Romero, Julien %A Razniewski, Simon %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Inside Quasimodo: Exploring Construction and Usage of Commonsense Knowledge : %G eng %U http://hdl.handle.net/21.11116/0000-0008-04C6-4 %R 10.1145/3340531.3417416 %D 2020 %B 29th ACM International Conference on Information & Knowledge Management %Z date of event: 2020-10-19 - 2020-10-23 %C Virtual Event, Ireland %B CIKM '20 %E d’Aquin, Mathieu; Dietze, Stefan %P 3445 - 3448 %I ACM %@ 978-1-4503-6859-9
[75]
R. Saha Roy and A. Anand, “Question Answering over Curated and Open Web Sources,” in SIGIR ’20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020.
Export
BibTeX
@inproceedings{SahaRoy_SIGIR20, TITLE = {Question Answering over Curated and Open Web Sources}, AUTHOR = {Saha Roy, Rishiraj and Anand, Avishek}, LANGUAGE = {eng}, ISBN = {9781450380164}, DOI = {10.1145/3397271.3401421}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGIR '20, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {2432--2435}, ADDRESS = {Virtual Event, China}, }
Endnote
%0 Conference Proceedings %A Saha Roy, Rishiraj %A Anand, Avishek %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Question Answering over Curated and Open Web Sources : %G eng %U http://hdl.handle.net/21.11116/0000-0008-02F6-0 %R 10.1145/3397271.3401421 %D 2020 %B 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2020-07-25 - 2020-07-30 %C Virtual Event, China %B SIGIR '20 %P 2432 - 2435 %I ACM %@ 9781450380164
[76]
R. Saha Roy and A. Anand, “Question Answering over Curated and Open Web Sources,” 2020. [Online]. Available: https://arxiv.org/abs/2004.11980. (arXiv: 2004.11980)
Abstract
The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.
Export
BibTeX
@online{SahaRoy2004.11980, TITLE = {Question Answering over Curated and Open Web Sources}, AUTHOR = {Saha Roy, Rishiraj and Anand, Avishek}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2004.11980}, EPRINT = {2004.11980}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.}, }
Endnote
%0 Report %A Saha Roy, Rishiraj %A Anand, Avishek %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Question Answering over Curated and Open Web Sources : %G eng %U http://hdl.handle.net/21.11116/0000-0008-09CA-B %U https://arxiv.org/abs/2004.11980 %D 2020 %X The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[77]
V. Sathya, S. Ghosh, A. Ramamurthy, and B. R. Tamma, “Small Cell Planning: Resource Management and Interference Mitigation Mechanisms in LTE HetNets,” Wireless Personal Communications, vol. 115, 2020.
Export
BibTeX
@article{Sathya2020, TITLE = {Small Cell Planning: {R}esource Management and Interference Mitigation Mechanisms in {LTE HetNets}}, AUTHOR = {Sathya, Vanlin and Ghosh, Shrestha and Ramamurthy, Arun and Tamma, Bheemarjuna Reddy}, LANGUAGE = {eng}, ISSN = {0929-6212}, DOI = {10.1007/s11277-020-07574-x}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, JOURNAL = {Wireless Personal Communications}, VOLUME = {115}, PAGES = {335--361}, }
Endnote
%0 Journal Article %A Sathya, Vanlin %A Ghosh, Shrestha %A Ramamurthy, Arun %A Tamma, Bheemarjuna Reddy %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Small Cell Planning: Resource Management and Interference Mitigation Mechanisms in LTE HetNets : %G eng %U http://hdl.handle.net/21.11116/0000-0006-B963-A %R 10.1007/s11277-020-07574-x %7 2020 %D 2020 %J Wireless Personal Communications %V 115 %& 335 %P 335 - 361 %I Springer %C New York, NY %@ false
[78]
X. Shen, E. Chang, H. Su, C. Niu, and D. Klakow, “Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence,” in The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020.
Export
BibTeX
@inproceedings{shen2020neural, TITLE = {Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence}, AUTHOR = {Shen, Xiaoyu and Chang, Ernie and Su, Hui and Niu, Cheng and Klakow, Dietrich}, LANGUAGE = {eng}, ISBN = {978-1-952148-25-5}, URL = {https://www.aclweb.org/anthology/2020.acl-main.641}, DOI = {10.18653/v1/2020.acl-main.641}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)}, EDITOR = {Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel}, PAGES = {7155--7165}, }
Endnote
%0 Conference Proceedings %A Shen, Xiaoyu %A Chang, Ernie %A Su, Hui %A Niu, Cheng %A Klakow, Dietrich %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations %T Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence : %G eng %U http://hdl.handle.net/21.11116/0000-0008-141B-4 %U https://www.aclweb.org/anthology/2020.acl-main.641 %R 10.18653/v1/2020.acl-main.641 %D 2020 %B 58th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2020-07-05 - 2020-07-10 %B The 58th Annual Meeting of the Association for Computational Linguistics %E Jurafsky, Dan; Chai, Joyce; Schluter, Natalie; Tetreault, Joel %P 7155 - 7165 %I ACL %@ 978-1-952148-25-5
[79]
H. Su, X. Shen, S. Zhao, Z. Xiao, P. Hu, C. Niu, and J. Zhou, “Diversifying Dialogue Generation with Non-Conversational Text,” in The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020.
Export
BibTeX
@inproceedings{su2020diversifying, TITLE = {Diversifying Dialogue Generation with Non-Conversational Text}, AUTHOR = {Su, Hui and Shen, Xiaoyu and Zhao, Sanqiang and Xiao, Zhou and Hu, Pengwei and Niu, Cheng and Zhou, Jie}, LANGUAGE = {eng}, ISBN = {978-1-952148-25-5}, URL = {https://www.aclweb.org/anthology/2020.acl-main.634}, DOI = {10.18653/v1/2020.acl-main.634}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)}, EDITOR = {Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel}, PAGES = {7087--7097}, }
Endnote
%0 Conference Proceedings %A Su, Hui %A Shen, Xiaoyu %A Zhao, Sanqiang %A Xiao, Zhou %A Hu, Pengwei %A Niu, Cheng %A Zhou, Jie %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Diversifying Dialogue Generation with Non-Conversational Text : %G eng %U http://hdl.handle.net/21.11116/0000-0008-14AF-D %U https://www.aclweb.org/anthology/2020.acl-main.634 %R 10.18653/v1/2020.acl-main.634 %D 2020 %B 58th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2020-07-05 - 2020-07-10 %B The 58th Annual Meeting of the Association for Computational Linguistics %E Jurafsky, Dan; Chai, Joyce; Schluter, Natalie; Tetreault, Joel %P 7087 - 7097 %I ACL %@ 978-1-952148-25-5
[80]
S. Sukarieh, “SPRAP: Detecting Opinion Spam Campaigns in Online Rating Services,” Universität des Saarlandes, Saarbrücken, 2020.
Export
BibTeX
@mastersthesis{sukarieh:20:sprap, TITLE = {{SPRAP}: Detecting Opinion Spam Campaigns in Online Rating Services}, AUTHOR = {Sukarieh, Sandra}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, }
Endnote
%0 Thesis %A Sukarieh, Sandra %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T SPRAP: Detecting Opinion Spam Campaigns in Online Rating Services : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF00-A %I Universität des Saarlandes %C Saarbrücken %D 2020 %V master %9 master
[81]
C. Sutton, M. Boley, L. Ghiringhelli, M. Rupp, J. Vreeken, and M. Scheffler, “Identifying Domains of Applicability of Machine Learning Models for Materials Science,” Nature Communications, vol. 11, 2020.
Export
BibTeX
@article{sutton:20:natcomm, TITLE = {Identifying Domains of Applicability of Machine Learning Models for Materials Science}, AUTHOR = {Sutton, Chris and Boley, Mario and Ghiringhelli, Luca and Rupp, Matthias and Vreeken, Jilles and Scheffler, Matthias}, LANGUAGE = {eng}, ISSN = {2041-1723}, DOI = {10.1038/s41467-020-17112-9}, PUBLISHER = {Nature Publishing Group}, ADDRESS = {London}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, JOURNAL = {Nature Communications}, VOLUME = {11}, EID = {4428}, }
Endnote
%0 Journal Article %A Sutton, Chris %A Boley, Mario %A Ghiringhelli, Luca %A Rupp, Matthias %A Vreeken, Jilles %A Scheffler, Matthias %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Identifying Domains of Applicability of Machine Learning Models for Materials Science : %G eng %U http://hdl.handle.net/21.11116/0000-0008-26CF-5 %R 10.1038/s41467-020-17112-9 %7 2020 %D 2020 %J Nature Communications %O Nat. Commun. %V 11 %Z sequence number: 4428 %I Nature Publishing Group %C London %@ false
[82]
E. Terolli, P. Ernst, and G. Weikum, “Focused Query Expansion with Entity Cores for Patient-Centric Health Search,” in The Semantic Web -- ISWC 2020, Athens, Greece (Virtual Conference), 2020.
Export
BibTeX
@inproceedings{Terolli_ISWC2020, TITLE = {Focused Query Expansion with Entity Cores for Patient-Centric Health Search}, AUTHOR = {Terolli, Erisa and Ernst, Patrick and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-62418-7}, DOI = {10.1007/978-3-030-62419-4_31}, PUBLISHER = {Springer}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {The Semantic Web -- ISWC 2020}, EDITOR = {Pan, Jeff Z. and Tamma, Valentina and D'Amato, Claudia and Janowicz, Krzysztof and Fu, Bo and Polleres, Axel and Seneviratne, Oshani and Kagal, Lalana}, PAGES = {547--564}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {12506}, ADDRESS = {Athens, Greece (Virtual Conference)}, }
Endnote
%0 Conference Proceedings %A Terolli, Erisa %A Ernst, Patrick %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Focused Query Expansion with Entity Cores for Patient-Centric Health Search : %G eng %U http://hdl.handle.net/21.11116/0000-0007-78D7-0 %R 10.1007/978-3-030-62419-4_31 %D 2020 %B 19th International Semantic Web Conference %Z date of event: 2020-11-02 - 2020-11-06 %C Athens, Greece (Virtual Conference) %B The Semantic Web -- ISWC 2020 %E Pan, Jeff Z.; Tamma, Valentina; D'Amato, Claudia; Janowicz, Krzysztof; Fu, Bo; Polleres, Axel; Seneviratne, Oshani; Kagal, Lalana %P 547 - 564 %I Springer %@ 978-3-030-62418-7 %B Lecture Notes in Computer Science %N 12506
[83]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “CHARM: Inferring Personal Attributes from Conversations,” in The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 2020.
Export
BibTeX
@inproceedings{Tigunova_EMNLP20, TITLE = {{CHARM}: {I}nferring Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-952148-60-6}, URL = {https://www.aclweb.org/anthology/2020.emnlp-main.434}, DOI = {10.18653/v1/2020.emnlp-main.434}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)}, EDITOR = {Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang}, PAGES = {5391--5404}, ADDRESS = {Online}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CHARM: Inferring Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EEDB-7 %U https://www.aclweb.org/anthology/2020.emnlp-main.434 %R 10.18653/v1/2020.emnlp-main.434 %D 2020 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2020-11-16 - 2020-11-20 %C Online %B The 2020 Conference on Empirical Methods in Natural Language Processing %E Webber, Bonnie; Cohn, Trevor; He, Yulan; Liu, Yang %P 5391 - 5404 %I ACL %@ 978-1-952148-60-6 %U https://www.aclweb.org/anthology/2020.emnlp-main.434.pdf
[84]
A. Tigunova, P. Mirza, A. Yates, and G. Weikum, “RedDust: a Large Reusable Dataset of Reddit User Traits,” in Twelfth Language Resources and Evaluation Conference (LREC 2020), Marseille, France, 2020.
Export
BibTeX
@inproceedings{Tigunova_ELREC20, TITLE = {{RedDust}: a Large Reusable Dataset of {Reddit} User Traits}, AUTHOR = {Tigunova, Anna and Mirza, Paramita and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {979-10-95546-34-4}, URL = {https://www.aclweb.org/anthology/2020.lrec-1.751}, PUBLISHER = {ELRA}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Twelfth Language Resources and Evaluation Conference (LREC 2020)}, EDITOR = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios}, PAGES = {6118--6126}, ADDRESS = {Marseille, France}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Mirza, Paramita %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T RedDust: a Large Reusable Dataset of Reddit User Traits : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F0A9-B %U https://www.aclweb.org/anthology/2020.lrec-1.751 %D 2020 %B 12th Language Resources and Evaluation Conference %Z date of event: 2020-05-11 - 2020-05-16 %C Marseille, France %B Twelfth Language Resources and Evaluation Conference %E Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios %P 6118 - 6126 %I ELRA %@ 979-10-95546-34-4 %U https://www.aclweb.org/anthology/2020.lrec-1.751.pdf
[85]
A. Tigunova, “Extracting Personal Information from Conversations,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.
Export
BibTeX
@inproceedings{tigunova2020extracting, TITLE = {Extracting Personal Information from Conversations}, AUTHOR = {Tigunova, Anna}, LANGUAGE = {eng}, ISBN = {978-1-4503-7024-0}, DOI = {10.1145/3366424.3382089}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)}, EDITOR = {El Fallah Seghrouchni, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten}, PAGES = {284--288}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Extracting Personal Information from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F845-4 %R 10.1145/3366424.3382089 %D 2020 %B The World Wide Web Conference %Z date of event: 2020-04-20 - 2020-04-24 %C Taipei, Taiwan %B Companion of The World Wide Web Conference %E El Fallah Seghrouchni, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten %P 284 - 288 %I ACM %@ 978-1-4503-7024-0
[86]
G. H. Torbati, A. Yates, and G. Weikum, “Personalized Entity Search by Sparse and Scrutable User Profiles,” in CHIIR ’20, Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval, Vancouver, BC, Canada, 2020.
Export
BibTeX
@inproceedings{CHIIR2020Torbati, TITLE = {Personalized Entity Search by Sparse and Scrutable User Profiles}, AUTHOR = {Torbati, Ghazaleh Haratinezhad and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450368926}, DOI = {10.1145/3343413.3378011}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {CHIIR '20, Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval}, EDITOR = {O'Brien, Heather and Freund, Luanne}, PAGES = {427--431}, ADDRESS = {Vancouver, BC, Canada}, }
Endnote
%0 Conference Proceedings %A Torbati, Ghazaleh Haratinezhad %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Personalized Entity Search by Sparse and Scrutable User Profiles : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EAD7-F %R 10.1145/3343413.3378011 %D 2020 %B Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval %Z date of event: 2020-03-14 - 2020-03-18 %C Vancouver, BC, Canada %B CHIIR '20 %E O'Brien, Heather; Freund, Luanne %P 427 - 431 %I ACM %@ 9781450368926
[87]
T.-K. Tran, M. H. Gad-Elrab, D. Stepanova, E. Kharlamov, and J. Strötgen, “Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs,” in Companion of The World Wide Web Conference (WWW 2020), Taipei, Taiwan, 2020.
Export
BibTeX
@inproceedings{DBLP:conf/www/TranG0KS20, TITLE = {Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs}, AUTHOR = {Tran, Trung-Kien and Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Kharlamov, Evgeny and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-1-4503-7024-0}, DOI = {10.1145/3366423.3380014}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2020)}, EDITOR = {El Fallah Seghrouchni, Amal and Sukthankar, Gita and Liu, Tie-Yan and van Steen, Maarten}, PAGES = {2613--2619}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Tran, Trung-Kien %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Kharlamov, Evgeny %A Strötgen, Jannik %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Fast Computation of Explanations for Inconsistency in Large-Scale Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F861-4 %R 10.1145/3366423.3380014 %D 2020 %B The World Wide Web Conference %Z date of event: 2020-04-20 - 2020-04-24 %C Taipei, Taiwan %B Companion of The World Wide Web Conference %E El Fallah Seghrouchni, Amal; Sukthankar, Gita; Liu, Tie-Yan; van Steen, Maarten %P 2613 - 2619 %I ACM %@ 978-1-4503-7024-0
[88]
L. Wang, X. Shen, G. de Melo, and G. Weikum, “Cross-Domain Learning for Classifying Propaganda in Online Contents,” in Proceedings of the 2020 Truth and Trust Online Conference (TTO 2020), Virtual, 2020.
Export
BibTeX
@inproceedings{Wang_TTO2020, TITLE = {Cross-Domain Learning for Classifying Propaganda in Online Contents}, AUTHOR = {Wang, Liqiang and Shen, Xiaoyu and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-7359904-0-8}, URL = {https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO03.pdf}, PUBLISHER = {Hacks Hackers}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2020 Truth and Trust Online Conference (TTO 2020)}, EDITOR = {De Cristofaro, Emiliano and Nakov, Preslav}, PAGES = {21--31}, ADDRESS = {Virtual}, }
Endnote
%0 Conference Proceedings %A Wang, Liqiang %A Shen, Xiaoyu %A de Melo, Gerard %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cross-Domain Learning for Classifying Propaganda in Online Contents : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F169-3 %U https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO03.pdf %D 2020 %B Truth and Trust Online Conference %Z date of event: 2020-10-16 - 2020-10-17 %C Virtual %B Proceedings of the 2020 Truth and Trust Online Conference %E De Cristofaro, Emiliano; Nakov, Preslav %P 21 - 31 %I Hacks Hackers %@ 978-1-7359904-0-8 %U https://truthandtrustonline.com/wp-content/uploads/2020/10/TTO03.pdf
[89]
L. Wang, X. Shen, G. de Melo, and G. Weikum, “Cross-Domain Learning for Classifying Propaganda in Online Contents,” 2020. [Online]. Available: https://arxiv.org/abs/2011.06844. (arXiv: 2011.06844)
Abstract
As news and social media exhibit an increasing amount of manipulative polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, as propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic. We devise informative features and build various classifiers for propaganda labeling, using cross-domain learning. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda.
Export
BibTeX
@online{Wang_2011.06844, TITLE = {Cross-Domain Learning for Classifying Propaganda in Online Contents}, AUTHOR = {Wang, Liqiang and Shen, Xiaoyu and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2011.06844}, EPRINT = {2011.06844}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {As news and social media exhibit an increasing amount of manipulative polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, as propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic. We devise informative features and build various classifiers for propaganda labeling, using cross-domain learning. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda.}, }
Endnote
%0 Report %A Wang, Liqiang %A Shen, Xiaoyu %A de Melo, Gerard %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cross-Domain Learning for Classifying Propaganda in Online Contents : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FEBF-5 %U https://arxiv.org/abs/2011.06844 %D 2020 %X As news and social media exhibit an increasing amount of manipulative polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, as propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic. We devise informative features and build various classifiers for propaganda labeling, using cross-domain learning. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda. %K Computer Science, Computation and Language, cs.CL
[90]
G. Weikum, “Entities with Quantities,” Bulletin of the Technical Committee on Data Engineering, vol. 43, no. 1, 2020.
Export
BibTeX
@article{Weikum_Entities2020, TITLE = {Entities with Quantities}, AUTHOR = {Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://sites.computer.org/debull/A20mar/p4.pdf}, PUBLISHER = {IEEE Computer Society}, ADDRESS = {Los Alamitos, CA}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, JOURNAL = {Bulletin of the Technical Committee on Data Engineering}, VOLUME = {43}, NUMBER = {1}, PAGES = {4--8}, }
Endnote
%0 Journal Article %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Entities with Quantities : %G eng %U http://hdl.handle.net/21.11116/0000-0007-EBBB-E %U http://sites.computer.org/debull/A20mar/p4.pdf %7 2020 %D 2020 %J Bulletin of the Technical Committee on Data Engineering %V 43 %N 1 %& 4 %P 4 - 8 %I IEEE Computer Society %C Los Alamitos, CA
[91]
G. Weikum, L. Dong, S. Razniewski, and F. Suchanek, “Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases,” 2020. [Online]. Available: https://arxiv.org/abs/2009.11564. (arXiv: 2009.11564)
Abstract
Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.
Export
BibTeX
@online{Weikum_2009.11564, TITLE = {Machine Knowledge: {C}reation and Curation of Comprehensive Knowledge Bases}, AUTHOR = {Weikum, Gerhard and Dong, Luna and Razniewski, Simon and Suchanek, Fabian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2009.11564}, EPRINT = {2009.11564}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.}, }
Endnote
%0 Report %A Weikum, Gerhard %A Dong, Luna %A Razniewski, Simon %A Suchanek, Fabian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0007-F1A6-D %U https://arxiv.org/abs/2009.11564 %D 2020 %X Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB,Computer Science, General Literature, cs.GL
[92]
B. Xu, S. Qiu, J. Zhang, Y. Wang, X. Shen, and G. de Melo, “Data Augmentation for Multiclass Utterance Classification - A Systematic Study,” in The 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), 2020.
Export
BibTeX
@inproceedings{xu2020data, TITLE = {Data Augmentation for Multiclass Utterance Classification -- A Systematic Study}, AUTHOR = {Xu, Binxia and Qiu, Siyuan and Zhang, Jie and Wang, Yafang and Shen, Xiaoyu and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-952148-27-9}, URL = {https://www.aclweb.org/anthology/2020.coling-main.479}, DOI = {10.18653/v1/2020.coling-main.479}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 28th International Conference on Computational Linguistics (COLING 2020)}, EDITOR = {Scott, Donia and Bel, Nuria and Zong, Chengqing}, PAGES = {5494--5506}, ADDRESS = {Barcelona, Spain (Online)}, }
Endnote
%0 Conference Proceedings %A Xu, Binxia %A Qiu, Siyuan %A Zhang, Jie %A Wang, Yafang %A Shen, Xiaoyu %A de Melo, Gerard %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Data Augmentation for Multiclass Utterance Classification - A Systematic Study : %G eng %U http://hdl.handle.net/21.11116/0000-0008-1498-6 %U https://www.aclweb.org/anthology/2020.coling-main.479 %R 10.18653/v1/2020.coling-main.479 %D 2020 %B The 28th International Conference on Computational Linguistics %Z date of event: 2020-12-08 - 2020-12-13 %C Barcelona, Spain (Online) %B The 28th International Conference on Computational Linguistics %E Scott, Donia; Bel, Nuria; Zong, Chengqing %P 5494 - 5506 %I ACL %@ 978-1-952148-27-9
[93]
A. Yates, S. Arora, X. Zhang, W. Yang, K. M. Jose, and J. Lin, “Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval,” in WSDM ’20, 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 2020.
Export
BibTeX
@inproceedings{YatesWSDM2020, TITLE = {Capreolus: {A} Toolkit for End-to-End Neural Ad Hoc Retrieval}, AUTHOR = {Yates, Andrew and Arora, Siddhant and Zhang, Xinyu and Yang, Wei and Jose, Kevin Martin and Lin, Jimmy}, LANGUAGE = {eng}, ISBN = {9781450368223}, DOI = {10.1145/3336191.3371868}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {WSDM '20, 13th International Conference on Web Search and Data Mining}, EDITOR = {Caverlee, James and Hu, Xia Ben}, PAGES = {861--864}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Arora, Siddhant %A Zhang, Xinyu %A Yang, Wei %A Jose, Kevin Martin %A Lin, Jimmy %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0006-A28E-3 %R 10.1145/3336191.3371868 %D 2020 %B 13th International Conference on Web Search and Data Mining %Z date of event: 2020-02-03 - 2020-02-07 %C Houston, TX, USA %B WSDM '20 %E Caverlee, James; Hu, Xia Ben %P 861 - 864 %I ACM %@ 9781450368223
[94]
A. Yates, K. M. Jose, X. Zhang, and J. Lin, “Flexible IR Pipelines with Capreolus,” in CIKM ’20, 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 2020.
Export
BibTeX
@inproceedings{Yates_CIKM2020, TITLE = {Flexible {IR} Pipelines with {Capreolus}}, AUTHOR = {Yates, Andrew and Jose, Kevin Martin and Zhang, Xinyu and Lin, Jimmy}, LANGUAGE = {eng}, ISBN = {978-1-4503-6859-9}, DOI = {10.1145/3340531.3412780}, PUBLISHER = {ACM}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, DATE = {2020}, BOOKTITLE = {CIKM '20, 29th ACM International Conference on Information \& Knowledge Management}, EDITOR = {d{\textquoteright}Aquin, Mathieu and Dietze, Stefan}, PAGES = {3181--3188}, ADDRESS = {Virtual Event, Ireland}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Jose, Kevin Martin %A Zhang, Xinyu %A Lin, Jimmy %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Flexible IR Pipelines with Capreolus : %G eng %U http://hdl.handle.net/21.11116/0000-0008-066A-B %R 10.1145/3340531.3412780 %D 2020 %B 29th ACM International Conference on Information & Knowledge Management %Z date of event: 2020-10-19 - 2020-10-23 %C Virtual Event, Ireland %B CIKM '20 %E d’Aquin, Mathieu; Dietze, Stefan %P 3181 - 3188 %I ACM %@ 978-1-4503-6859-9
[95]
Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates, “BERT-QE: Contextualized Query Expansion for Document Re-ranking,” in Findings of the ACL: EMNLP 2020, Online, 2020.
Export
BibTeX
@inproceedings{Zheng_EMNLP20, TITLE = {{BERT-QE}: {C}ontextualized Query Expansion for Document Re-ranking}, AUTHOR = {Zheng, Zhi and Hui, Kai and He, Ben and Han, Xianpei and Sun, Le and Yates, Andrew}, LANGUAGE = {eng}, ISBN = {978-1-952148-90-3}, URL = {https://www.aclweb.org/anthology/2020.findings-emnlp.424}, DOI = {10.18653/v1/2020.findings-emnlp.424}, PUBLISHER = {ACL}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Findings of the ACL: EMNLP 2020}, EDITOR = {Cohn, Trevor and He, Yulan and Liu, Yang}, PAGES = {4718--4728}, SERIES = {Findings of the Association for Computational Linguistics}, VOLUME = {1}, ADDRESS = {Online}, }
Endnote
%0 Conference Proceedings %A Zheng, Zhi %A Hui, Kai %A He, Ben %A Han, Xianpei %A Sun, Le %A Yates, Andrew %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T BERT-QE: Contextualized Query Expansion for Document Re-ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0687-9 %U https://www.aclweb.org/anthology/2020.findings-emnlp.424 %R 10.18653/v1/2020.findings-emnlp.424 %D 2020 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2020-11-16 - 2020-11-20 %C Online %B Findings of the ACL: EMNLP 2020 %E Cohn, Trevor; He, Yulan; Liu, Yang %P 4718 - 4728 %I ACL %@ 978-1-952148-90-3 %B Findings of the Association for Computational Linguistics %N 1 %U https://www.aclweb.org/anthology/2020.findings-emnlp.424.pdf
[96]
Z. Zheng, K. Hui, B. He, X. Han, L. Sun, and A. Yates, “BERT-QE: Contextualized Query Expansion for Document Re-ranking,” 2020. [Online]. Available: https://arxiv.org/abs/2009.07258. (arXiv: 2009.07258)
Abstract
Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models.
Export
BibTeX
@online{Zheng2009.07258, TITLE = {{BERT}-{QE}: Contextualized Query Expansion for Document Re-ranking}, AUTHOR = {Zheng, Zhi and Hui, Kai and He, Ben and Han, Xianpei and Sun, Le and Yates, Andrew}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/2009.07258}, EPRINT = {2009.07258}, EPRINTTYPE = {arXiv}, YEAR = {2020}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models.}, }
Endnote
%0 Report %A Zheng, Zhi %A Hui, Kai %A He, Ben %A Han, Xianpei %A Sun, Le %A Yates, Andrew %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T BERT-QE: Contextualized Query Expansion for Document Re-ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0008-06D5-1 %U https://arxiv.org/abs/2009.07258 %D 2020 %X Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluation on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
2019
[97]
M. Abouhamra, “AligNarr: Aligning Narratives of Different Length for Movie Summarization,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be since important information will be sparser and identifying them can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the differences in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization and showed their limitations and discussed ideas to improve them.
Export
BibTeX
@mastersthesis{AbouhamraMSc2019, TITLE = {{AligNarr}: Aligning Narratives of Different Length for Movie Summarization}, AUTHOR = {Abouhamra, Mostafa}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be since important information will be sparser and identifying them can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the differences in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization and showed their limitations and discussed ideas to improve them.}, }
Endnote
%0 Thesis %A Abouhamra, Mostafa %Y Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T AligNarr: Aligning Narratives of Different Length for Movie Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0004-5836-D %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 54 p. %V master %9 master %X Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research about automatic summarization revolves around summarizing news articles or scientific papers, which are somewhat small texts with simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem will be since important information will be sparser and identifying them can be more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization for bigger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity in their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences which achieved an F1 score of 0.39. Analyzing our results showed that the bigger the differences in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization and showed their limitations and discussed ideas to improve them.
[98]
A. Abujabal, “Question Answering over Knowledge Bases with Continuous Learning,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
Export
BibTeX
@phdthesis{Abujabalphd2013, TITLE = {Question Answering over Knowledge Bases with Continuous Learning}, AUTHOR = {Abujabal, Abdalghani}, LANGUAGE = {eng}, DOI = {10.22028/D291-27968}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.}, }
Endnote
%0 Thesis %A Abujabal, Abdalghani %Y Weikum, Gerhard %A referee: Lin, Jimmy %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Question Answering over Knowledge Bases with Continuous Learning : %G eng %U http://hdl.handle.net/21.11116/0000-0003-AEC0-0 %R 10.22028/D291-27968 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 141 p. %V phd %9 phd %X Answering complex natural language questions with crisp answers is crucial towards satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines both syntax, through template-based answering, and semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied with a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually-curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically-equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving comparable performance with word-level based models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
%U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27438
[99]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2019.
Export
BibTeX
@inproceedings{abujabal19comqa, TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-950737-13-0}, URL = {https://www.aclweb.org/anthology/N19-1027}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019)}, EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar}, PAGES = {307--317}, ADDRESS = {Minneapolis, MN, USA}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters : %G eng %U http://hdl.handle.net/21.11116/0000-0003-11A7-D %U https://www.aclweb.org/anthology/N19-1027 %D 2019 %B Annual Conference of the North American Chapter of the Association for Computational Linguistics %Z date of event: 2019-06-02 - 2019-06-07 %C Minneapolis, MN, USA %B The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies %E Burstein, Jill; Doran, Christy; Solorio, Thamar %P 307 - 317 %I ACL %@ 978-1-950737-13-0 %U https://www.aclweb.org/anthology/N19-1027
[100]
M. Alikhani, S. Nag Chowdhury, G. de Melo, and M. Stone, “CITE: A Corpus Of Text-Image Discourse Relations,” in The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 2019.
Export
BibTeX
@inproceedings{AlikhaniEtAl2019CITETextImageDiscourse, TITLE = {{CITE}: {A} Corpus Of Text-Image Discourse Relations}, AUTHOR = {Alikhani, Malihe and Nag Chowdhury, Sreyasi and de Melo, Gerard and Stone, Matthew}, LANGUAGE = {eng}, ISBN = {978-1-950737-13-0}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)}, EDITOR = {Burstein, Jill and Doran, Christy and Solorio, Thamar}, PAGES = {570--575}, ADDRESS = {Minneapolis, MN, USA}, }
Endnote
%0 Conference Proceedings %A Alikhani, Malihe %A Nag Chowdhury, Sreyasi %A de Melo, Gerard %A Stone, Matthew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CITE: A Corpus Of Text-Image Discourse Relations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-78D8-3 %D 2019 %B Annual Conference of the North American Chapter of the Association for Computational Linguistics %Z date of event: 2019-06-02 - 2019-06-07 %C Minneapolis, MN, USA %B The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies %E Burstein, Jill; Doran, Christy; Solorio, Thamar %P 570 - 575 %I ACL %@ 978-1-950737-13-0 %U https://aclweb.org/anthology/papers/N/N19/N19-1056/
[101]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” in Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (AMIR 2019), Cologne, Germany, 2019.
Export
BibTeX
@inproceedings{Arora_AMIR2019, TITLE = {Investigating Retrieval Method Selection with Axiomatic Features}, AUTHOR = {Arora, Siddhant and Yates, Andrew}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-2360-3}, PUBLISHER = {CEUR-WS}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval (AMIR 2019)}, EDITOR = {Beel, Joeran and Kolthoff, Lars}, PAGES = {18--31}, EID = {4}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2360}, ADDRESS = {Cologne, Germany}, }
Endnote
%0 Conference Proceedings %A Arora, Siddhant %A Yates, Andrew %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Investigating Retrieval Method Selection with Axiomatic Features : %G eng %U http://hdl.handle.net/21.11116/0000-0004-028E-A %D 2019 %B The 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval %Z date of event: 2019-04-14 - 2019-04-14 %C Cologne, Germany %B Proceedings of the 1st Interdisciplinary Workshop on Algorithm Selection and Meta-Learning in Information Retrieval co-located with the 41st European Conference on Information Retrieval %E Beel, Joeran; Kolthoff, Lars %P 18 - 31 %Z sequence number: 4 %I CEUR-WS %B CEUR Workshop Proceedings %N 2360 %@ false %U http://ceur-ws.org/Vol-2360/paper4Axiomatic.pdf
[102]
S. Arora and A. Yates, “Investigating Retrieval Method Selection with Axiomatic Features,” 2019. [Online]. Available: http://arxiv.org/abs/1904.05737. (arXiv: 1904.05737)
Abstract
We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.
Export
BibTeX
@online{Arora_arXiv1904.05737, TITLE = {Investigating Retrieval Method Selection with Axiomatic Features}, AUTHOR = {Arora, Siddhant and Yates, Andrew}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.05737}, EPRINT = {1904.05737}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior.}, }
Endnote
%0 Report %A Arora, Siddhant %A Yates, Andrew %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Investigating Retrieval Method Selection with Axiomatic Features : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02BF-3 %U http://arxiv.org/abs/1904.05737 %D 2019 %X We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior. %K Computer Science, Information Retrieval, cs.IR
[103]
J. A. Biega, “Enhancing Privacy and Fairness in Search Systems,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. 
We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.
Export
BibTeX
@phdthesis{biegaphd2019, TITLE = {Enhancing Privacy and Fairness in Search Systems}, AUTHOR = {Biega, Joanna Asia}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-278861}, DOI = {10.22028/D291-27886}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. 
The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.}, }
Endnote
%0 Thesis %A Biega, Joanna Asia %Y Weikum, Gerhard %A referee: Gummadi, Krishna %A referee: Nejdl, Wolfgang %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society External Organizations %T Enhancing Privacy and Fairness in Search Systems : %G eng %U http://hdl.handle.net/21.11116/0000-0003-9AED-5 %R 10.22028/D291-27886 %U urn:nbn:de:bsz:291--ds-278861 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 111 p. %V phd %9 phd %X Following a period of expedited progress in the capabilities of digital systems, the society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. 
The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning to rank model ordering the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27389
[104]
A. Chakraborty, N. Mota, A. J. Biega, K. P. Gummadi, and H. Heidari, “On the Impact of Choice Architectures on Inequality in Online Donation Platforms,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Chakraborty_WWW2019b, TITLE = {On the Impact of Choice Architectures on Inequality in Online Donation Platforms}, AUTHOR = {Chakraborty, Abhijnan and Mota, Nuno and Biega, Asia J. and Gummadi, Krishna P. and Heidari, Hoda}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313663}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {2623--2629}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Chakraborty, Abhijnan %A Mota, Nuno %A Biega, Asia J. %A Gummadi, Krishna P. %A Heidari, Hoda %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T On the Impact of Choice Architectures on Inequality in Online Donation Platforms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-FC88-9 %R 10.1145/3308558.3313663 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 2623 - 2629 %I ACM %@ 978-1-4503-6674-8
[105]
F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli, “On the Distortion of Locality Sensitive Hashing,” SIAM Journal on Computing, vol. 48, no. 2, 2019.
Export
BibTeX
@article{Chierichetti2019, TITLE = {On the Distortion of Locality Sensitive Hashing}, AUTHOR = {Chierichetti, Flavio and Kumar, Ravi and Panconesi, Alessandro and Terolli, Erisa}, LANGUAGE = {eng}, ISSN = {0097-5397}, DOI = {10.1137/17M1127752}, PUBLISHER = {SIAM}, ADDRESS = {Philadelphia, PA}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {SIAM Journal on Computing}, VOLUME = {48}, NUMBER = {2}, PAGES = {350--372}, }
Endnote
%0 Journal Article %A Chierichetti, Flavio %A Kumar, Ravi %A Panconesi, Alessandro %A Terolli, Erisa %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T On the Distortion of Locality Sensitive Hashing : %G eng %U http://hdl.handle.net/21.11116/0000-0003-A7E7-C %R 10.1137/17M1127752 %7 2019 %D 2019 %J SIAM Journal on Computing %V 48 %N 2 %& 350 %P 350 - 372 %I SIAM %C Philadelphia, PA %@ false
[106]
P. Christmann, R. Saha Roy, A. Abujabal, J. Singh, and G. Weikum, “Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion,” 2019. [Online]. Available: https://arxiv.org/abs/1910.03262. (arXiv: 1910.03262)
Abstract
Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.
Export
BibTeX
@online{Christmann_arXiv1910.03262, TITLE = {Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion}, AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Abujabal, Abdalghani and Singh, Jyotsna and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1910.03262}, EPRINT = {1910.03262}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.}, }
Endnote
%0 Report %A Christmann, Philipp %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Singh, Jyotsna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83DC-F %U http://arxiv.org/abs/1910.03262 %D 2019 %X Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[107]
P. Christmann, R. Saha Roy, A. Abujabal, J. Singh, and G. Weikum, “Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion,” in CIKM ’19, 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019.
Export
BibTeX
@inproceedings{Christmann_CIKM2019, TITLE = {Look before you Hop: {C}onversational Question Answering over Knowledge Graphs Using Judicious Context Expansion}, AUTHOR = {Christmann, Philipp and Saha Roy, Rishiraj and Abujabal, Abdalghani and Singh, Jyotsna and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450369763}, DOI = {10.1145/3357384.3358016}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {CIKM '19, 28th ACM International Conference on Information and Knowledge Management}, EDITOR = {Zhu, Wenwu and Tao, Dacheng}, PAGES = {729--738}, ADDRESS = {Beijing, China}, }
Endnote
%0 Conference Proceedings %A Christmann, Philipp %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Singh, Jyotsna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8231-0 %R 10.1145/3357384.3358016 %D 2019 %B 28th ACM International Conference on Information and Knowledge Management %Z date of event: 2019-11-03 - 2019-11-07 %C Beijing, China %B CIKM '19 %E Zhu, Wenwu; Tao, Dacheng %P 729 - 738 %I ACM %@ 9781450369763
[108]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains [Extended version],” 2019. [Online]. Available: http://arxiv.org/abs/1901.10263. (arXiv: 1901.10263)
Abstract
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
Export
BibTeX
@online{Chu_arXIv1901.10263, TITLE = {{TiFi}: Taxonomy Induction for Fictional Domains [Extended version]}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1901.10263}, EPRINT = {1901.10263}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.}, }
Endnote
%0 Report %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TiFi: Taxonomy Induction for Fictional Domains [Extended version] : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FE67-C %U http://arxiv.org/abs/1901.10263 %D 2019 %X Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Retrieval, cs.IR
[109]
C. X. Chu, S. Razniewski, and G. Weikum, “TiFi: Taxonomy Induction for Fictional Domains,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Chu_WWW2019, TITLE = {{TiFi}: {T}axonomy Induction for Fictional Domains}, AUTHOR = {Chu, Cuong Xuan and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313519}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {2673--2679}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TiFi: Taxonomy Induction for Fictional Domains : %G eng %U http://hdl.handle.net/21.11116/0000-0003-6558-9 %R 10.1145/3308558.3313519 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 2673 - 2679 %I ACM %@ 978-1-4503-6674-8
[110]
S. A. Cotop, “How to be Grim: Explaining Data at Different Granularity Levels,” Universität des Saarlandes, Saarbrücken, 2019.
Export
BibTeX
@mastersthesis{cotop:19:grim, TITLE = {How to be Grim}, AUTHOR = {Cotop, Simina Ana}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, }
Endnote
%0 Thesis %A Cotop, Simina Ana %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T How to be Grim : Explaining Data at Different Granularity Levels %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF05-5 %I Universität des Saarlandes %C Saarbrücken %D 2019 %V master %9 master
[111]
J. Cueppers, “How to Make Cake: Finding Causal Patterns for Marked Events in Sequences,” Universität des Saarlandes, Saarbrücken, 2019.
Export
BibTeX
@mastersthesis{cuepper:19:cake, TITLE = {How to Make Cake: Finding Causal Patterns for Marked Events in Sequences}, AUTHOR = {Cueppers, Joscha}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, }
Endnote
%0 Thesis %A Cueppers, Joscha %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T How to Make Cake: Finding Causal Patterns for Marked Events in Sequences : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF09-1 %I Universität des Saarlandes %C Saarbrücken %D 2019 %V master %9 master
[112]
I. Dikeoulias, J. Strötgen, and S. Razniewski, “Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties,” in Companion of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Dikeoulias_WWW2019, TITLE = {Epitaph or Breaking News? {A}nalyzing and Predicting the Stability of Knowledge Base Properties}, AUTHOR = {Dikeoulias, Ioannis and Str{\"o}tgen, Jannik and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-4503-6675-5}, DOI = {10.1145/3308560.3314998}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Companion of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {1155--1158}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Dikeoulias, Ioannis %A Strötgen, Jannik %A Razniewski, Simon %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Epitaph or Breaking News? Analyzing and Predicting the Stability of Knowledge Base Properties : %G eng %U http://hdl.handle.net/21.11116/0000-0004-0281-7 %R 10.1145/3308560.3314998 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Companion of The World Wide Web Conference %E McAuley, Julian %P 1155 - 1158 %I ACM %@ 978-1-4503-6675-5
[113]
P. Ernst, E. Terolli, and G. Weikum, “LongLife: a Platform for Personalized Search for Health and Life Sciences,” in Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019 Satellites), Auckland, New Zealand, 2019.
Export
BibTeX
@inproceedings{Ernst_ISWC2019, TITLE = {{LongLife}: a Platform for Personalized Search for Health and Life Sciences}, AUTHOR = {Ernst, Patrick and Terolli, Erisa and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {http://ceur-ws.org/Vol-2456/paper62.pdf; urn:nbn:de:0074-2456-4}, PUBLISHER = {ceur-ws.org}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the ISWC 2019 Satellite Tracks (Posters \& Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019 Satellites)}, EDITOR = {Su{\'a}rez-Figueroa, Mari Carmen and Cheng, Gong and Gentile, Anna Lisa and Gu{\'e}ret, Christophe and Keet, Maria and Bernstein, Abraham}, PAGES = {237--240}, EID = {62}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2456}, ADDRESS = {Auckland, New Zealand}, }
Endnote
%0 Conference Proceedings %A Ernst, Patrick %A Terolli, Erisa %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T LongLife: a Platform for Personalized Search for Health and Life Sciences : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83A6-B %U http://ceur-ws.org/Vol-2456/paper62.pdf %D 2019 %B 18th Semantic Web Conference %Z date of event: 2019-10-26 - 2019-10-30 %C Auckland, New Zealand %B Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference %E Suárez-Figueroa, Mari Carmen; Cheng, Gong; Gentile, Anna Lisa; Guéret, Christophe; Keet, Maria; Bernstein, Abraham %P 237 - 240 %Z sequence number: 62 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2456 %@ false
[114]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “Tracy: Tracing Facts over Knowledge Graphs and Text,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Gad-Elrab_WWW2019, TITLE = {Tracy: {T}racing Facts over Knowledge Graphs and Text}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3314126}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {3516--3520}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Tracy: Tracing Facts over Knowledge Graphs and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-08AA-5 %R 10.1145/3308558.3314126 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 3516 - 3520 %I ACM %@ 978-1-4503-6674-8
[115]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Gad-Elrab_WSDM2019, TITLE = {{ExFaKT}: {A} Framework for Explaining Facts over Knowledge Graphs and Text}, AUTHOR = {Gad-Elrab, Mohamed Hassan and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290996}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {87--95}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed Hassan %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ExFaKT: A Framework for Explaining Facts over Knowledge Graphs and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9C44-2 %R 10.1145/3289600.3290996 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM '19 %P 87 - 95 %I ACM %@ 978-1-4503-5940-5
[116]
A. Ghazimatin, O. Balalau, R. Saha Roy, and G. Weikum, “PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems,” 2019. [Online]. Available: http://arxiv.org/abs/1911.08378. (arXiv: 1911.08378)
Abstract
Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively.
Export
BibTeX
@online{Ghazimatin_arXiv1911.08378, TITLE = {{PRINCE}: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems}, AUTHOR = {Ghazimatin, Azin and Balalau, Oana and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1911.08378}, EPRINT = {1911.08378}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively.}, }
Endnote
%0 Report %A Ghazimatin, Azin %A Balalau, Oana %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8415-E %U http://arxiv.org/abs/1911.08378 %D 2019 %X Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively. %K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI,Statistics, Machine Learning, stat.ML
[117]
A. Ghazimatin, R. Saha Roy, and G. Weikum, “FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” 2019. [Online]. Available: http://arxiv.org/abs/1908.03109. (arXiv: 1908.03109)
Abstract
Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.
Export
BibTeX
@online{Ghazimatin_arXiv1908.03109, TITLE = {{FAIRY}: A Framework for Understanding Relationships between Users' Actions and their Social Feeds}, AUTHOR = {Ghazimatin, Azin and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1908.03109}, EPRINT = {1908.03109}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.}, }
Endnote
%0 Report %A Ghazimatin, Azin %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83B9-6 %U http://arxiv.org/abs/1908.03109 %D 2019 %X Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method. %K cs.SI,Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML,
[118]
A. Ghazimatin, R. Saha Roy, and G. Weikum, “FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Ghazimatin_WSDM2019, TITLE = {{FAIRY}: {A} Framework for Understanding Relationships between Users' Actions and their Social Feeds}, AUTHOR = {Ghazimatin, Azin and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290990}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {240--248}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Ghazimatin, Azin %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9BD9-B %R 10.1145/3289600.3290990 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM '19 %P 240 - 248 %I ACM %@ 978-1-4503-5940-5
[119]
A. Guimarães, O. Balalau, E. Terolli, and G. Weikum, “Analyzing the Traits and Anomalies of Political Discussions on Reddit,” in Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 2019.
Export
BibTeX
@inproceedings{Guimaraes_ICWSM2019, TITLE = {Analyzing the Traits and Anomalies of Political Discussions on {R}eddit}, AUTHOR = {Guimar{\~a}es, Anna and Balalau, Oana and Terolli, Erisa and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2334-0770}, PUBLISHER = {AAAI}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Thirteenth International Conference on Web and Social Media (ICWSM 2019)}, PAGES = {205--213}, ADDRESS = {Munich, Germany}, }
Endnote
%0 Conference Proceedings %A Guimarães, Anna %A Balalau, Oana %A Terolli, Erisa %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Analyzing the Traits and Anomalies of Political Discussions on Reddit : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3649-F %D 2019 %B 13th International Conference on Web and Social Media %Z date of event: 2019-06-11 - 2019-06-14 %C Munich, Germany %B Proceedings of the Thirteenth International Conference on Web and Social Media %P 205 - 213 %I AAAI %@ false
[120]
D. Gupta and K. Berberich, “Structured Search in Annotated Document Collections,” in WSDM ’19, 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 2019.
Export
BibTeX
@inproceedings{Gupta_WSDM2019Demo, TITLE = {Structured Search in Annotated Document Collections}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-5940-5}, DOI = {10.1145/3289600.3290618}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {WSDM '19, 12th ACM International Conference on Web Search and Data Mining}, PAGES = {794--797}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Structured Search in Annotated Document Collections : Demo paper %G eng %U http://hdl.handle.net/21.11116/0000-0002-A8D6-F %R 10.1145/3289600.3290618 %D 2019 %B 12th ACM International Conference on Web Search and Data Mining %Z date of event: 2019-02-11 - 2019-02-15 %C Melbourne, Australia %B WSDM '19 %P 794 - 797 %I ACM %@ 978-1-4503-5940-5
[121]
D. Gupta, “Search and Analytics Using Semantic Annotations,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.
Export
BibTeX
@phdthesis{GUPTAphd2019, TITLE = {Search and Analytics Using Semantic Annotations}, AUTHOR = {Gupta, Dhruv}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-300780}, DOI = {10.22028/D291-30078}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.}, }
Endnote
%0 Thesis %A Gupta, Dhruv %Y Berberich, Klaus %A referee: Weikum, Gerhard %A referee: Bedathur, Srikanta %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Search and Analytics Using Semantic Annotations : %G eng %U http://hdl.handle.net/21.11116/0000-0005-7695-E %R 10.22028/D291-30078 %U urn:nbn:de:bsz:291--ds-300780 %F OTHER: hdl:20.500.11880/28516 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P xxviii, 211 p. %V phd %9 phd %X Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28516
[122]
D. Gupta, “Search and Analytics Using Semantic Annotations,” ACM SIGIR Forum, vol. 53, no. 2, 2019.
Abstract
Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.
Export
BibTeX
@article{Gupta_SIGIR19, TITLE = {Search and Analytics Using Semantic Annotations}, AUTHOR = {Gupta, Dhruv}, LANGUAGE = {eng}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections.}, JOURNAL = {ACM SIGIR Forum}, VOLUME = {53}, NUMBER = {2}, PAGES = {100--101}, }
Endnote
%0 Journal Article %A Gupta, Dhruv %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society %T Search and Analytics Using Semantic Annotations : Doctoral Abstract %G eng %U http://hdl.handle.net/21.11116/0000-0005-A1C2-9 %7 2019 %D 2019 %X Search systems help users locate relevant information in the form of text documents for keyword queries. Using text alone, it is often difficult to satisfy the user's information need. To discern the user's intent behind queries, we turn to semantic annotations (e.g., named entities and temporal expressions) that natural language processing tools can now deliver with great accuracy. This thesis develops methods and an infrastructure that leverage semantic annotations to efficiently and effectively search large document collections. This thesis makes contributions in three areas: indexing, querying, and mining of semantically annotated document collections. First, we describe an indexing infrastructure for semantically annotated document collections. The indexing infrastructure can support knowledge-centric tasks such as information extraction, relationship extraction, question answering, fact spotting and semantic search at scale across millions of documents. Second, we propose methods for exploring large document collections by suggesting semantic aspects for queries. These semantic aspects are generated by considering annotations in the form of temporal expressions, geographic locations, and other named entities. The generated aspects help guide the user to relevant documents without the need to read their contents. Third and finally, we present methods that can generate events, structured tables, and insightful visualizations from semantically annotated document collections. %J ACM SIGIR Forum %V 53 %N 2 %& 100 %P 100 - 101 %I ACM %C New York, NY %U http://sigir.org/wp-content/uploads/2019/december/p100.pdf
[123]
D. Gupta and K. Berberich, “JIGSAW: Structuring Text into Tables,” in ICTIR ’19, ACM SIGIR International Conference on Theory of Information Retrieval, Santa Clara, CA, USA, 2019.
Export
BibTeX
@inproceedings{Gupta_ICTIR2019, TITLE = {{JIGSAW}: {S}tructuring Text into Tables}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-6881-0}, DOI = {10.1145/3341981.3344228}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICTIR '19, ACM SIGIR International Conference on Theory of Information Retrieval}, EDITOR = {Fang, Yi and Zhang, Yi}, PAGES = {237--244}, ADDRESS = {Santa Clara, CA, USA}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T JIGSAW: Structuring Text into Tables : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8479-E %R 10.1145/3341981.3344228 %D 2019 %B ACM SIGIR International Conference on Theory of Information Retrieval %Z date of event: 2019-10-02 - 2019-10-05 %C Santa Clara, CA, USA %B ICTIR '19 %E Fang, Yi; Zhang, Yi %P 237 - 244 %I ACM %@ 978-1-4503-6881-0
[124]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” in The Semantic Web (ESWC 2019), Portorož, Slovenia, 2019.
Export
BibTeX
@inproceedings{GuptaESWC2019, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-3-030-21347-3}, DOI = {10.1007/978-3-030-21348-0_11}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {The Semantic Web (ESWC 2019)}, EDITOR = {Hitzler, Pascal and Fern{\'a}ndez, Miriam and Janowicz, Krzysztof and Zaveri, Amrapali and Gray, Alasdair J. G. and Lopez, Vanessa and Haller, Armin and Hammar, Karl}, PAGES = {162--178}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11503}, ADDRESS = {Portoro{\v z}, Slovenia}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %A Strötgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FF5F-5 %R 10.1007/978-3-030-21348-0_11 %D 2019 %B 16th Extended Semantic Web Conference %Z date of event: 2019-06-02 - 2019-06-06 %C Portorož, Slovenia %B The Semantic Web %E Hitzler, Pascal; Fernández, Miriam; Janowicz, Krzysztof; Zaveri, Amrapali; Gray, Alasdair J. G.; Lopez, Vanessa; Haller, Armin; Hammar, Karl %P 162 - 178 %I Springer %@ 978-3-030-21347-3 %B Lecture Notes in Computer Science %N 11503
[125]
D. Gupta and K. Berberich, “Efficient Retrieval of Knowledge Graph Fact Evidences,” in The Semantic Web: ESWC 2019 Satellite Events, Portorož, Slovenia, 2019.
Export
BibTeX
@inproceedings{GuptaESWC2019a, TITLE = {Efficient Retrieval of Knowledge Graph Fact Evidences}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-030-32326-4}, DOI = {10.1007/978-3-030-32327-1_18}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {The Semantic Web: ESWC 2019 Satellite Events}, EDITOR = {Hitzler, Pascal and Kirrane, Sabrina and Hartig, Olaf and de Boer, Victor and Vidal, Maria-Esther and Maleshkova, Maria and Schlobach, Stefan and Hammar, Karl and Lasierra, Nelia and Stadtm{\"u}ller, Steffen and Hose, Katja and Verborgh, Ruben}, PAGES = {90--94}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11762}, ADDRESS = {Portoro{\v z}, Slovenia}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficient Retrieval of Knowledge Graph Fact Evidences : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8477-0 %R 10.1007/978-3-030-32327-1_18 %D 2019 %B 16th Extended Semantic Web Conference %Z date of event: 2019-06-02 - 2019-06-06 %C Portorož, Slovenia %B The Semantic Web: ESWC 2019 Satellite Events %E Hitzler, Pascal; Kirrane, Sabrina; Hartig, Olaf; de Boer, Victor; Vidal, Maria-Esther; Maleshkova, Maria; Schlobach, Stefan; Hammar, Karl; Lasierra, Nelia; Stadtmüller, Steffen; Hose, Katja; Verborgh, Ruben %P 90 - 94 %I Springer %@ 978-3-030-32326-4 %B Lecture Notes in Computer Science %N 11762
[126]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” in Proceedings of the 13th International Conference on Computational Semantics - Long Papers (IWCS 2019), Gothenburg, Sweden, 2019.
Export
BibTeX
@inproceedings{Hedderich_IWCS2019, TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries}, AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-950737-19-2}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 13th International Conference on Computational Semantics -- Long Papers (IWCS 2019)}, EDITOR = {Dobnik, Simon and Chatzikyriakidis, Stergios and Demberg, Vera}, PAGES = {247--258}, ADDRESS = {Gothenburg, Sweden}, }
Endnote
%0 Conference Proceedings %A Hedderich, Michael A. %A Yates, Andrew %A Klakow, Dietrich %A de Melo, Gerard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Using Multi-Sense Vector Embeddings for Reverse Dictionaries : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02A4-0 %D 2019 %B 13th International Conference on Computational Semantics %Z date of event: 2019-05-23 - 2019-05-27 %C Gothenburg, Sweden %B Proceedings of the 13th International Conference on Computational Semantics - Long Papers %E Dobnik, Simon; Chatzikyriakidis, Stergios; Demberg, Vera %P 247 - 258 %I ACL %@ 978-1-950737-19-2 %U https://www.aclweb.org/anthology/W19-0421
[127]
M. A. Hedderich, A. Yates, D. Klakow, and G. de Melo, “Using Multi-Sense Vector Embeddings for Reverse Dictionaries,” 2019. [Online]. Available: http://arxiv.org/abs/1904.01451. (arXiv: 1904.01451)
Abstract
Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
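The attention-based integration described in the abstract can be pictured as a softmax mixture over a word's sense vectors. The following is a minimal sketch, not the paper's architecture: the `attend_senses` helper, the dot-product scoring, and the toy dimensions are all illustrative assumptions.

```python
import numpy as np

def attend_senses(sense_vecs: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Weight all sense vectors of a word by their fit to a context vector.

    sense_vecs: (n_senses, d) matrix, one row per sense; context: (d,) vector.
    Returns a (d,) convex combination instead of a hard single-sense choice.
    """
    scores = sense_vecs @ context            # dot-product relevance per sense
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ sense_vecs

# Three mock sense vectors; a context aligned with sense 0 pulls the mixture toward it.
senses = np.eye(3)
mixture = attend_senses(senses, np.array([10.0, 0.0, 0.0]))
```

With a strongly matching context, nearly all attention mass falls on one sense, so the mixture approximates a hard sense selection while remaining differentiable, which is what lets it drop into an existing neural architecture.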
Export
BibTeX
@online{Hedderich_arXiv1904.01451, TITLE = {Using Multi-Sense Vector Embeddings for Reverse Dictionaries}, AUTHOR = {Hedderich, Michael A. and Yates, Andrew and Klakow, Dietrich and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.01451}, EPRINT = {1904.01451}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.}, }
Endnote
%0 Report %A Hedderich, Michael A. %A Yates, Andrew %A Klakow, Dietrich %A de Melo, Gerard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Using Multi-Sense Vector Embeddings for Reverse Dictionaries : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02B4-E %U http://arxiv.org/abs/1904.01451 %D 2019 %X Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well. %K Computer Science, Computation and Language, cs.CL,Computer Science, Learning, cs.LG
[128]
V. T. Ho, Y. Ibrahim, K. Pal, K. Berberich, and G. Weikum, “Qsearch: Answering Quantity Queries from Text,” in The Semantic Web -- ISWC 2019, Auckland, New Zealand, 2019.
Export
BibTeX
@inproceedings{Ho_ISWC2019, TITLE = {Qsearch: {A}nswering Quantity Queries from Text}, AUTHOR = {Ho, Vinh Thinh and Ibrahim, Yusra and Pal, Koninika and Berberich, Klaus and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {0302-9743}, ISBN = {978-3-030-30792-9}, DOI = {10.1007/978-3-030-30793-6_14}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {The Semantic Web -- ISWC 2019}, EDITOR = {Ghidini, Chiara and Hartig, Olaf and Maleshkova, Maria and Sv{\'a}tek, Vojt{\u e}ch and Cruz, Isabel and Hogan, Aidan and Song, Jie and Lefran{\c c}ois, Maxime and Gandon, Fabien}, PAGES = {237--257}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11778}, ADDRESS = {Auckland, New Zealand}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Ibrahim, Yusra %A Pal, Koninika %A Berberich, Klaus %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Qsearch: Answering Quantity Queries from Text : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83AB-6 %R 10.1007/978-3-030-30793-6_14 %D 2019 %B 18th Semantic Web Conference %Z date of event: 2019-10-26 - 2019-10-30 %C Auckland, New Zealand %B The Semantic Web -- ISWC 2019 %E Ghidini, Chiara; Hartig, Olaf; Maleshkova, Maria; Svátek, Vojtĕch; Cruz, Isabel; Hogan, Aidan; Song, Jie; Lefrançois, Maxime; Gandon, Fabien %P 237 - 257 %I Springer %@ 978-3-030-30792-9 %B Lecture Notes in Computer Science %N 11778 %@ false
[129]
Y. Ibrahim, “Understanding Quantities in Web Tables and Text,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges: First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. - We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.
Export
BibTeX
@phdthesis{yusraphd2019, TITLE = {Understanding Quantities in Web Tables and Text}, AUTHOR = {Ibrahim, Yusra}, LANGUAGE = {eng}, DOI = {10.22028/D291-29657}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges: First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: -- We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. -- We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. -- We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. -- We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems.}, }
Endnote
%0 Thesis %A Ibrahim, Yusra %Y Weikum, Gerhard %A referee: Riedewald, Mirek %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Understanding Quantities in Web Tables and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0005-4384-A %R 10.22028/D291-29657 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 116 p. %V phd %9 phd %X There is a wealth of schema-free tables on the web. The text accompanying these tables explains and qualifies the numerical quantities given in the tables. Despite this ubiquity of tabular data, there is little research that harnesses this wealth of data by semantically understanding the information that is conveyed rather ambiguously in these tables. This information can be disambiguated only with the help of the accompanying text. In the process of understanding quantity mentions in tables and text, we are faced with the following challenges: First, there is no comprehensive knowledge base for anchoring quantity mentions. Second, tables are created ad hoc without a standard schema and with ambiguous header names; also, table cells usually contain abbreviations. Third, quantities can be written in multiple forms and units of measure. Fourth, the text usually refers to the quantities in tables using aggregation, approximation, and different scales. In this thesis, we target these challenges through the following contributions: - We present the Quantity Knowledge Base (QKB), a knowledge base for representing quantity mentions. We construct the QKB by importing information from Freebase, Wikipedia, and other online sources. - We propose Equity: a system for automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. We devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. Then, we cast the inference problem into an efficient algorithm based on random walks over weighted graphs. - We introduce the quantity alignment problem: computing bidirectional links between textual mentions of quantities and the corresponding table cells. We propose BriQ: a system for computing such alignments. BriQ copes with the specific challenges of approximate quantities, aggregated quantities, and calculated quantities. - We design ExQuisiTe: a web application that identifies mentions of quantities in text and tables, aligns quantity mentions in the text with related quantity mentions in tables, and generates salient suggestions for extractive text summarization systems. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28300
[130]
Y. Ibrahim and G. Weikum, “ExQuisiTe: Explaining Quantities in Text,” in Proceedings of the World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{Ibrahim_WWW2019, TITLE = {{ExQuisiTe}: {E}xplaining Quantities in Text}, AUTHOR = {Ibrahim, Yusra and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3314134}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {3541--3544}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ibrahim, Yusra %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T ExQuisiTe: Explaining Quantities in Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-01B3-1 %R 10.1145/3308558.3314134 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of the World Wide Web Conference %E McAuley, Julian %P 3541 - 3544 %I ACM %@ 978-1-4503-6674-8
[131]
Y. Ibrahim, M. Riedewald, G. Weikum, and D. Zeinalipour-Yazti, “Bridging Quantities in Tables and Text,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
Export
BibTeX
@inproceedings{Ibrahim_ICDE2019, TITLE = {Bridging Quantities in Tables and Text}, AUTHOR = {Ibrahim, Yusra and Riedewald, Mirek and Weikum, Gerhard and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-7474-1}, DOI = {10.1109/ICDE.2019.00094}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering}, PAGES = {1010--1021}, ADDRESS = {Macau, China}, }
Endnote
%0 Conference Proceedings %A Ibrahim, Yusra %A Riedewald, Mirek %A Weikum, Gerhard %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Bridging Quantities in Tables and Text : %G eng %U http://hdl.handle.net/21.11116/0000-0003-01AB-B %R 10.1109/ICDE.2019.00094 %D 2019 %B 35th IEEE International Conference on Data Engineering %Z date of event: 2019-04-08 - 2019-04-12 %C Macau, China %B ICDE 2019 %P 1010 - 1021 %I IEEE %@ 978-1-5386-7474-1
[132]
Y. Ismaeil, O. Balalau, and P. Mirza, “Discovering the Functions of Language in Online Forums,” in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{ismaeil-etal-2019-discovering, TITLE = {Discovering the Functions of Language in Online Forums}, AUTHOR = {Ismaeil, Youmna and Balalau, Oana and Mirza, Paramita}, LANGUAGE = {eng}, ISBN = {978-1-950737-84-0}, DOI = {10.18653/v1/D19-5534}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)}, EDITOR = {Xu, Wei and Ritter, Alan and Baldwin, Tim and Rahimi, Afshin}, PAGES = {259--264}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Ismaeil, Youmna %A Balalau, Oana %A Mirza, Paramita %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering the Functions of Language in Online Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0405-E %R 10.18653/v1/D19-5534 %F OTHER: D19-5534 %D 2019 %B 5th Workshop on Noisy User-generated Text %Z date of event: 2019-11-04 - 2019-11-04 %C Hong Kong, China %B Proceedings of the 5th Workshop on Noisy User-generated Text %E Xu, Wei; Ritter, Alan; Baldwin, Tim; Rahimi, Afshin %P 259 - 264 %I ACL %@ 978-1-950737-84-0 %U https://www.aclweb.org/anthology/D19-5534
[133]
Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum, “TEQUILA: Temporal Question Answering over Knowledge Bases,” 2019. [Online]. Available: http://arxiv.org/abs/1908.03650. (arXiv: 1908.03650)
Abstract
Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We present TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.
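The four-stage pipeline in the abstract can be sketched in miniature. Everything below is a hypothetical simplification: the cue-word detector stands in for TEQUILA's temporal-intent detection, stage 3 (retrieval via the underlying KB-QA engine) is mocked as a dict of candidate answers with validity intervals, and the interval-overlap check stands in for the paper's constraint reasoning.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start_year, end_year) validity span of a fact

TEMPORAL_CUES = {"before", "after", "during", "in", "when", "while", "until"}

def has_temporal_intent(question: str) -> bool:
    """Stage 1: crude cue-word detection of temporal intent."""
    return bool(TEMPORAL_CUES & set(question.lower().split()))

def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap iff each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]

def constrain(candidates: Dict[str, Interval], constraint: Interval) -> List[str]:
    """Stage 4: keep answers whose validity interval satisfies the temporal constraint."""
    return [ans for ans, span in candidates.items() if overlaps(span, constraint)]

# Toy question: "Who coached FC Barcelona during 2010?"  Stage 2 would split this
# into the sub-question "Who coached FC Barcelona?" plus the constraint (2010, 2010).
candidates = {"Pep Guardiola": (2008, 2012), "Luis Enrique": (2014, 2017)}
print(constrain(candidates, (2010, 2010)))  # -> ['Pep Guardiola']
```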
Export
BibTeX
@online{Jia_arXiv1908.03650, TITLE = {{TEQUILA}: Temporal Question Answering over Knowledge Bases}, AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1908.03650}, EPRINT = {1908.03650}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We present TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method.}, }
Endnote
%0 Report %A Jia, Zhen %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Strötgen, Jannik %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T TEQUILA: Temporal Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83BE-1 %U http://arxiv.org/abs/1908.03650 %D 2019 %X Question answering over knowledge bases (KB-QA) poses challenges in handling complex questions that need to be decomposed into sub-questions. An important case, addressed here, is that of temporal questions, where cues for temporal relations need to be discovered and handled. We present TEQUILA, an enabler method for temporal QA that can run on top of any KB-QA engine. TEQUILA has four stages. It detects if a question has temporal intent. It decomposes and rewrites the question into non-temporal sub-questions and temporal constraints. Answers to sub-questions are then retrieved from the underlying KB-QA engine. Finally, TEQUILA uses constraint reasoning on temporal intervals to compute final answers to the full question. Comparisons against state-of-the-art baselines show the viability of our method. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[134]
M. Kaiser, R. Saha Roy, and G. Weikum, “CROWN: Conversational Passage Ranking by Reasoning over Word Networks,” 2019. [Online]. Available: http://arxiv.org/abs/1911.02850. (arXiv: 1911.02850)
Abstract
Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this paper, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, our best run submission achieved above-average performance with respect to AP@5 and nDCG@1000.
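The scoring idea in the abstract, a passage score combining question similarity with graph-based coherence, can be illustrated with a toy sketch. The word-proximity network, the word-overlap similarity, and all weights below are illustrative stand-ins; the paper's WPN uses statistically significant co-occurrence counts and embedding-based similarity.

```python
from itertools import combinations

# Mock word-proximity network: edges between words that co-occur closely,
# keyed by unordered pairs so lookup is order-independent.
WPN = {frozenset({"neural", "network"}): 0.9,
       frozenset({"network", "training"}): 0.7}

def coherence(words):
    """Sum WPN edge weights over all unordered word pairs in the passage."""
    return sum(WPN.get(frozenset(p), 0.0) for p in combinations(set(words), 2))

def similarity(words, question_words):
    """Mock similarity: fraction of question words present in the passage."""
    return len(set(words) & set(question_words)) / max(len(set(question_words)), 1)

def crown_score(passage, question, alpha=0.5):
    """Passage score as a convex combination of similarity and coherence."""
    w, q = passage.split(), question.split()
    return alpha * similarity(w, q) + (1 - alpha) * coherence(w)
```

Ranking then simply sorts candidate passages by `crown_score` for the (conversationally expanded) query; passages whose words both match the question and sit on strong WPN edges rise to the top.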
Export
BibTeX
@online{Kaiser_arXiv1911.02850, TITLE = {{CROWN}: Conversational Passage Ranking by Reasoning over Word Networks}, AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1911.02850}, EPRINT = {1911.02850}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this paper, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, our best run submission achieved above-average performance with respect to AP@5 and nDCG@1000.}, }
Endnote
%0 Report %A Kaiser, Magdalena %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CROWN: Conversational Passage Ranking by Reasoning over Word Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83ED-C %U http://arxiv.org/abs/1911.02850 %D 2019 %X Information needs around a topic cannot be satisfied in a single turn; users typically ask follow-up questions referring to the same theme and a system must be capable of understanding the conversational context of a request to retrieve correct answers. In this paper, we present our submission to the TREC Conversational Assistance Track 2019, in which such a conversational setting is explored. We propose a simple unsupervised method for conversational passage ranking by formulating the passage score for a query as a combination of similarity and coherence. To be specific, passages are preferred that contain words semantically similar to the words used in the question, and where such words appear close by. We built a word-proximity network (WPN) from a large corpus, where words are nodes and there is an edge between two nodes if they co-occur in the same passages in a statistically significant way, within a context window. Our approach, named CROWN, improved nDCG scores over a provided Indri baseline on the CAsT training data. On the evaluation data for CAsT, our best run submission achieved above-average performance with respect to AP@5 and nDCG@1000. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[135]
M. Kaiser, R. Saha Roy, and G. Weikum, “CROWN: Conversational Passage Ranking by Reasoning over Word Networks,” in Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019), Gaithersburg, MD, USA, 2019.
Export
BibTeX
@inproceedings{KaiserTrec19, TITLE = {{CROWN}: {C}onversational Passage Ranking by Reasoning over Word Networks}, AUTHOR = {Kaiser, Magdalena and Saha Roy, Rishiraj and Weikum, Gerhard}, LANGUAGE = {eng}, PUBLISHER = {NIST}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019)}, EDITOR = {Voorhees, Ellen M. and Ellis, Angela}, SERIES = {NIST Special Publication}, VOLUME = {1250}, ADDRESS = {Gaithersburg, MD, USA}, }
Endnote
%0 Conference Proceedings %A Kaiser, Magdalena %A Saha Roy, Rishiraj %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CROWN: Conversational Passage Ranking by Reasoning over Word Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0008-03C3-8 %D 2019 %B Twenty-Eighth Text REtrieval Conference %Z date of event: 2019-11-13 - 2019-11-15 %C Gaithersburg, MD, USA %B Proceedings of the Twenty-Eighth Text REtrieval Conference %E Voorhees, Ellen M.; Ellis, Angela %I NIST %B NIST Special Publication %N 1250
[136]
J. Kalofolias, M. Boley, and J. Vreeken, “Discovering Robustly Connected Subgraphs with Simple Descriptions,” in 19th IEEE International Conference on Data Mining (ICDM 2019), Beijing, China, 2019.
Export
BibTeX
@inproceedings{kalofolias:19:rosi, TITLE = {Discovering Robustly Connected Subgraphs with Simple Descriptions}, AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-7281-4604-1}, DOI = {10.1109/ICDM.2019.00139}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {19th IEEE International Conference on Data Mining (ICDM 2019)}, PAGES = {1150--1155}, ADDRESS = {Beijing, China}, }
Endnote
%0 Conference Proceedings %A Kalofolias, Janis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Robustly Connected Subgraphs with Simple Descriptions : %G eng %U http://hdl.handle.net/21.11116/0000-0008-26D3-F %R 10.1109/ICDM.2019.00139 %D 2019 %B 19th IEEE International Conference on Data Mining %Z date of event: 2019-11-08 - 2019-11-11 %C Beijing, China %B 19th IEEE International Conference on Data Mining %P 1150 - 1155 %I IEEE %@ 978-1-7281-4604-1
[137]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded using MDL,” 2019. [Online]. Available: http://arxiv.org/abs/1901.06950. (arXiv: 1901.06950)
Abstract
Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
Export
BibTeX
@online{Kaltenpoth_arXiv1901.06950, TITLE = {We Are Not Your Real Parents: Telling Causal from Confounded using {MDL}}, AUTHOR = {Kaltenpoth, David and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1901.06950}, EPRINT = {1901.06950}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.}, }
Endnote
%0 Report %A Kaltenpoth, David %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T We Are Not Your Real Parents: Telling Causal from Confounded using MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FFEE-3 %U http://arxiv.org/abs/1901.06950 %D 2019 %X Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence. %K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML
[138]
D. Kaltenpoth and J. Vreeken, “We Are Not Your Real Parents: Telling Causal from Confounded by MDL,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
Export
BibTeX
@inproceedings{Kaltenpoth_SDM2019, TITLE = {We Are Not Your Real Parents: {T}elling Causal from Confounded by {MDL}}, AUTHOR = {Kaltenpoth, David and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-567-3}, DOI = {10.1137/1.9781611975673.23}, PUBLISHER = {SIAM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)}, EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh}, PAGES = {199--207}, ADDRESS = {Calgary, Canada}, }
Endnote
%0 Conference Proceedings %A Kaltenpoth, David %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T We Are Not Your Real Parents: Telling Causal from Confounded by MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D37-2 %R 10.1137/1.9781611975673.23 %D 2019 %B SIAM International Conference on Data Mining %Z date of event: 2019-05-02 - 2019-05-04 %C Calgary, Canada %B Proceedings of the 2019 SIAM International Conference on Data Mining %E Berger-Wolf, Tanya; Chawla, Nitesh %P 199 - 207 %I SIAM %@ 978-1-61197-567-3
[139]
S. Karaev, “Matrix Factorization over Dioids and its Applications in Data Mining,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used – the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.
Export
BibTeX
@phdthesis{Karaevphd2019, TITLE = {Matrix Factorization over Diods and its Applications in Data Mining}, AUTHOR = {Karaev, Sanjar}, LANGUAGE = {eng}, DOI = {10.22028/D291-28661}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse ({\textquotedblleft}negative numbers{\textquotedblright}) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used -- the description length of the data, which enables us to obtain compact and highly interpretable results. 
The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.}, }
Endnote
%0 Thesis %A Karaev, Sanjar %Y Miettinen, Pauli %A referee: Weikum, Gerhard %A referee: van Leeuwen, Matthijs %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Matrix Factorization over Diods and its Applications in Data Mining : %G eng %U http://hdl.handle.net/21.11116/0000-0005-4369-A %R 10.22028/D291-28661 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 113 p. %V phd %9 phd %X Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). 
Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used – the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27903
[140]
S. Karaev and P. Miettinen, “Algorithms for Approximate Subtropical Matrix Factorization,” Data Mining and Knowledge Discovery, vol. 33, no. 2, 2019.
Export
BibTeX
@article{Karaev_DMKD2018, TITLE = {Algorithms for Approximate Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1007/s10618-018-0599-1}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Data Mining and Knowledge Discovery}, VOLUME = {33}, NUMBER = {2}, PAGES = {526--576}, }
Endnote
%0 Journal Article %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Approximate Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9FD5-B %R 10.1007/s10618-018-0599-1 %7 2018 %D 2019 %J Data Mining and Knowledge Discovery %O DMKD %V 33 %N 2 %& 526 %P 526 - 576 %I Springer %C New York, NY
[141]
A. Konstantinidis, P. Irakleous, Z. Georgiou, D. Zeinalipour-Yazti, and P. K. Chrysanthis, “IoT Data Prefetching in Indoor Navigation SOAs,” ACM Transactions on Internet Technology, vol. 19, no. 1, 2019.
Export
BibTeX
@article{Konstantinidis:2018:IDP:3283809.3177777, TITLE = {{IoT} Data Prefetching in Indoor Navigation {SOAs}}, AUTHOR = {Konstantinidis, Andreas and Irakleous, Panagiotis and Georgiou, Zacharias and Zeinalipour-Yazti, Demetrios and Chrysanthis, Panos K.}, LANGUAGE = {eng}, ISSN = {1533-5399}, DOI = {10.1145/3177777}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {ACM Transactions on Internet Technology}, VOLUME = {19}, NUMBER = {1}, EID = {10}, }
Endnote
%0 Journal Article %A Konstantinidis, Andreas %A Irakleous, Panagiotis %A Georgiou, Zacharias %A Zeinalipour-Yazti, Demetrios %A Chrysanthis, Panos K. %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T IoT Data Prefetching in Indoor Navigation SOAs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-CA09-1 %R 10.1145/3177777 %7 2019 %D 2019 %J ACM Transactions on Internet Technology %O TOIT %V 19 %N 1 %Z sequence number: 10 %I ACM %C New York, NY %@ false
[142]
P. Lahoti, K. P. Gummadi, and G. Weikum, “Operationalizing Individual Fairness with Pairwise Fair Representations,” 2019. [Online]. Available: http://arxiv.org/abs/1907.01439. (arXiv: 1907.01439)
Abstract
We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.
Export
BibTeX
@online{Lahoti_arXiv1907.01439, TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1907.01439}, EPRINT = {1907.01439}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.}, }
Endnote
%0 Report %A Lahoti, Preethi %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Operationalizing Individual Fairness with Pairwise Fair Representations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FF17-5 %U http://arxiv.org/abs/1907.01439 %D 2019 %X We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable. %K Computer Science, Learning, cs.LG,Statistics, Machine Learning, stat.ML
[143]
P. Lahoti, K. Gummadi, and G. Weikum, “Operationalizing Individual Fairness with Pairwise Fair Representations,” Proceedings of the VLDB Endowment (Proc. VLDB 2019), vol. 13, no. 4, 2019.
Export
BibTeX
@article{Lahoti2019_PVLDB, TITLE = {Operationalizing Individual Fairness with Pairwise Fair Representations}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.14778/3372716.3372723}, PUBLISHER = {VLDB Endowment Inc.}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)}, VOLUME = {13}, NUMBER = {4}, PAGES = {506--518}, BOOKTITLE = {Proceedings of the 45th International Conference on Very Large Data Bases (VLDB 2019)}, EDITOR = {Balazinska, Magdalena and Zhou, Xiaofang}, }
Endnote
%0 Journal Article %A Lahoti, Preethi %A Gummadi, Krishna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Operationalizing Individual Fairness with Pairwise Fair Representations : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8168-4 %R 10.14778/3372716.3372723 %7 2019 %D 2019 %J Proceedings of the VLDB Endowment %O PVLDB %V 13 %N 4 %& 506 %P 506 - 518 %I VLDB Endowment Inc. %B Proceedings of the 45th International Conference on Very Large Data Bases %O VLDB 2019 Los Angeles, CA, USA, 26-30 August 2019
[144]
P. Lahoti, K. Gummadi, and G. Weikum, “iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” in ICDE 2019, 35th IEEE International Conference on Data Engineering, Macau, China, 2019.
Export
BibTeX
@inproceedings{Lahoti_ICDE2019, TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making}, AUTHOR = {Lahoti, Preethi and Gummadi, Krishna and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-7474-1}, DOI = {10.1109/ICDE.2019.00121}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {ICDE 2019, 35th IEEE International Conference on Data Engineering}, PAGES = {1334--1345}, ADDRESS = {Macau, China}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Gummadi, Krishna %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : %G eng %U http://hdl.handle.net/21.11116/0000-0003-F395-2 %R 10.1109/ICDE.2019.00121 %D 2019 %B 35th IEEE International Conference on Data Engineering %Z date of event: 2019-04-08 - 2019-04-12 %C Macau, China %B ICDE 2019 %P 1334 - 1345 %I IEEE %@ 978-1-5386-7474-1
[145]
X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum, “Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.
Export
BibTeX
@inproceedings{lu19answering, TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs}, AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6172-9}, DOI = {10.1145/3331184.3331252}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric}, PAGES = {105--114}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A Lu, Xiaolu %A Pramanik, Soumajit %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Wang, Yafang %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-7085-8 %R 10.1145/3331184.3331252 %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B SIGIR '19 %E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric %P 105 - 114 %I ACM %@ 978-1-4503-6172-9
[146]
X. Lu, S. Pramanik, R. Saha Roy, A. Abujabal, Y. Wang, and G. Weikum, “Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs,” 2019. [Online]. Available: http://arxiv.org/abs/1908.00469. (arXiv: 1908.00469)
Abstract
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
Export
BibTeX
@online{Lu_arXiv1908.00469, TITLE = {Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs}, AUTHOR = {Lu, Xiaolu and Pramanik, Soumajit and Saha Roy, Rishiraj and Abujabal, Abdalghani and Wang, Yafang and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1908.00469}, EPRINT = {1908.00469}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.}, }
Endnote
%0 Report %A Lu, Xiaolu %A Pramanik, Soumajit %A Saha Roy, Rishiraj %A Abujabal, Abdalghani %A Wang, Yafang %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83B3-C %U http://arxiv.org/abs/1908.00469 %D 2019 %X Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines. %K Computer Science, Information Retrieval, cs.IR
[147]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2019, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {9781450361729}, DOI = {10.1145/3331184.3331317}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric}, PAGES = {1101--1104}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02D3-B %R 10.1145/3331184.3331317 %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B SIGIR '19 %E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric %P 1101 - 1104 %I ACM %@ 9781450361729
[148]
S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, “CEDR: Contextualized Embeddings for Document Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1904.07094. (arXiv: 1904.07094)
Abstract
Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
Export
BibTeX
@online{MacAvaney_arXiv1904.07094, TITLE = {{CEDR}: Contextualized Embeddings for Document Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.07094}, EPRINT = {1904.07094}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T CEDR: Contextualized Embeddings for Document Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02C7-9 %U http://arxiv.org/abs/1904.07094 %D 2019 %X Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[149]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Overcoming Low-Utility Facets for Complex Answer Retrieval,” Information Retrieval Journal, vol. 22, no. 3–4, 2019.
Export
BibTeX
@article{MacAvaney2019, TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, ISSN = {1386-4564}, DOI = {10.1007/s10791-018-9343-0}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Information Retrieval Journal}, VOLUME = {22}, NUMBER = {3-4}, PAGES = {395--418}, }
Endnote
%0 Journal Article %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Overcoming Low-Utility Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0003-C4A1-9 %R 10.1007/s10791-018-9343-0 %7 2019 %D 2019 %J Information Retrieval Journal %V 22 %N 3-4 %& 395 %P 395 - 418 %I Springer %C New York, NY %@ false
[150]
S. MacAvaney, A. Yates, K. Hui, and O. Frieder, “Content-Based Weak Supervision for Ad-Hoc Re-Ranking,” 2019. [Online]. Available: http://arxiv.org/abs/1707.00189. (arXiv: 1707.00189)
Abstract
One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance.
Export
BibTeX
@online{MacAvaney_arXiv1707.00189, TITLE = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1707.00189}, EPRINT = {1707.00189}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Hui, Kai %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Content-Based Weak Supervision for Ad-Hoc Re-Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0005-6B59-0 %U http://arxiv.org/abs/1707.00189 %D 2019 %X One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[151]
S. MacAvaney, A. Yates, K. Hui, and O. Frieder, “Content-Based Weak Supervision for Ad-Hoc Re-Ranking,” in SIGIR ’19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019.
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2019b, TITLE = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir}, LANGUAGE = {eng}, ISBN = {9781450361729}, DOI = {10.1145/3331184.3331316}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {SIGIR '19, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval}, EDITOR = {Piwowarski, Benjamin and Chevalier, Max and Gaussier, {\'E}ric}, PAGES = {993--996}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Hui, Kai %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Content-Based Weak Supervision for Ad-Hoc Re-Ranking : %G eng %U http://hdl.handle.net/21.11116/0000-0005-6B55-4 %R 10.1145/3331184.3331316 %D 2019 %B 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2019-07-21 - 2019-07-25 %C Paris, France %B SIGIR '19 %E Piwowarski, Benjamin; Chevalier, Max; Gaussier, Éric %P 993 - 996 %I ACM %@ 9781450361729
[152]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019), Macao, 2019.
Export
BibTeX
@inproceedings{mandros_IJCAI2019, TITLE = {Discovering Reliable Dependencies from Data: {H}ardness and Improved Algorithms}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-0-9992411-4-1}, DOI = {10.24963/ijcai.2019/864}, PUBLISHER = {IJCAI}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019)}, EDITOR = {Kraus, Sarit}, PAGES = {6206--6210}, ADDRESS = {Macao}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms : %G eng %U http://hdl.handle.net/21.11116/0000-0005-848A-A %R 10.24963/ijcai.2019/864 %D 2019 %B Twenty-Eighth International Joint Conference on Artificial Intelligence %Z date of event: 2019-08-10 - 2019-08-16 %C Macao %B Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence %E Kraus, Sarit %P 6206 - 6210 %I IJCAI %@ 978-0-9992411-4-1 %U https://www.ijcai.org/Proceedings/2019/0864.pdf
[153]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Correlations in Categorical Data,” 2019. [Online]. Available: http://arxiv.org/abs/1908.11682. (arXiv: 1908.11682)
Abstract
In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on the distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.
Export
BibTeX
@online{Mandros_arXiv1908.11682, TITLE = {Discovering Reliable Correlations in Categorical Data}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1908.11682}, EPRINT = {1908.11682}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on the distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.}, }
Endnote
%0 Report %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Discovering Reliable Correlations in Categorical Data : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8491-1 %U http://arxiv.org/abs/1908.11682 %D 2019 %X In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on the distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations. %K Computer Science, Learning, cs.LG,Computer Science, Databases, cs.DB,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT,Statistics, Machine Learning, stat.ML
[154]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Correlations in Categorical Data,” in 19th IEEE International Conference on Data Mining (ICDM 2019), Beijing, China, 2019.
Export
BibTeX
@inproceedings{Mandros_ICDM2019, TITLE = {Discovering Reliable Correlations in Categorical Data}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-7281-4604-1}, DOI = {10.1109/ICDM.2019.00156}, PUBLISHER = {IEEE}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {19th IEEE International Conference on Data Mining (ICDM 2019)}, PAGES = {1252--1257}, ADDRESS = {Beijing, China}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Discovering Reliable Correlations in Categorical Data : %G eng %U http://hdl.handle.net/21.11116/0000-0006-F27B-F %R 10.1109/ICDM.2019.00156 %D 2019 %B 19th IEEE International Conference on Data Mining %Z date of event: 2019-11-08 - 2019-11-11 %C Beijing, China %B 19th IEEE International Conference on Data Mining %P 1252 - 1257 %I IEEE %@ 978-1-7281-4604-1
[155]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), Naha, Okinawa, Japan, 2019.
Export
BibTeX
@inproceedings{Marx_AISTATS2019, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, PUBLISHER = {PMLR}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)}, EDITOR = {Chaudhuri, Kamalika and Sugiyama, Masashi}, PAGES = {496--505}, SERIES = {Proceedings of Machine Learning Research}, VOLUME = {89}, ADDRESS = {Naha, Okinawa, Japan}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D3C-D %D 2019 %B 22nd International Conference on Artificial Intelligence and Statistics %Z date of event: 2019-04-16 - 2019-04-18 %C Naha, Okinawa, Japan %B Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics %E Chaudhuri, Kamalika; Sugiyama, Masashi %P 496 - 505 %I PMLR %B Proceedings of Machine Learning Research %N 89 %U http://proceedings.mlr.press/v89/marx19a/marx19a.pdf
[156]
A. Marx and J. Vreeken, “Testing Conditional Independence on Discrete Data using Stochastic Complexity,” 2019. [Online]. Available: http://arxiv.org/abs/1903.04829. (arXiv: 1903.04829)
Abstract
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
Export
BibTeX
@online{Marx_arXiv1903.04829, TITLE = {Testing Conditional Independence on Discrete Data using Stochastic Complexity}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1903.04829}, EPRINT = {1903.04829}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Testing Conditional Independence on Discrete Data using Stochastic Complexity : %G eng %U http://hdl.handle.net/21.11116/0000-0004-027A-1 %U http://arxiv.org/abs/1903.04829 %D 2019 %X Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
[157]
A. Marx and J. Vreeken, “Telling Cause from Effect by Local and Global Regression,” Knowledge and Information Systems, vol. 60, no. 3, 2019.
Export
BibTeX
@article{marx:19:crack, TITLE = {Telling Cause from Effect by Local and Global Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-018-1286-7}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {60}, NUMBER = {3}, PAGES = {1277--1305}, }
Endnote
%0 Journal Article %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Telling Cause from Effect by Local and Global Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EAD-A %R 10.1007/s10115-018-1286-7 %7 2018-12-07 %D 2019 %J Knowledge and Information Systems %V 60 %N 3 %& 1277 %P 1277 - 1305 %I Springer %C New York, NY %@ false
[158]
A. Marx and J. Vreeken, “Identifiability of Cause and Effect using Regularized Regression,” in KDD ’19, 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019.
Export
BibTeX
@inproceedings{Marx_KDD2019, TITLE = {Identifiability of Cause and Effect using Regularized Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-6201-6}, DOI = {10.1145/3292500.3330854}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {KDD '19, 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining}, PAGES = {852--861}, ADDRESS = {Anchorage, AK, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Identifiability of Cause and Effect using Regularized Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0004-858C-8 %R 10.1145/3292500.3330854 %D 2019 %B 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining %Z date of event: 2019-08-04 - 2019-08-08 %C Anchorage, AK, USA %B KDD '19 %P 852 - 861 %I ACM %@ 978-1-4503-6201-6
[159]
A. Marx and J. Vreeken, “Causal Inference on Multivariate and Mixed-Type Data,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018), Dublin, Ireland, 2019.
Export
BibTeX
@inproceedings{marx:18:crack, TITLE = {Causal Inference on Multivariate and Mixed-Type Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-030-10927-1}, DOI = {10.1007/978-3-030-10928-8_39}, PUBLISHER = {Springer}, YEAR = {2018}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018)}, EDITOR = {Berlingerio, Michele and Bonchi, Francesco and G{\"a}rtner, Thomas and Hurley, Neil and Ifrim, Georgiana}, PAGES = {655--671}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {11052}, ADDRESS = {Dublin, Ireland}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Multivariate and Mixed-Type Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9E86-5 %R 10.1007/978-3-030-10928-8_39 %D 2019 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2018-09-10 - 2018-09-14 %C Dublin, Ireland %B Machine Learning and Knowledge Discovery in Databases %E Berlingerio, Michele; Bonchi, Francesco; Gärtner, Thomas; Hurley, Neil; Ifrim, Georgiana %P 655 - 671 %I Springer %@ 978-3-030-10927-1 %B Lecture Notes in Artificial Intelligence %N 11052
[160]
A. Marx and J. Vreeken, “Approximating Algorithmic Conditional Independence for Discrete Data,” in Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI, Stanford, CA, USA. (Accepted/in press)
Export
BibTeX
@inproceedings{Marx_AAAISpringSymp2019, TITLE = {Approximating Algorithmic Conditional Independence for Discrete Data}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, YEAR = {2019}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI}, ADDRESS = {Stanford, CA, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Approximating Algorithmic Conditional Independence for Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0D4C-B %D 2019 %B First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI %Z date of event: 2019-05-25 - 2019-05-27 %C Stanford, CA, USA %B Proceedings of the First AAAI Spring Symposium Beyond Curve Fitting: Causation, Counterfactuals, and Imagination-based AI
[161]
F. Mesquita, M. Cannaviccio, J. Schmidek, P. Mirza, and D. Barbosa, “KnowledgeNet: A Benchmark Dataset for Knowledge Base Population,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{mesquita-etal-2019-knowledgenet, TITLE = {{KnowledgeNet}: {A} Benchmark Dataset for Knowledge Base Population}, AUTHOR = {Mesquita, Filipe and Cannaviccio, Matteo and Schmidek, Jordan and Mirza, Paramita and Barbosa, Denilson}, LANGUAGE = {eng}, ISBN = {978-1-950737-90-1}, DOI = {10.18653/v1/D19-1069}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)}, EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun}, PAGES = {749--758}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Mesquita, Filipe %A Cannaviccio, Matteo %A Schmidek, Jordan %A Mirza, Paramita %A Barbosa, Denilson %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T KnowledgeNet: A Benchmark Dataset for Knowledge Base Population : %G eng %U http://hdl.handle.net/21.11116/0000-0008-0410-1 %R 10.18653/v1/D19-1069 %F OTHER: D19-1069 %D 2019 %B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %Z date of event: 2019-11-03 - 2019-11-07 %C Hong Kong, China %B Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing %E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun %P 749 - 758 %I ACL %@ 978-1-950737-90-1 %U https://www.aclweb.org/anthology/D19-1069
[162]
S. Metzler, S. Günnemann, and P. Miettinen, “Stability and Dynamics of Communities on Online Question-Answer Sites,” Social Networks, vol. 58, 2019.
Export
BibTeX
@article{Metzler2019, TITLE = {Stability and Dynamics of Communities on Online Question-Answer Sites}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0378-8733}, DOI = {10.1016/j.socnet.2018.12.004}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Social Networks}, VOLUME = {58}, PAGES = {50--58}, }
Endnote
%0 Journal Article %A Metzler, Saskia %A Günnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Stability and Dynamics of Communities on Online Question-Answer Sites : %G eng %U http://hdl.handle.net/21.11116/0000-0002-BCC1-0 %R 10.1016/j.socnet.2018.12.004 %7 2019 %D 2019 %J Social Networks %V 58 %& 50 %P 50 - 58 %I Elsevier %C Amsterdam %@ false
[163]
S. Metzler and P. Miettinen, “HyGen: Generating Random Graphs with Hyperbolic Communities,” Applied Network Science, vol. 4, 2019.
Export
BibTeX
@article{Metzler_Miettienen19, TITLE = {{HyGen}: {G}enerating Random Graphs with Hyperbolic Communities}, AUTHOR = {Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {2364-8228}, DOI = {10.1007/s41109-019-0166-8}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, JOURNAL = {Applied Network Science}, VOLUME = {4}, EID = {53}, }
Endnote
%0 Journal Article %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T HyGen: Generating Random Graphs with Hyperbolic Communities : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8E5E-3 %R 10.1007/s41109-019-0166-8 %7 2019 %D 2019 %J Applied Network Science %O ANS Appl Netw Sci %V 4 %Z sequence number: 53 %I Springer %C New York, NY %@ false
[164]
O. A. Mian, “Causal Discovery using MDL-based Regression,” Universität des Saarlandes, Saarbrücken, 2019.
Export
BibTeX
@mastersthesis{mian:19:cdregression, TITLE = {Causal Discovery using {MDL}-based Regression}, AUTHOR = {Mian, Osman Ali}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, }
Endnote
%0 Thesis %A Mian, Osman Ali %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Discovery using MDL-based Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF0D-D %I Universität des Saarlandes %C Saarbrücken %D 2019 %V master %9 master
[165]
M. Mohanty, M. Ramanath, M. Yahya, and G. Weikum, “Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs,” in Advances in Database Technology (EDBT 2019), Lisbon, Portugal, 2019.
Export
BibTeX
@inproceedings{Mohanty:EDBT2019, TITLE = {{Spec-QP}: {S}peculative Query Planning for Joins over Knowledge Graphs}, AUTHOR = {Mohanty, Madhulika and Ramanath, Maya and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-89318-081-3}, DOI = {10.5441/002/edbt.2019.07}, PUBLISHER = {OpenProceedings.org}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Advances in Database Technology (EDBT 2019)}, EDITOR = {Herschel, Melanie and Galhardas, Helena and Reinwald, Berthold and Fundulaki, Irini and Binnig, Carsten and Kaoudi, Zoi}, PAGES = {61--72}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Mohanty, Madhulika %A Ramanath, Maya %A Yahya, Mohamed %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0003-3A7D-1 %R 10.5441/002/edbt.2019.07 %D 2019 %B 22nd International Conference on Extending Database Technology %Z date of event: 2019-03-26 - 2019-03-29 %C Lisbon, Portugal %B Advances in Database Technology %E Herschel, Melanie; Galhardas, Helena; Reinwald, Berthold; Fundulaki, Irini; Binnig, Carsten; Kaoudi, Zoi %P 61 - 72 %I OpenProceedings.org %@ 978-3-89318-081-3
[166]
S. Nag Chowdhury, N. Tandon, H. Ferhatosmanoglu, and G. Weikum, “VISIR: Visual and Semantic Image Label Refinement,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00741. (arXiv: 1909.00741)
Abstract
The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO.
Export
BibTeX
@online{Nag_arXiv1909.00741, TITLE = {{VISIR}: Visual and Semantic Image Label Refinement}, AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Ferhatosmanoglu, Hakan and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1909.00741}, EPRINT = {1909.00741}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO.}, }
Endnote
%0 Report %A Nag Chowdhury, Sreyasi %A Tandon, Niket %A Ferhatosmanoglu, Hakan %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T VISIR: Visual and Semantic Image Label Refinement : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83CE-F %U http://arxiv.org/abs/1909.00741 %D 2019 %X The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness by advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses the above limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of the state-of-the-art visual labeling tools like LSDA and YOLO. %K Computer Science, Multimedia, cs.MM,Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Information Retrieval, cs.IR
[167]
S. Nag Chowdhury, S. Razniewski, and G. Weikum, “Story-oriented Image Selection and Placement,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00692. (arXiv: 1909.00692)
Abstract
Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection - either author's personal photo album, or web repositories - and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit.
Export
BibTeX
@online{Nag_arXiv1909.00692, TITLE = {Story-oriented Image Selection and Placement}, AUTHOR = {Nag Chowdhury, Sreyasi and Razniewski, Simon and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1909.00692}, EPRINT = {1909.00692}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection -- either author's personal photo album, or web repositories -- and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit.}, }
Endnote
%0 Report %A Nag Chowdhury, Sreyasi %A Razniewski, Simon %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Story-oriented Image Selection and Placement : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83C9-4 %U http://arxiv.org/abs/1909.00692 %D 2019 %X Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection - either author's personal photo album, or web repositories - and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit. %K Computer Science, Computation and Language, cs.CL
[168]
S. Nag Chowdhury, N. Tandon, and G. Weikum, “Know2Look: Commonsense Knowledge for Visual Search,” 2019. [Online]. Available: http://arxiv.org/abs/1909.00749. (arXiv: 1909.00749)
Abstract
With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval.
Export
BibTeX
@online{Nag_arXiv1909.00749, TITLE = {{Know2Look}: Commonsense Knowledge for Visual Search}, AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1909.00749}, EPRINT = {1909.00749}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities -- text, visual cues, and commonsense knowledge pertaining to the query -- as a recipe for efficient search and retrieval.}, }
Endnote
%0 Report %A Nag Chowdhury, Sreyasi %A Tandon, Niket %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Know2Look: Commonsense Knowledge for Visual Search : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83D2-9 %U http://arxiv.org/abs/1909.00749 %D 2019 %X With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval. %K Computer Science, Information Retrieval, cs.IR
[169]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” Theory and Practice of Logic Programming, vol. 19, no. 4, 2019.
Export
BibTeX
@article{ParamonovTPLP, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {1471-0684}, DOI = {10.1017/S1471068418000467}, PUBLISHER = {Cambridge University Press}, ADDRESS = {Cambridge}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Theory and Practice of Logic Programming}, VOLUME = {19}, NUMBER = {4}, PAGES = {505--535}, }
Endnote
%0 Journal Article %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/21.11116/0000-0003-0CC4-3 %R 10.1017/S1471068418000467 %7 2019 %D 2019 %J Theory and Practice of Logic Programming %O TPLP %V 19 %N 4 %& 505 %P 505 - 535 %I Cambridge University Press %C Cambridge %@ false
[170]
K. Popat, “Credibility Analysis of Textual Claims with Explainable Evidence,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. 
In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.
Export
BibTeX
@phdthesis{Popatphd2019, TITLE = {Credibility Analysis of Textual Claims with Explainable Evidence}, AUTHOR = {Popat, Kashyap}, LANGUAGE = {eng}, DOI = {10.22028/D291-30005}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. 
We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim.}, }
Endnote
%0 Thesis %A Popat, Kashyap %Y Weikum, Gerhard %A referee: Naumann, Felix %A referee: Yates, Andrew %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Credibility Analysis of Textual Claims with Explainable Evidence : %G eng %U http://hdl.handle.net/21.11116/0000-0005-654D-4 %R 10.22028/D291-30005 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 134 p. %V phd %9 phd %X Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. 
We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect into the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding the controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment is supporting or opposing the claim. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/28481
[171]
K. Popat, S. Mukherjee, A. Yates, and G. Weikum, “STANCY: Stance Classification Based on Consistency Cues,” 2019. [Online]. Available: http://arxiv.org/abs/1910.06048. (arXiv: 1910.06048)
Abstract
Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. Experiments on the Perspectrum dataset, consisting of claims and users' perspectives from various debate websites, demonstrate the effectiveness of our approach over state-of-the-art baselines.
Export
BibTeX
@online{Popat_arXiv1910.06048, TITLE = {{STANCY}: Stance Classification Based on Consistency Cues}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1910.06048}, EPRINT = {1910.06048}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. Experiments on the Perspectrum dataset, consisting of claims and users' perspectives from various debate websites, demonstrate the effectiveness of our approach over state-of-the-art baselines.}, }
Endnote
%0 Report %A Popat, Kashyap %A Mukherjee, Subhabrata %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T STANCY: Stance Classification Based on Consistency Cues : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83E2-7 %U http://arxiv.org/abs/1910.06048 %D 2019 %X Controversial claims are abundant in online media and discussion forums. A better understanding of such claims requires analyzing them from different perspectives. Stance classification is a necessary step for inferring these perspectives in terms of supporting or opposing the claim. In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. Experiments on the Perspectrum dataset, consisting of claims and users' perspectives from various debate websites, demonstrate the effectiveness of our approach over state-of-the-art baselines. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Learning, cs.LG
[172]
K. Popat, S. Mukherjee, A. Yates, and G. Weikum, “STANCY: Stance Classification Based on Consistency Cues,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{D19-1675, TITLE = {STANCY: {S}tance Classification Based on Consistency Cues}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Yates, Andrew and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-950737-90-1}, URL = {https://www.aclweb.org/anthology/D19-1675/}, DOI = {10.18653/v1/D19-1675}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)}, EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun}, PAGES = {6412--6417}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Yates, Andrew %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T STANCY: Stance Classification Based on Consistency Cues : %G eng %U http://hdl.handle.net/21.11116/0000-0005-827A-F %U https://www.aclweb.org/anthology/D19-1675/ %R 10.18653/v1/D19-1675 %D 2019 %B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %Z date of event: 2019-11-03 - 2019-11-07 %C Hong Kong, China %B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun %P 6412 - 6417 %I ACL %@ 978-1-950737-90-1
[173]
S. Razniewski, N. Jain, P. Mirza, and G. Weikum, “Coverage of Information Extraction from Sentences and Paragraphs,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{D19-1000, TITLE = {Coverage of Information Extraction from Sentences and Paragraphs}, AUTHOR = {Razniewski, Simon and Jain, Nitisha and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-950737-90-1}, URL = {https://www.aclweb.org/anthology/D19-1000}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)}, EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun}, PAGES = {5770--5775}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Razniewski, Simon %A Jain, Nitisha %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Coverage of Information Extraction from Sentences and Paragraphs : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8265-6 %U https://www.aclweb.org/anthology/D19-1000 %D 2019 %B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %Z date of event: 2019-11-03 - 2019-11-07 %C Hong Kong, China %B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun %P 5770 - 5775 %I ACL %@ 978-1-950737-90-1
[174]
J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum, “Commonsense Properties from Query Logs and Question Answering Forums,” in CIKM ’19, 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 2019.
Export
BibTeX
@inproceedings{Romero_CIKM2019, TITLE = {Commonsense Properties from Query Logs and Question Answering Forums}, AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {9781450369763}, DOI = {10.1145/3357384.3357955}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {CIKM '19, 28th ACM International Conference on Information and Knowledge Management}, EDITOR = {Zhu, Wenwu and Tao, Dacheng}, PAGES = {1411--1420}, ADDRESS = {Beijing, China}, }
Endnote
%0 Conference Proceedings %A Romero, Julien %A Razniewski, Simon %A Pal, Koninika %A Pan, Jeff Z. %A Sakhadeo, Archit %A Weikum, Gerhard %+ External Organizations %Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense Properties from Query Logs and Question Answering Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8255-8 %R 10.1145/3357384.3357955 %D 2019 %B 28th ACM International Conference on Information and Knowledge Management %Z date of event: 2019-11-03 - 2019-11-07 %C Beijing, China %B CIKM '19 %E Zhu, Wenwu; Tao, Dacheng %P 1411 - 1420 %I ACM %@ 9781450369763
[175]
J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum, “Commonsense Properties from Query Logs and Question Answering Forums,” 2019. [Online]. Available: http://arxiv.org/abs/1905.10989. (arXiv: 1905.10989)
Abstract
Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
Export
BibTeX
@online{Romero_arXiv1905.10989, TITLE = {Commonsense Properties from Query Logs and Question Answering Forums}, AUTHOR = {Romero, Julien and Razniewski, Simon and Pal, Koninika and Pan, Jeff Z. and Sakhadeo, Archit and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1905.10989}, EPRINT = {1905.10989}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.}, }
Endnote
%0 Report %A Romero, Julien %A Razniewski, Simon %A Pal, Koninika %A Pan, Jeff Z. %A Sakhadeo, Archit %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense Properties from Query Logs and Question Answering Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FEEE-4 %U http://arxiv.org/abs/1905.10989 %D 2019 %X Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality. %K Computer Science, Computation and Language, cs.CL,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
[176]
D. Saran, “Summarizing Dynamic Graphs using MDL,” Master’s thesis, Universität des Saarlandes, Saarbrücken, 2019.
Export
BibTeX
@mastersthesis{saran:19:dyngraphs, TITLE = {Summarizing Dynamic Graphs using {MDL}}, AUTHOR = {Saran, Divyam}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, }
Endnote
%0 Thesis %A Saran, Divyam %Y Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Summarizing Dynamic Graphs using MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FF10-8 %I Universität des Saarlandes %C Saarbrücken %D 2019 %V master %9 master
[177]
X. Shen, Y. Zhao, H. Su, and D. Klakow, “Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{shen2019improving, TITLE = {Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator}, AUTHOR = {Shen, Xiaoyu and Zhao, Yang and Su, Hui and Klakow, Dietrich}, LANGUAGE = {eng}, ISBN = {978-1-950737-90-1}, URL = {https://www.aclweb.org/anthology/D19-1390}, DOI = {10.18653/v1/D19-1390}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)}, EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun}, PAGES = {3762--3773}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Shen, Xiaoyu %A Zhao, Yang %A Su, Hui %A Klakow, Dietrich %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator : %G eng %U http://hdl.handle.net/21.11116/0000-0008-13C2-7 %U https://www.aclweb.org/anthology/D19-1390 %R 10.18653/v1/D19-1390 %D 2019 %B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %Z date of event: 2019-11-03 - 2019-11-07 %C Hong Kong, China %B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun %P 3762 - 3773 %I ACL %@ 978-1-950737-90-1
[178]
X. Shen, J. Suzuki, K. Inui, H. Su, D. Klakow, and S. Sekine, “Select and Attend: Towards Controllable Content Selection in Text Generation,” in 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, 2019.
Export
BibTeX
@inproceedings{shen2019select, TITLE = {Select and Attend: {T}owards Controllable Content Selection in Text Generation}, AUTHOR = {Shen, Xiaoyu and Suzuki, Jun and Inui, Kentaro and Su, Hui and Klakow, Dietrich and Sekine, Satoshi}, LANGUAGE = {eng}, ISBN = {978-1-950737-90-1}, URL = {https://www.aclweb.org/anthology/D19-1054}, DOI = {10.18653/v1/D19-1054}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)}, EDITOR = {Inui, Kentaro and Jiang, Jing and Ng, Vincent and Wan, Xiaojun}, PAGES = {579--590}, ADDRESS = {Hong Kong, China}, }
Endnote
%0 Conference Proceedings %A Shen, Xiaoyu %A Suzuki, Jun %A Inui, Kentaro %A Su, Hui %A Klakow, Dietrich %A Sekine, Satoshi %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Select and Attend: Towards Controllable Content Selection in Text Generation : %G eng %U http://hdl.handle.net/21.11116/0000-0008-13BD-E %U https://www.aclweb.org/anthology/D19-1054 %R 10.18653/v1/D19-1054 %D 2019 %B Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %Z date of event: 2019-11-03 - 2019-11-07 %C Hong Kong, China %B 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing %E Inui, Kentaro; Jiang, Jing; Ng, Vincent; Wan, Xiaojun %P 579 - 590 %I ACL %@ 978-1-950737-90-1
[179]
F. M. Suchanek, J. Lajus, A. Boschin, and G. Weikum, “Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases,” in Reasoning Web -- Explainable Artificial Intelligence, Berlin: Springer, 2019.
Export
BibTeX
@incollection{Suchanek_LNCS11810, TITLE = {Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases}, AUTHOR = {Suchanek, Fabian M. and Lajus, Jonathan and Boschin, Armand and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-31422-4}, DOI = {10.1007/978-3-030-31423-1_4}, PUBLISHER = {Springer}, ADDRESS = {Berlin}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Reasoning Web -- Explainable Artificial Intelligence}, DEBUG = {author: Krötzsch, Markus; author: Stepanova, Daria}, PAGES = {110--152}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11810}, }
Endnote
%0 Book Section %A Suchanek, Fabian M. %A Lajus, Jonathan %A Boschin, Armand %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0005-8298-C %R 10.1007/978-3-030-31423-1_4 %D 2019 %B Reasoning Web -- Explainable Artificial Intelligence %E Krötzsch, Markus; Stepanova, Daria %P 110 - 152 %I Springer %C Berlin %@ 978-3-030-31422-4 %S Lecture Notes in Computer Science %N 11810
[180]
H. Su, X. Shen, R. Zhang, F. Sun, P. Hu, C. Niu, and J. Zhou, “Improving Multi-turn Dialogue Modelling with Utterance ReWriter,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.
Export
BibTeX
@inproceedings{Su_2019, TITLE = {Improving Multi-turn Dialogue Modelling with Utterance {ReWriter}}, AUTHOR = {Su, Hui and Shen, Xiaoyu and Zhang, Rongzhi and Sun, Fei and Hu, Pengwei and Niu, Cheng and Zhou, Jie}, LANGUAGE = {eng}, URL = {https://www.aclweb.org/anthology/P19-1003}, DOI = {10.18653/v1/P19-1003}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)}, EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s}, PAGES = {22--31}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Su, Hui %A Shen, Xiaoyu %A Zhang, Rongzhi %A Sun, Fei %A Hu, Pengwei %A Niu, Cheng %A Zhou, Jie %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Improving Multi-turn Dialogue Modelling with Utterance ReWriter : %G eng %U http://hdl.handle.net/21.11116/0000-0005-6982-2 %U https://www.aclweb.org/anthology/P19-1003 %R 10.18653/v1/P19-1003 %D 2019 %B 57th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2019-07-28 - 2019-08-02 %C Florence, Italy %B Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics %E Korhonen, Anna; Traum, David; Màrquez, Lluís %P 22 - 31 %I ACL
[181]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” in Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019), Calgary, Canada, 2019.
Export
BibTeX
@inproceedings{Tatti_SDM2019, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-567-3}, DOI = {10.1137/1.9781611975673.82}, PUBLISHER = {SIAM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 2019 SIAM International Conference on Data Mining (SDM 2019)}, EDITOR = {Berger-Wolf, Tanya and Chawla, Nitesh}, PAGES = {729--737}, ADDRESS = {Calgary, Canada}, }
Endnote
%0 Conference Proceedings %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %G eng %U http://hdl.handle.net/21.11116/0000-0004-030A-E %R 10.1137/1.9781611975673.82 %D 2019 %B SIAM International Conference on Data Mining %Z date of event: 2019-05-02 - 2019-05-04 %C Calgary, Canada %B Proceedings of the 2019 SIAM International Conference on Data Mining %E Berger-Wolf, Tanya; Chawla, Nitesh %P 729 - 737 %I SIAM %@ 978-1-61197-567-3
[182]
N. Tatti and P. Miettinen, “Boolean Matrix Factorization Meets Consecutive Ones Property,” 2019. [Online]. Available: http://arxiv.org/abs/1901.05797. (arXiv: 1901.05797)
Abstract
Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard but also that we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining a 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
Export
BibTeX
@online{Tatti_arXiv1901.05797, TITLE = {Boolean Matrix Factorization Meets Consecutive Ones Property}, AUTHOR = {Tatti, Nikolaj and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1901.05797}, EPRINT = {1901.05797}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard but also that we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining a 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.}, }
Endnote
%0 Report %A Tatti, Nikolaj %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Matrix Factorization Meets Consecutive Ones Property : %U http://hdl.handle.net/21.11116/0000-0004-02F0-A %U http://arxiv.org/abs/1901.05797 %D 2019 %X Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layout, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We also show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard but also that we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining a 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well. %K Computer Science, Data Structures and Algorithms, cs.DS,Computer Science, Discrete Mathematics, cs.DM,Computer Science, Learning, cs.LG
[183]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” 2019. [Online]. Available: http://arxiv.org/abs/1904.10887. (arXiv: 1904.10887)
Abstract
Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
Export
BibTeX
@online{Tigunova_arXiv1904.10887, TITLE = {Listening between the Lines: Learning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1904.10887}, EPRINT = {1904.10887}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.}, }
Endnote
%0 Report %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FE7F-2 %U http://arxiv.org/abs/1904.10887 %D 2019 %X Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines. %K Computer Science, Computation and Language, cs.CL
[184]
A. Tigunova, A. Yates, P. Mirza, and G. Weikum, “Listening between the Lines: Learning Personal Attributes from Conversations,” in Proceedings of The World Wide Web Conference (WWW 2019), San Francisco, CA, USA, 2019.
Export
BibTeX
@inproceedings{tigunova2019listening, TITLE = {Listening between the Lines: {L}earning Personal Attributes from Conversations}, AUTHOR = {Tigunova, Anna and Yates, Andrew and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6674-8}, DOI = {10.1145/3308558.3313498}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of The World Wide Web Conference (WWW 2019)}, EDITOR = {McAuley, Julian}, PAGES = {1818--1828}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Tigunova, Anna %A Yates, Andrew %A Mirza, Paramita %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Listening between the Lines: Learning Personal Attributes from Conversations : %G eng %U http://hdl.handle.net/21.11116/0000-0003-1460-A %R 10.1145/3308558.3313498 %D 2019 %B The Web Conference %Z date of event: 2019-05-13 - 2019-05-17 %C San Francisco, CA, USA %B Proceedings of The World Wide Web Conference %E McAuley, Julian %P 1818 - 1828 %I ACM %@ 978-1-4503-6674-8
[185]
B. D. Trisedya, G. Weikum, J. Qi, and R. Zhang, “Neural Relation Extraction for Knowledge Base Enrichment,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.
Export
BibTeX
@inproceedings{Trisedya_ACL2019, TITLE = {Neural Relation Extraction for Knowledge Base Enrichment}, AUTHOR = {Trisedya, Bayu Distiawan and Weikum, Gerhard and Qi, Jianzhong and Zhang, Rui}, LANGUAGE = {eng}, URL = {https://www.aclweb.org/anthology/P19-1023}, DOI = {10.18653/v1/P19-1023}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)}, EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s}, PAGES = {229--240}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Trisedya, Bayu Distiawan %A Weikum, Gerhard %A Qi, Jianzhong %A Zhang, Rui %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Neural Relation Extraction for Knowledge Base Enrichment : %G eng %U http://hdl.handle.net/21.11116/0000-0005-6B08-B %U https://www.aclweb.org/anthology/P19-1023 %R 10.18653/v1/P19-1023 %D 2019 %B 57th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2019-07-28 - 2019-08-02 %C Florence, Italy %B Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics %E Korhonen, Anna; Traum, David; Màrquez, Lluís %P 229 - 240 %I ACL
[186]
M. Unterkalmsteiner and A. Yates, “Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation,” in Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019), Essen, Germany, 2019.
Export
BibTeX
@inproceedings{Unterkalmsteiner_NLP4RE2019, TITLE = {Expert-sourcing Domain-specific Knowledge: {The} Case of Synonym Validation}, AUTHOR = {Unterkalmsteiner, Michael and Yates, Andrew}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-2376-8}, PUBLISHER = {CEUR-WS}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) (NLP4RE 2019)}, EDITOR = {Dalpiaz, Fabiano and Ferrari, Alessio and Franch, Xavier and Gregory, Sarah and Houdek, Frank and Palomares, Cristina}, EID = {8}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2376}, ADDRESS = {Essen, Germany}, }
Endnote
%0 Conference Proceedings %A Unterkalmsteiner, Michael %A Yates, Andrew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Expert-sourcing Domain-specific Knowledge: The Case of Synonym Validation : %G eng %U http://hdl.handle.net/21.11116/0000-0004-02AE-6 %D 2019 %B 2nd Workshop on Natural Language Processing for Requirements Engineering and NLP Tool Showcase %Z date of event: 2019-03-18 - 2019-03-18 %C Essen, Germany %B Joint Proceedings of REFSQ-2019 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 25th International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2019) %E Dalpiaz, Fabiano; Ferrari, Alessio; Franch, Xavier; Gregory, Sarah; Houdek, Frank; Palomares, Cristina %Z sequence number: 8 %I CEUR-WS %B CEUR Workshop Proceedings %N 2376 %@ false %U http://ceur-ws.org/Vol-2376/NLP4RE19_paper08.pdf
[187]
M. van Leeuwen, P. Chau, J. Vreeken, D. Shahaf, and C. Faloutsos, “Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors,” ACM Transactions on Knowledge Discovery from Data, vol. 13, no. 1, 2019.
Export
BibTeX
@article{vanLeeuwen2019, TITLE = {Addendum to the Special Issue on Interactive Data Exploration and Analytics ({TKDD}, Vol. 12, Iss. 1): Introduction by the Guest Editors}, AUTHOR = {van Leeuwen, Matthijs and Chau, Polo and Vreeken, Jilles and Shahaf, Dafna and Faloutsos, Christos}, LANGUAGE = {eng}, ISSN = {1556-4681}, DOI = {10.1145/3298786}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {13}, NUMBER = {1}, EID = {13}, }
Endnote
%0 Journal Article %A van Leeuwen, Matthijs %A Chau, Polo %A Vreeken, Jilles %A Shahaf, Dafna %A Faloutsos, Christos %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Addendum to the Special Issue on Interactive Data Exploration and Analytics (TKDD, Vol. 12, Iss. 1): Introduction by the Guest Editors : %G eng %U http://hdl.handle.net/21.11116/0000-0003-FFD5-E %R 10.1145/3298786 %7 2019 %D 2019 %J ACM Transactions on Knowledge Discovery from Data %V 13 %N 1 %Z sequence number: 13 %I ACM %C New York, NY %@ false
[188]
H. Wang, N. Grgic-Hlaca, P. Lahoti, K. P. Gummadi, and A. Weller, “An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision,” 2019. [Online]. Available: https://arxiv.org/abs/1910.10255. (arXiv: 1910.10255)
Abstract
The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such a similarity metric from human-annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming that the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metrics under various criteria. We do not provide a way to directly learn a similarity metric satisfying individual fairness, but rather an empirical study on how to derive the similarity metric from human supervisors, so that future work can use this as a tool to understand human supervision.
Export
BibTeX
@online{DBLP:journals/corr/abs-1910-10255, TITLE = {An Empirical Study on Learning Fairness Metrics for {COMPAS} Data with Human Supervision}, AUTHOR = {Wang, Hanchen and Grgic-Hlaca, Nina and Lahoti, Preethi and Gummadi, Krishna P. and Weller, Adrian}, LANGUAGE = {eng}, URL = {https://arxiv.org/abs/1910.10255}, EPRINT = {1910.10255}, EPRINTTYPE = {arXiv}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such a similarity metric from human-annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming that the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metrics under various criteria. We do not provide a way to directly learn a similarity metric satisfying individual fairness, but rather an empirical study on how to derive the similarity metric from human supervisors, so that future work can use this as a tool to understand human supervision.}, }
Endnote
%0 Report %A Wang, Hanchen %A Grgic-Hlaca, Nina %A Lahoti, Preethi %A Gummadi, Krishna P. %A Weller, Adrian %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human Supervision : %G eng %U http://hdl.handle.net/21.11116/0000-0007-FCD3-F %U https://arxiv.org/abs/1910.10255 %D 2019 %X The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such similarity metric from human annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metric under various criteria. We do not provide a way to directly learn a similarity metric satisfying the individual fairness, but to provide an empirical study on how to derive the similarity metric from human supervisors, then future work can use this as a tool to understand human supervision. %K Computer Science, Computers and Society, cs.CY,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Learning, cs.LG
[189]
L. Wang, Y. Wang, G. de Melo, and G. Weikum, “Understanding Archetypes of Fake News via Fine-grained Classification,” Social Network Analysis and Mining, vol. 9, no. 1, 2019.
Export
BibTeX
@article{Wang2019_Understanding, TITLE = {Understanding Archetypes of Fake News via Fine-grained Classification}, AUTHOR = {Wang, Liqiang and Wang, Yafang and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1869-5450}, DOI = {10.1007/s13278-019-0580-z}, PUBLISHER = {Springer}, ADDRESS = {Cham}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, JOURNAL = {Social Network Analysis and Mining}, VOLUME = {9}, NUMBER = {1}, EID = {37}, }
Endnote
%0 Journal Article %A Wang, Liqiang %A Wang, Yafang %A de Melo, Gerard %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Understanding Archetypes of Fake News via Fine-grained Classification : %G eng %U http://hdl.handle.net/21.11116/0000-0005-789A-7 %R 10.1007/s13278-019-0580-z %7 2019 %D 2019 %J Social Network Analysis and Mining %V 9 %N 1 %Z sequence number: 37 %I Springer %C Cham %@ false
[190]
G. Weikum, J. Hoffart, and F. Suchanek, “Knowledge Harvesting: Achievements and Challenges,” in Computing and Software Science, Berlin: Springer, 2019.
Export
BibTeX
@incollection{Weikum_KnowHarv2019, TITLE = {Knowledge Harvesting: Achievements and Challenges}, AUTHOR = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian}, LANGUAGE = {eng}, ISBN = {978-3-319-91907-2}, DOI = {10.1007/978-3-319-91908-9_13}, PUBLISHER = {Springer}, ADDRESS = {Berlin}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Computing and Software Science}, EDITOR = {Steffen, Bernhard and Woeginger, Gerhard}, PAGES = {217--235}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10000}, }
Endnote
%0 Book Section %A Weikum, Gerhard %A Hoffart, Johannes %A Suchanek, Fabian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Knowledge Harvesting: Achievements and Challenges : %G eng %U http://hdl.handle.net/21.11116/0000-0005-83B1-E %R 10.1007/978-3-319-91908-9_13 %D 2019 %B Computing and Software Science %E Steffen, Bernhard; Woeginger, Gerhard %P 217 - 235 %I Springer %C Berlin %@ 978-3-319-91907-2 %S Lecture Notes in Computer Science %N 10000
[191]
A. Wisesa, F. Darari, A. Krisnadhi, W. Nutt, and S. Razniewski, “Wikidata Completeness Profiling Using ProWD,” in K-CAP ’19, 10th International Conference on Knowledge Capture, Marina del Rey, CA, USA, 2019.
Export
BibTeX
@inproceedings{Wisesa_K-CAP2019, TITLE = {Wikidata Completeness Profiling Using {ProWD}}, AUTHOR = {Wisesa, Avicenna and Darari, Fariz and Krisnadhi, Adila and Nutt, Werner and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-4503-7008-0}, DOI = {10.1145/3360901.3364425}, PUBLISHER = {ACM}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {K-CAP '19, 10th International Conference on Knowledge Capture}, EDITOR = {Kejriwal, Mayank and Szekely, Pedro}, PAGES = {123--130}, ADDRESS = {Marina del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Wisesa, Avicenna %A Darari, Fariz %A Krisnadhi, Adila %A Nutt, Werner %A Razniewski, Simon %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Wikidata Completeness Profiling Using ProWD : %G eng %U http://hdl.handle.net/21.11116/0000-0005-849B-7 %R 10.1145/3360901.3364425 %D 2019 %B 10th International Conference on Knowledge Capture %Z date of event: 2019-11-19 - 2019-11-21 %C Marina del Rey, CA, USA %B K-CAP '19 %E Kejriwal, Mayank; Szekely, Pedro %P 123 - 130 %I ACM %@ 978-1-4503-7008-0
[192]
A. Yates and M. Unterkalmsteiner, “Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain,” in Advances in Information Retrieval (ECIR 2019), Cologne, Germany, 2019.
Export
BibTeX
@inproceedings{Yates_ECIR2019, TITLE = {Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain}, AUTHOR = {Yates, Andrew and Unterkalmsteiner, Michael}, LANGUAGE = {eng}, ISBN = {978-3-030-15711-1}, DOI = {10.1007/978-3-030-15712-8_28}, PUBLISHER = {Springer}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2019)}, EDITOR = {Azzopardi, Leif and Stein, Benno and Fuhr, Norbert and Mayr, Philipp and Hauff, Claudia and Hiemstra, Djoerd}, PAGES = {429--442}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11437}, ADDRESS = {Cologne, Germany}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Unterkalmsteiner, Michael %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Replicating Relevance-Ranked Synonym Discovery in a New Language and Domain : %G eng %U http://hdl.handle.net/21.11116/0000-0004-029B-B %R 10.1007/978-3-030-15712-8_28 %D 2019 %B 41st European Conference on IR Research %Z date of event: 2019-04-14 - 2019-04-18 %C Cologne, Germany %B Advances in Information Retrieval %E Azzopardi, Leif; Stein, Benno; Fuhr, Norbert; Mayr, Philipp; Hauff, Claudia; Hiemstra, Djoerd %P 429 - 442 %I Springer %@ 978-3-030-15711-1 %B Lecture Notes in Computer Science %N 11437
[193]
Y. Zhao, X. Shen, W. Bi, and A. Aizawa, “Unsupervised Rewriter for Multi-Sentence Compression,” in The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019.
Export
BibTeX
@inproceedings{zhao2019unsupervised, TITLE = {Unsupervised Rewriter for Multi-Sentence Compression}, AUTHOR = {Zhao, Yang and Shen, Xiaoyu and Bi, Wei and Aizawa, Akiko}, LANGUAGE = {eng}, ISBN = {978-1-950737-48-2}, URL = {https://www.aclweb.org/anthology/P19-1216}, DOI = {10.18653/v1/P19-1216}, PUBLISHER = {ACL}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)}, EDITOR = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'i}s}, PAGES = {2235--2240}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Zhao, Yang %A Shen, Xiaoyu %A Bi, Wei %A Aizawa, Akiko %+ External Organizations %A Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Unsupervised Rewriter for Multi-Sentence Compression : %G eng %U http://hdl.handle.net/21.11116/0000-0008-14AB-1 %R 10.18653/v1/P19-1216 %U https://www.aclweb.org/anthology/P19-1216 %D 2019 %B The 57th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2019-07-28 - 2019-08-02 %C Florence, Italy %B The 57th Annual Meeting of the Association for Computational Linguistics %E Korhonen, Anna; Traum, David; Màrquez, Lluís %P 2235 - 2240 %I ACL %@ 978-1-950737-48-2
2018
[194]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{AbujabalWWW_2018, TITLE = {Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5639-8}, DOI = {10.1145/3178876.3186004}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.}, PAGES = {1053--1062}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C91-8 %R 10.1145/3178876.3186004 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Proceedings of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G. %P 1053 - 1062 %I ACM %@ 978-1-4503-5639-8
[195]
A. Abujabal, R. Saha Roy, M. Yahya, and G. Weikum, “ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters,” 2018. [Online]. Available: http://arxiv.org/abs/1809.09528. (arXiv: 1809.09528)
Abstract
To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
Export
BibTeX
@online{Abujabal_arXiv1809.09528, TITLE = {{ComQA}: {A} Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters}, AUTHOR = {Abujabal, Abdalghani and Saha Roy, Rishiraj and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1809.09528}, EPRINT = {1809.09528}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.}, }
Endnote
%0 Report %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A0FE-B %U http://arxiv.org/abs/1809.09528 %D 2018 %X To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what real users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as temporal reasoning, compositionality, etc. ComQA questions come from the WikiAnswers community QA platform. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA. %K Computer Science, Computation and Language, cs.CL
[196]
P. Agarwal, J. Strötgen, L. Del Corro, J. Hoffart, and G. Weikum, “diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora,” in The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.
Export
BibTeX
@inproceedings{AgrawalACL2018a, TITLE = {{diaNED}: {T}ime-Aware Named Entity Disambiguation for Diachronic Corpora}, AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-34-6}, URL = {https://aclanthology.coli.uni-saarland.de/volumes/proceedings-of-the-56th-annual-meeting-of-the-association-for-computational-linguistics-volume-2-short-papers}, PUBLISHER = {ACL}, YEAR = {2018}, BOOKTITLE = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)}, EDITOR = {Gurevych, Iryna and Miyao, Yusuke}, PAGES = {686--693}, EID = {602}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Agarwal, Prabal %A Strötgen, Jannik %A Del Corro, Luciano %A Hoffart, Johannes %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9055-C %D 2018 %B The 56th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2018-07-15 - 2018-07-20 %C Melbourne, Australia %B The 56th Annual Meeting of the Association for Computational Linguistics %E Gurevych, Iryna; Miyao, Yusuke %P 686 - 693 %Z sequence number: 602 %I ACL %@ 978-1-948087-34-6 %U http://aclweb.org/anthology/P18-2109
[197]
M. Antenore, G. Leone, A. Panconesi, and E. Terolli, “Together We Buy, Alone I Quit: Some Experimental Studies of Online Persuaders,” in DTUC’18 Digital Tools & Uses Congress, Paris, France, 2018.
Export
BibTeX
@inproceedings{Antenore:2018:TWB:3240117.3240119, TITLE = {Together We Buy, Alone {I} Quit: {S}ome Experimental Studies of Online Persuaders}, AUTHOR = {Antenore, Marzia and Leone, Giovanna and Panconesi, Alessandro and Terolli, Erisa}, LANGUAGE = {eng}, ISBN = {978-1-4503-6451-5}, DOI = {10.1145/3240117.3240119}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {DTUC'18 Digital Tools \& Uses Congress}, EDITOR = {Reyes, E. and Szoniecky, S. and Mkadmi, A. and Kembellec, G. and Fournier-S'niehotta, R. and Siala-Kallel, F. and Ammi, M. and Labelle, S.}, EID = {2}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A Antenore, Marzia %A Leone, Giovanna %A Panconesi, Alessandro %A Terolli, Erisa %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Together We Buy, Alone I Quit: Some Experimental Studies of Online Persuaders : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A89D-0 %R 10.1145/3240117.3240119 %D 2018 %B First International Digital Tools & Uses Congress %Z date of event: 2018-10-03 - 2018-10-05 %C Paris, France %B DTUC'18 Digital Tools & Uses Congress %E Reyes, E.; Szoniecky, S.; Mkadmi, A.; Kembellec, G.; Fournier-S'niehotta, R.; Siala-Kallel, F.; Ammi, M.; Labelle, S. %Z sequence number: 2 %I ACM %@ 978-1-4503-6451-5
[198]
O. Balalau, C. Castillo, and M. Sozio, “EviDense: A Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions,” in Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018), Stanford, CA, USA, 2018.
Export
BibTeX
@inproceedings{Balalau_ICWSM2018, TITLE = {{EviDense}: {A} Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions}, AUTHOR = {Balalau, Oana and Castillo, Carlos and Sozio, Mauro}, LANGUAGE = {eng}, ISBN = {978-1-57735-798-8}, PUBLISHER = {AAAI}, YEAR = {2018}, BOOKTITLE = {Proceedings of the Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018)}, PAGES = {560--563}, ADDRESS = {Stanford, CA, USA}, }
Endnote
%0 Conference Proceedings %A Balalau, Oana %A Castillo, Carlos %A Sozio, Mauro %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T EviDense: A Graph-Based Method for Finding Unique High-Impact Events with Succinct Keyword-Based Descriptions : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9CE8-9 %D 2018 %B 12th International AAAI Conference on Web and Social Media %Z date of event: 2018-06-25 - 2018-06-28 %C Stanford, CA, USA %B Proceedings of the Twelfth International AAAI Conference on Web and Social Media %P 560 - 563 %I AAAI %@ 978-1-57735-798-8 %U https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17889
[199]
V. Balaraman, S. Razniewski, and W. Nutt, “Recoin: Relative Completeness in Wikidata,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{BalaramanWWW2017, TITLE = {Recoin: {R}elative Completeness in {W}ikidata}, AUTHOR = {Balaraman, Vevake and Razniewski, Simon and Nutt, Werner}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3191641}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {1787--1792}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Balaraman, Vevake %A Razniewski, Simon %A Nutt, Werner %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Recoin: Relative Completeness in Wikidata : %G eng %U http://hdl.handle.net/21.11116/0000-0001-414A-3 %R 10.1145/3184558.3191641 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel %P 1787 - 1792 %I ACM %@ 978-1-4503-5640-4
[200]
A. J. Biega, K. P. Gummadi, and G. Weikum, “Equity of Attention: Amortizing Individual Fairness in Rankings,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{BiegaSIGIR2018, TITLE = {Equity of Attention: {A}mortizing Individual Fairness in Rankings}, AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5022-8}, DOI = {10.1145/3209978.3210063}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {405--414}, ADDRESS = {Ann Arbor, MI, USA}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Equity of Attention: Amortizing Individual Fairness in Rankings : %G eng %U http://hdl.handle.net/21.11116/0000-0002-0D8A-5 %R 10.1145/3209978.3210063 %D 2018 %B 41st International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2018-07-08 - 2018-07-12 %C Ann Arbor, MI, USA %B SIGIR'18 %P 405 - 414 %I ACM %@ 978-1-4503-5022-8
[201]
A. J. Biega, K. P. Gummadi, and G. Weikum, “Equity of Attention: Amortizing Individual Fairness in Rankings,” 2018. [Online]. Available: http://arxiv.org/abs/1805.01788. (arXiv: 1805.01788)
Abstract
Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.
Export
BibTeX
@online{Biega_arXiv1805.01788, TITLE = {Equity of Attention: Amortizing Individual Fairness in Rankings}, AUTHOR = {Biega, Asia J. and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1805.01788}, EPRINT = {1805.01788}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.}, }
Endnote
%0 Report %A Biega, Asia J. %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Equity of Attention: Amortizing Individual Fairness in Rankings : %G eng %U http://hdl.handle.net/21.11116/0000-0002-1563-7 %U http://arxiv.org/abs/1805.01788 %D 2018 %X Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality. 
%K Computer Science, Information Retrieval, cs.IR,Computer Science, Computers and Society, cs.CY
[202]
N. Boldyrev, M. Spaniol, and G. Weikum, “Multi-Cultural Interlinking of Web Taxonomies with ACROSS,” The Journal of Web Science, vol. 4, no. 2, 2018.
Export
BibTeX
@article{Boldyrev2018, TITLE = {Multi-Cultural Interlinking of Web Taxonomies with {ACROSS}}, AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1561/106.00000012}, PUBLISHER = {Now Publishers}, ADDRESS = {Boston}, YEAR = {2018}, JOURNAL = {The Journal of Web Science}, VOLUME = {4}, NUMBER = {2}, PAGES = {20--33}, }
Endnote
%0 Journal Article %A Boldyrev, Natalia %A Spaniol, Marc %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Multi-Cultural Interlinking of Web Taxonomies with ACROSS : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3CA4-3 %R 10.1561/106.00000012 %7 2018 %D 2018 %J The Journal of Web Science %O Web Science %V 4 %N 2 %& 20 %P 20 - 33 %I Now Publishers %C Boston
[203]
K. Budhathoki, M. Boley, and J. Vreeken, “Rule Discovery for Exploratory Causal Reasoning,” in Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018), Montréal, Canada, 2018.
Export
BibTeX
@inproceedings{budhathoki:18:dice, TITLE = {Rule Discovery for Exploratory Causal Reasoning}, AUTHOR = {Budhathoki, Kailash and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {https://drive.google.com/file/d/1r-KTsok3VLIz-wUh0YtsK5YaEu53DcTf/view}, YEAR = {2018}, BOOKTITLE = {Proceedings of the NeurIPS 2018 workshop on Causal Learning (NeurIPS CL 2018)}, EID = {14}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Rule Discovery for Exploratory Causal Reasoning : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EBC-9 %U https://drive.google.com/file/d/1r-KTsok3VLIz-wUh0YtsK5YaEu53DcTf/view %D 2018 %B NeurIPS 2018 Workshop on Causal Learning %Z date of event: 2018-12-07 - 2018-12-07 %C Montréal, Canada %B Proceedings of the NeurIPS 2018 workshop on Causal Learning %Z sequence number: 14
[204]
K. Budhathoki and J. Vreeken, “Origo: Causal Inference by Compression,” Knowledge and Information Systems, vol. 56, no. 2, 2018.
Export
BibTeX
@article{Budhathoki2018, TITLE = {Origo: {C}ausal Inference by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-017-1130-5}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2018}, DATE = {2018}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {56}, NUMBER = {2}, PAGES = {285--307}, }
Endnote
%0 Journal Article %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Origo: Causal Inference by Compression : %G eng %U http://hdl.handle.net/21.11116/0000-0001-AF2B-B %R 10.1007/s10115-017-1130-5 %7 2018 %D 2018 %J Knowledge and Information Systems %V 56 %N 2 %& 285 %P 285 - 307 %I Springer %C New York, NY %@ false
[205]
K. Budhathoki and J. Vreeken, “Causal Inference on Event Sequences,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.
Export
BibTeX
@inproceedings{budhathoki_SDM2018, TITLE = {Causal Inference on Event Sequences}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-532-1}, DOI = {10.1137/1.9781611975321.7}, PUBLISHER = {SIAM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)}, EDITOR = {Ester, Martin and Pedreschi, Dino}, PAGES = {55--63}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Event Sequences : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5F34-A %R 10.1137/1.9781611975321.7 %D 2018 %B SIAM International Conference on Data Mining %Z date of event: 2018-05-03 - 2018-05-05 %C San Diego, CA, USA %B Proceedings of the 2018 SIAM International Conference on Data Mining %E Ester, Martin; Pedreschi, Dino %P 55 - 63 %I SIAM %@ 978-1-61197-532-1
[206]
K. Budhathoki and J. Vreeken, “Accurate Causal Inference on Discrete Data,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.
Export
BibTeX
@inproceedings{budhathoki:18:acid, TITLE = {Accurate Causal Inference on Discrete Data}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-9159-5}, DOI = {10.1109/ICDM.2018.00105}, PUBLISHER = {IEEE}, YEAR = {2018}, BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)}, PAGES = {881--886}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Accurate Causal Inference on Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9E96-3 %R 10.1109/ICDM.2018.00105 %D 2018 %B IEEE International Conference on Data Mining %Z date of event: 2018-11-17 - 2018-11-20 %C Singapore, Singapore %B IEEE International Conference on Data Mining %P 881 - 886 %I IEEE %@ 978-1-5386-9159-5
[207]
A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian, “SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” 2018. [Online]. Available: http://arxiv.org/abs/1806.05258. (arXiv: 1806.05258)
Abstract
Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.
Export
BibTeX
@online{cohan_arXiv1806.05258, TITLE = {{SMHD}: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions}, AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.05258}, EPRINT = {1806.05258}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.}, }
Endnote
%0 Report %A Cohan, Arman %A Desmet, Bart %A Yates, Andrew %A Soldaini, Luca %A MacAvaney, Sean %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ED4-6 %U http://arxiv.org/abs/1806.05258 %D 2018 %X Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language. %K Computer Science, Computation and Language, cs.CL
[208]
A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney, and N. Goharian, “SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions,” in The 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, NM, USA, 2018.
Export
BibTeX
@inproceedings{Cohan_COLING2018, TITLE = {{SMHD}: {A} Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions}, AUTHOR = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-948087-50-6}, URL = {http://aclweb.org/anthology/C18-1126}, PUBLISHER = {ACL}, YEAR = {2018}, BOOKTITLE = {The 27th International Conference on Computational Linguistics (COLING 2018)}, EDITOR = {Bender, Emily M. and Derczynski, Leon and Isabelle, Pierre}, PAGES = {1485--1497}, ADDRESS = {Santa Fe, NM, USA}, }
Endnote
%0 Conference Proceedings %A Cohan, Arman %A Desmet, Bart %A Yates, Andrew %A Soldaini, Luca %A MacAvaney, Sean %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E91-1 %U http://aclweb.org/anthology/C18-1126 %D 2018 %B 27th International Conference on Computational Linguistics %Z date of event: 2018-08-20 - 2018-08-26 %C Santa Fe, NM, USA %B The 27th International Conference on Computational Linguistics %E Bender, Emily M.; Derczynski, Leon; Isabelle, Pierre %P 1485 - 1497 %I ACL %@ 978-1-948087-50-6
[209]
M. Danisch, O. Balalau, and M. Sozio, “Listing k-cliques in Sparse Real-World Graphs,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{Danisch_WWW2018, TITLE = {Listing k-cliques in Sparse Real-World Graphs}, AUTHOR = {Danisch, Maximilien and Balalau, Oana and Sozio, Mauro}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3178876.3186125}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {589--598}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Danisch, Maximilien %A Balalau, Oana %A Sozio, Mauro %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Listing k-cliques in Sparse Real-World Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9CDE-5 %R 10.1145/3178876.3186125 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel %P 589 - 598 %I ACM %@ 978-1-4503-5640-4
[210]
F. Darari, W. Nutt, G. Pirrò, and S. Razniewski, “Completeness Management for RDF Data Sources,” ACM Transactions on the Web, vol. 12, no. 3, 2018.
Export
BibTeX
@article{Darari2018, TITLE = {Completeness Management for {RDF} Data Sources}, AUTHOR = {Darari, Fariz and Nutt, Werner and Pirr{\`o}, Giuseppe and Razniewski, Simon}, LANGUAGE = {eng}, DOI = {10.1145/3196248}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, DATE = {2018}, JOURNAL = {ACM Transactions on the Web}, VOLUME = {12}, NUMBER = {3}, EID = {18}, }
Endnote
%0 Journal Article %A Darari, Fariz %A Nutt, Werner %A Pirrò, Giuseppe %A Razniewski, Simon %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Completeness Management for RDF Data Sources : %G eng %U http://hdl.handle.net/21.11116/0000-0001-E17F-3 %R 10.1145/3196248 %7 2018 %D 2018 %J ACM Transactions on the Web %V 12 %N 3 %Z sequence number: 18 %I ACM %C New York, NY
[211]
F. Darari, W. Nutt, and S. Razniewski, “Comparing Index Structures for Completeness Reasoning,” in IWBIS 2018, International Workshop on Big Data and Information Security, Jakarta, Indonesia, 2018.
Export
BibTeX
@inproceedings{DarariIWBIS2018, TITLE = {Comparing Index Structures for Completeness Reasoning}, AUTHOR = {Darari, Fariz and Nutt, Werner and Razniewski, Simon}, LANGUAGE = {eng}, ISBN = {978-1-5386-5525-2}, DOI = {10.1109/IWBIS.2018.8471712}, PUBLISHER = {IEEE}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {IWBIS 2018, International Workshop on Big Data and Information Security}, PAGES = {49--56}, ADDRESS = {Jakarta, Indonesia}, }
Endnote
%0 Conference Proceedings %A Darari, Fariz %A Nutt, Werner %A Razniewski, Simon %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Comparing Index Structures for Completeness Reasoning : %G eng %U http://hdl.handle.net/21.11116/0000-0001-E193-A %R 10.1109/IWBIS.2018.8471712 %D 2018 %B International Workshop on Big Data and Information Security %Z date of event: 2018-05-12 - 2018-05-13 %C Jakarta, Indonesia %B IWBIS 2018 %P 49 - 56 %I IEEE %@ 978-1-5386-5525-2
[212]
S. Degaetano-Ortlieb and J. Strötgen, “Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy,” in Language Technologies for the Challenges of the Digital Age (GSCL 2017), Berlin, Germany, 2018.
Export
BibTeX
@inproceedings{DegaetanoortliebStroetgen2017, TITLE = {Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy}, AUTHOR = {Degaetano-Ortlieb, Stefania and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-3-319-73705-8}, DOI = {10.1007/978-3-319-73706-5_22}, PUBLISHER = {Springer}, YEAR = {2017}, DATE = {2018}, BOOKTITLE = {Language Technologies for the Challenges of the Digital Age (GSCL 2017)}, EDITOR = {Rehm, Georg and Declerck, Thierry}, PAGES = {259--275}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10713}, ADDRESS = {Berlin, Germany}, }
Endnote
%0 Conference Proceedings %A Degaetano-Ortlieb, Stefania %A Strötgen, Jannik %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-A8E8-5 %R 10.1007/978-3-319-73706-5_22 %D 2018 %B Conference of the German Society for Computational Linguistics and Language Technology %Z date of event: 2017-09-13 - 2017-09-14 %C Berlin, Germany %B Language Technologies for the Challenges of the Digital Age %E Rehm, Georg; Declerck, Thierry %P 259 - 275 %I Springer %@ 978-3-319-73705-8 %B Lecture Notes in Artificial Intelligence %N 10713
[213]
P. Ernst, A. Siu, and G. Weikum, “HighLife: Higher-arity Fact Harvesting,” in Proceedings of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{ErnstlWWW_2018, TITLE = {{HighLife}: Higher-arity Fact Harvesting}, AUTHOR = {Ernst, Patrick and Siu, Amy and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5639-8}, DOI = {10.1145/3178876.3186000}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Proceedings of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel and Lalmas, Mounia and Ipeirotis, Panagiotis G.}, PAGES = {1013--1022}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Ernst, Patrick %A Siu, Amy %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T HighLife: Higher-arity Fact Harvesting : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C96-3 %R 10.1145/3178876.3186000 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Proceedings of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel; Lalmas, Mounia; Ipeirotis, Panagiotis G. %P 1013 - 1022 %I ACM %@ 978-1-4503-5639-8
[214]
P. Ernst, “Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. - To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.
Export
BibTeX
@phdthesis{Ernstphd2017, TITLE = {Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems}, AUTHOR = {Ernst, Patrick}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-271051}, DOI = {10.22028/D291-27105}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, ABSTRACT = {While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: -- To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. -- To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. -- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.}, }
Endnote
%0 Thesis %A Ernst, Patrick %Y Weikum, Gerhard %A referee: Verspoor, Karin %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems : %G eng %U http://hdl.handle.net/21.11116/0000-0001-1864-4 %U urn:nbn:de:bsz:291-scidok-ds-271051 %R 10.22028/D291-27105 %I Universität des Saarlandes %C Saarbrücken %D 2018 %8 20.02.2018 %P 147 p. %V phd %9 phd %X While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. 
- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26987
[215]
A. K. Fischer, J. Vreeken, and D. Klakow, “Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL,” Computación y Sistemas, vol. 21, no. 4, 2018.
Export
BibTeX
@article{Fischer2018, TITLE = {Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by {MDL}}, AUTHOR = {Fischer, Andrea K. and Vreeken, Jilles and Klakow, Dietrich}, LANGUAGE = {eng}, DOI = {10.13053/CyS-21-4-2865}, PUBLISHER = {Instituto Polit{\'e}cnico Nacional}, ADDRESS = {M{\'e}xico}, YEAR = {2018}, JOURNAL = {Computaci{\'o}n y Sistemas}, VOLUME = {21}, NUMBER = {4}, PAGES = {829--839}, }
Endnote
%0 Journal Article %A Fischer, Andrea K. %A Vreeken, Jilles %A Klakow, Dietrich %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Beyond Pairwise Similarity: Quantifying and Characterizing Linguistic Similarity between Groups of Languages by MDL : %G eng %U http://hdl.handle.net/21.11116/0000-0001-4156-5 %R 10.13053/CyS-21-4-2865 %7 2018 %D 2018 %J Computación y Sistemas %V 21 %N 4 %& 829 %P 829 - 839 %I Instituto Politécnico Nacional %C México %U http://www.redalyc.org/articulo.oa?id=61553900023
[216]
E. Galbrun and P. Miettinen, “Mining Redescriptions with Siren,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 1, 2018.
Export
BibTeX
@article{galbrun17mining, TITLE = {Mining Redescriptions with {Siren}}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1145/3007212}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {12}, NUMBER = {1}, EID = {6}, }
Endnote
%0 Journal Article %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mining Redescriptions with Siren : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-227B-F %R 10.1145/3007212 %7 2018 %D 2018 %J ACM Transactions on Knowledge Discovery from Data %V 12 %N 1 %Z sequence number: 6 %I ACM %C New York, NY
[217]
E. Gius, N. Reiter, J. Strötgen, and M. Willand, “SANTA: Systematische Analyse Narrativer Texte durch Annotation,” in DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Köln, Germany, 2018.
Export
BibTeX
@inproceedings{GiusDHd2018, TITLE = {{{SANTA}: {Systematische Analyse Narrativer Texte durch Annotation}}}, AUTHOR = {Gius, Evelyn and Reiter, Nils and Str{\"o}tgen, Jannik and Willand, Marcus}, LANGUAGE = {deu}, ISBN = {978-3-946275-02-2}, URL = {http://dhd2018.uni-koeln.de/}, YEAR = {2018}, BOOKTITLE = {DHd 2018, 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.}, PAGES = {302--305}, ADDRESS = {K{\"o}ln, Germany}, }
Endnote
%0 Conference Proceedings %A Gius, Evelyn %A Reiter, Nils %A Strötgen, Jannik %A Willand, Marcus %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T SANTA: Systematische Analyse Narrativer Texte durch Annotation : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002E-73EC-4 %D 2018 %B 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. %Z date of event: 2018-02-26 - 2018-03-02 %C Köln, Germany %B DHd 2018 %P 302 - 305 %@ 978-3-946275-02-2
[218]
D. Gupta and K. Berberich, “GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.
Export
BibTeX
@inproceedings{Gupta_CIKM2018, TITLE = {{GYANI}: {A}n Indexing Infrastructure for Knowledge-Centric Tasks}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-6014-2}, DOI = {10.1145/3269206.3271745}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management}, EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun}, PAGES = {487--496}, ADDRESS = {Torino, Italy}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T GYANI: An Indexing Infrastructure for Knowledge-Centric Tasks : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A8B7-2 %R 10.1145/3269206.3271745 %D 2018 %B 27th ACM International Conference on Information and Knowledge Management %Z date of event: 2018-10-22 - 2018-10-26 %C Torino, Italy %B CIKM'18 %E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun %P 487 - 496 %I ACM %@ 978-1-4503-6014-2
[219]
D. Gupta and K. Berberich, “Identifying Time Intervals for Knowledge Graph Facts,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{GuptaWWW2017, TITLE = {Identifying Time Intervals for Knowledge Graph Facts}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3186917}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {37--38}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Time Intervals for Knowledge Graph Facts : %G eng %U http://hdl.handle.net/21.11116/0000-0001-411F-4 %R 10.1145/3184558.3186917 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel %P 37 - 38 %I ACM %@ 978-1-4503-5640-4
[220]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” in JCDL’18, Joint Conference on Digital Libraries, Fort Worth, TX, USA, 2018.
Export
BibTeX
@inproceedings{GuptaJCDL2018, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-4503-5178-2}, DOI = {10.1145/3197026.3203900}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {JCDL'18, Joint Conference on Digital Libraries}, PAGES = {335--336}, ADDRESS = {Fort Worth, TX, USA}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %A Strötgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/21.11116/0000-0001-904D-6 %R 10.1145/3197026.3203900 %D 2018 %B Joint Conference on Digital Libraries %Z date of event: 2018-06-03 - 2018-06-07 %C Fort Worth, TX, USA %B JCDL'18 %P 335 - 336 %I ACM %@ 978-1-4503-5178-2
[221]
G. Haratinezhad Torbati, “Joint Disambiguation of Named Entities and Concepts,” Universität des Saarlandes, Saarbrücken, 2018.
Export
BibTeX
@mastersthesis{torbati2018concept, TITLE = {Joint Disambiguation of Named Entities and Concepts}, AUTHOR = {Haratinezhad Torbati, Ghazaleh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, DATE = {2018}, }
Endnote
%0 Thesis %A Haratinezhad Torbati, Ghazaleh %Y Del Corro, Luciano %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Disambiguation of Named Entities and Concepts : %G eng %U http://hdl.handle.net/21.11116/0000-0003-38D0-3 %I Universität des Saarlandes %C Saarbrücken %D 2018 %P XIII, 70 p. %V master %9 master
[222]
A. Horňáková, M. List, J. Vreeken, and M. H. Schulz, “JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis,” Bioinformatics, vol. 34, no. 17, 2018.
Export
BibTeX
@article{Hornakova_Bioinformatics2018, TITLE = {{JAMI}: {F}ast Computation of Conditional Mutual Information for {ceRNA} Network Analysis}, AUTHOR = {Hor{\v n}{\'a}kov{\'a}, Andrea and List, Markus and Vreeken, Jilles and Schulz, Marcel H.}, LANGUAGE = {eng}, ISSN = {1367-4803}, DOI = {10.1093/bioinformatics/bty221}, PUBLISHER = {Oxford University Press}, ADDRESS = {Oxford}, YEAR = {2018}, DATE = {2018}, JOURNAL = {Bioinformatics}, VOLUME = {34}, NUMBER = {17}, PAGES = {3050--3051}, }
Endnote
%0 Journal Article %A Horňáková, Andrea %A List, Markus %A Vreeken, Jilles %A Schulz, Marcel H. %+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society %T JAMI: Fast Computation of Conditional Mutual Information for ceRNA Network Analysis : %G eng %U http://hdl.handle.net/21.11116/0000-0002-573A-C %R 10.1093/bioinformatics/bty221 %7 2018 %D 2018 %J Bioinformatics %V 34 %N 17 %& 3050 %P 3050 - 3051 %I Oxford University Press %C Oxford %@ false
[223]
V. T. Ho, “An Embedding-based Approach to Rule Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
Knowledge Graphs (KGs) play an important role in various information systems and have applications in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: • We introduce a framework for rule learning guided by external sources. • We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model.
• We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.
Export
BibTeX
@mastersthesis{HoMaster2018, TITLE = {An Embedding-based Approach to Rule Learning from Knowledge Graphs}, AUTHOR = {Ho, Vinh Thinh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, DATE = {2018}, ABSTRACT = {Knowledge Graphs (KGs) play an important role in various information systems and have application in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: \mbox{$\bullet$} We introduce a framework for rule learning guided by external sources. \mbox{$\bullet$} We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model. \mbox{$\bullet$} We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.}, }
Endnote
%0 Thesis %A Ho, Vinh Thinh %A referee: Weikum, Gerhard %Y Stepanova, Daria %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Embedding-based Approach to Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0001-DE06-F %I Universität des Saarlandes %C Saarbrücken %D 2018 %P 60 %V master %9 master %X Knowledge Graphs (KGs) play an important role in various information systems and have application in many fields such as Semantic Web Search, Question Answering and Information Retrieval. KGs present information in the form of entities and relationships between them. Modern KGs could contain up to millions of entities and billions of facts, and they are usually built using automatic construction methods. As a result, despite the huge size of KGs, a large number of facts between their entities are still missing. That is the reason why we see the importance of the task of Knowledge Graph Completion (a.k.a. Link Prediction), which concerns the prediction of those missing facts. Rules over a Knowledge Graph capture interpretable patterns in data and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules such as confidence reflect rule quality well when the KG is reasonably complete; however, these measures might be misleading otherwise. So, it is difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Therefore, the ranking and pruning of candidate rules are major problems. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and optionally external information sources including text corpora. The contributions of this thesis are as follows: • We introduce a framework for rule learning guided by external sources. • We propose a concrete instantiation of our framework to show how to learn high-quality rules by utilizing feedback from a pretrained embedding model. • We conducted experiments on real-world KGs that demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and fact predictions that they produce.
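The thesis above notes that statistical measures such as confidence reflect rule quality well only when the KG is reasonably complete. As a point of reference, the standard confidence of a Horn rule over a triple set can be sketched as follows; the toy KG and helper names are illustrative, not taken from the thesis:

```python
# Confidence of a rule "X body_rel obj_b => X head_rel obj_h" over a triple set:
# conf = |subjects satisfying body and head| / |subjects satisfying body|.
# Under the closed-world counting used here, a fact missing from an incomplete
# KG is treated as false -- the distortion that motivates embedding feedback.

kg = {  # toy KG of (subject, relation, object) triples -- illustrative only
    ("ann", "worksAt", "mpi"), ("ann", "livesIn", "sb"),
    ("bob", "worksAt", "mpi"), ("bob", "livesIn", "sb"),
    ("eve", "worksAt", "mpi"),  # eve's livesIn fact is missing (incompleteness)
}

def confidence(body_rel, obj_b, head_rel, obj_h, kg):
    """Confidence of the rule: X body_rel obj_b => X head_rel obj_h."""
    body = {s for (s, r, o) in kg if r == body_rel and o == obj_b}
    both = {s for s in body if (s, head_rel, obj_h) in kg}
    return len(both) / len(body) if body else 0.0

# eve's missing livesIn fact drags confidence below 1.0
print(confidence("worksAt", "mpi", "livesIn", "sb", kg))  # 2/3 ≈ 0.667
```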
[224]
V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum, “Rule Learning from Knowledge Graphs Guided by Embedding Models,” in The Semantic Web -- ISWC 2018, Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{StepanovaISWC2018, TITLE = {Rule Learning from Knowledge Graphs Guided by Embedding Models}, AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-030-00670-9}, DOI = {10.1007/978-3-030-00671-6_5}, PUBLISHER = {Springer}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {The Semantic Web -- ISWC 2018}, EDITOR = {Vrande{\v c}i{\'c}, Denny and Bontcheva, Kalina and Su{\'a}rez-Figueroa, Mari Carmen and Presutti, Valentina and Celino, Irene and Sabou, Marta and Kaffee, Lucie-Aim{\'e}e and Simperl, Elena}, PAGES = {72--90}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {11136}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Stepanova, Daria %A Gad-Elrab, Mohamed Hassan %A Kharlamov, Evgeny %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Rule Learning from Knowledge Graphs Guided by Embedding Models : %G eng %U http://hdl.handle.net/21.11116/0000-0001-9058-9 %R 10.1007/978-3-030-00671-6_5 %D 2018 %B The 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B The Semantic Web -- ISWC 2018 %E Vrandečić, Denny; Bontcheva, Kalina; Suárez-Figueroa, Mari Carmen; Presutti, Valentina; Celino, Irene; Sabou, Marta; Kaffee, Lucie-Aimée; Simperl, Elena %P 72 - 90 %I Springer %@ 978-3-030-00670-9 %B Lecture Notes in Computer Science %N 11136
[225]
V. T. Ho, D. Stepanova, M. H. Gad-Elrab, E. Kharlamov, and G. Weikum, “Learning Rules from Incomplete KGs using Embeddings,” in ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P&D-Industry-BlueSky 2018), Monterey, CA, USA, 2018.
Export
BibTeX
@inproceedings{StepanovaISWC2018b, TITLE = {Learning Rules from Incomplete {KGs} using Embeddings}, AUTHOR = {Ho, Vinh Thinh and Stepanova, Daria and Gad-Elrab, Mohamed Hassan and Kharlamov, Evgeny and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://ceur-ws.org/Vol-2180/paper-25.pdf; urn:nbn:de:0074-2180-3}, PUBLISHER = {ceur-ws.org}, YEAR = {2018}, BOOKTITLE = {ISWC 2018 Posters \& Demonstrations, Industry and Blue Sky Ideas Tracks (ISWC-P\&D-Industry-BlueSky 2018)}, EDITOR = {van Erp, Marieke and Atre, Medha and Lopez, Vanessa and Srinivas, Kavitha and Fortuna, Carolina}, EID = {25}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2180}, ADDRESS = {Monterey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Ho, Vinh Thinh %A Stepanova, Daria %A Gad-Elrab, Mohamed Hassan %A Kharlamov, Evgeny %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Learning Rules from Incomplete KGs using Embeddings : %G eng %U http://hdl.handle.net/21.11116/0000-0001-905B-6 %U http://ceur-ws.org/Vol-2180/paper-25.pdf %D 2018 %B The 17th International Semantic Web Conference %Z date of event: 2018-10-08 - 2018-10-12 %C Monterey, CA, USA %B ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks %E van Erp, Marieke; Atre, Medha; Lopez, Vanessa; Srinivas, Kavitha; Fortuna, Carolina %Z sequence number: 25 %I ceur-ws.org %B CEUR Workshop Proceedings %N 2180
[226]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Hui_WSDM2018, TITLE = {Co-{PACRR}: {A} Context-Aware Neural {IR} Model for Ad-hoc Retrieval}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159689}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {279--287}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6367-D %R 10.1145/3159652.3159689 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 279 - 287 %I ACM %@ 978-1-4503-5581-0
[227]
M. Humble, “Redescription Mining on Financial Time Series Data,” Universität des Saarlandes, Saarbrücken, 2018.
Export
BibTeX
@mastersthesis{Humble_BSc2017, TITLE = {Redescription Mining on Financial Time Series Data}, AUTHOR = {Humble, Megan}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, DATE = {2018}, TYPE = {Bachelor's thesis}, }
Endnote
%0 Thesis %A Humble, Megan %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Redescription Mining on Financial Time Series Data : %G eng %U http://hdl.handle.net/21.11116/0000-0002-F042-4 %I Universität des Saarlandes %C Saarbrücken %D 2018 %P XV, 100 p. %V bachelor %9 bachelor
[228]
H. Jhavar and P. Mirza, “EMOFIEL: Mapping Emotions of Relationships in a Story,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{JhavarWWW2018, TITLE = {{EMOFIEL}: {M}apping Emotions of Relationships in a Story}, AUTHOR = {Jhavar, Harshita and Mirza, Paramita}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3186989}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {243--246}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Jhavar, Harshita %A Mirza, Paramita %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T EMOFIEL: Mapping Emotions of Relationships in a Story : %G eng %U http://hdl.handle.net/21.11116/0000-0001-4B96-2 %R 10.1145/3184558.3186989 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel %P 243 - 246 %I ACM %@ 978-1-4503-5640-4
[229]
Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum, “TEQUILA: Temporal Question Answering over Knowledge Bases,” in CIKM’18, 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 2018.
Export
BibTeX
@inproceedings{Jia_CIKM2018, TITLE = {{TEQUILA}: {T}emporal Question Answering over Knowledge Bases}, AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-6014-2}, DOI = {10.1145/3269206.3269247}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {CIKM'18, 27th ACM International Conference on Information and Knowledge Management}, EDITOR = {Cuzzocrea, Alfredo and Allan, James and Paton, Norman and Srivastava, Divesh and Agrawal, Rakesh and Broder, Andrei and Zaki, Mohamed and Candan, Selcuk and Labrinidis, Alexandros and Schuster, Assaf and Wang, Haixun}, PAGES = {1807--1810}, ADDRESS = {Torino, Italy}, }
Endnote
%0 Conference Proceedings %A Jia, Zhen %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Strötgen, Jannik %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T TEQUILA: Temporal Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A106-1 %R 10.1145/3269206.3269247 %D 2018 %B 27th ACM International Conference on Information and Knowledge Management %Z date of event: 2018-10-22 - 2018-10-26 %C Torino, Italy %B CIKM'18 %E Cuzzocrea, Alfredo; Allan, James; Paton, Norman; Srivastava, Divesh; Agrawal, Rakesh; Broder, Andrei; Zaki, Mohamed; Candan, Selcuk; Labrinidis, Alexandros; Schuster, Assaf; Wang, Haixun %P 1807 - 1810 %I ACM %@ 978-1-4503-6014-2
[230]
Z. Jia, A. Abujabal, R. Saha Roy, J. Strötgen, and G. Weikum, “TempQuestions: A Benchmark for Temporal Question Answering,” in Companion of the World Wide Web Conference (WWW 2018), Lyon, France, 2018.
Export
BibTeX
@inproceedings{JiaWWW2017, TITLE = {{TempQuestions}: {A} Benchmark for Temporal Question Answering}, AUTHOR = {Jia, Zhen and Abujabal, Abdalghani and Saha Roy, Rishiraj and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5640-4}, DOI = {10.1145/3184558.3191536}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Companion of the World Wide Web Conference (WWW 2018)}, EDITOR = {Champin, Pierre-Antoine and Gandon, Fabien and M{\'e}dini, Lionel}, PAGES = {1057--1062}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Jia, Zhen %A Abujabal, Abdalghani %A Saha Roy, Rishiraj %A Strötgen, Jannik %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T TempQuestions: A Benchmark for Temporal Question Answering : %G eng %U http://hdl.handle.net/21.11116/0000-0001-3C80-B %R 10.1145/3184558.3191536 %D 2018 %B The Web Conference %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B Companion of the World Wide Web Conference %E Champin, Pierre-Antoine; Gandon, Fabien; Médini, Lionel %P 1057 - 1062 %I ACM %@ 978-1-4503-5640-4
[231]
J. Kalofolias, E. Galbrun, and P. Miettinen, “From Sets of Good Redescriptions to Good Sets of Redescriptions,” Knowledge and Information Systems, vol. 57, no. 1, 2018.
Export
BibTeX
@article{kalofolias18from, TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions}, AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-017-1149-7}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2018}, DATE = {2018}, JOURNAL = {Knowledge and Information Systems}, VOLUME = {57}, NUMBER = {1}, PAGES = {21--54}, }
Endnote
%0 Journal Article %A Kalofolias, Janis %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Sets of Good Redescriptions to Good Sets of Redescriptions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D1-5 %R 10.1007/s10115-017-1149-7 %7 2018-01-19 %D 2018 %J Knowledge and Information Systems %V 57 %N 1 %& 21 %P 21 - 54 %I Springer %C New York, NY %@ false
[232]
S. Karaev, S. Metzler, and P. Miettinen, “Logistic-Tropical Decompositions and Nested Subgraphs,” in Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018), London, UK, 2018.
Export
BibTeX
@inproceedings{Karaev_MLG2018, TITLE = {Logistic-Tropical Decompositions and Nested Subgraphs}, AUTHOR = {Karaev, Sanjar and Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, PUBLISHER = {MLG Workshop}, YEAR = {2018}, BOOKTITLE = {Proceedings of the 14th International Workshop on Mining and Learning with Graphs (MLG 2018)}, EID = {35}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Logistic-Tropical Decompositions and Nested Subgraphs : %G eng %U http://hdl.handle.net/21.11116/0000-0002-A91F-E %D 2018 %B 14th International Workshop on Mining and Learning with Graphs %Z date of event: 2018-08-20 - 2018-08-20 %C London, UK %B Proceedings of the 14th International Workshop on Mining and Learning with Graphs %Z sequence number: 35 %I MLG Workshop %U http://www.mlgworkshop.org/2018/papers/MLG2018_paper_35.pdf
[233]
S. Karaev, J. Hook, and P. Miettinen, “Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” 2018. [Online]. Available: http://arxiv.org/abs/1801.06136. (arXiv: 1801.06136)
Abstract
Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.
Export
BibTeX
@online{Karaev2018, TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1801.06136}, EPRINT = {1801.06136}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.}, }
Endnote
%0 Report %A Karaev, Sanjar %A Hook, James %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : %U http://hdl.handle.net/21.11116/0000-0000-636B-9 %U http://arxiv.org/abs/1801.06136 %D 2018 %X Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone. %K Computer Science, Learning, cs.LG
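Entries [233] and [234] build on the max-times (subtropical) semiring, in which the sum over k in an ordinary matrix product is replaced by a max, so that only the largest term "wins" each entry. A minimal NumPy sketch of the two products the Latitude model interpolates between; the matrices and function names are illustrative, not from the paper:

```python
import numpy as np

def linear_product(B, C):
    # standard (NMF-style) product: sum_k B[i,k] * C[k,j]
    return B @ C

def subtropical_product(B, C):
    # max-times (SMF-style) product: max_k B[i,k] * C[k,j]
    # "winner takes it all": only the largest term per entry survives
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

B = np.array([[1.0, 2.0],
              [0.5, 1.0]])
C = np.array([[3.0, 0.0],
              [1.0, 4.0]])

print(linear_product(B, C))       # entry (0,0) = 1*3 + 2*1 = 5
print(subtropical_product(B, C))  # entry (0,0) = max(1*3, 2*1) = 3
```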
[234]
S. Karaev, J. Hook, and P. Miettinen, “Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” in Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018), San Diego, CA, USA, 2018.
Export
BibTeX
@inproceedings{Karaev_SDM2018, TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-532-1}, DOI = {10.1137/1.9781611975321.41}, PUBLISHER = {SIAM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Proceedings of the 2018 SIAM International Conference on Data Mining (SDM 2018)}, EDITOR = {Ester, Martin and Pedreschi, Dino}, PAGES = {360--368}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Hook, James %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E2D-4 %R 10.1137/1.9781611975321.41 %D 2018 %B SIAM International Conference on Data Mining %Z date of event: 2018-05-03 - 2018-05-05 %C San Diego, CA, USA %B Proceedings of the 2018 SIAM International Conference on Data Mining %E Ester, Martin; Pedreschi, Dino %P 360 - 368 %I SIAM %@ 978-1-61197-532-1
[235]
P. Lahoti, K. Garimella, and A. Gionis, “Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Lahoti_WSDM2018, TITLE = {Joint Non-negative Matrix Factorization for Learning Ideological Leaning on {T}witter}, AUTHOR = {Lahoti, Preethi and Garimella, Kiran and Gionis, Aristides}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159669}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {351--359}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Lahoti, Preethi %A Garimella, Kiran %A Gionis, Aristides %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Joint Non-negative Matrix Factorization for Learning Ideological Leaning on Twitter : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9C4F-7 %R 10.1145/3159652.3159669 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 351 - 359 %I ACM %@ 978-1-4503-5581-0
[236]
P. Lahoti, G. Weikum, and K. P. Gummadi, “iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making,” 2018. [Online]. Available: http://arxiv.org/abs/1806.01059. (arXiv: 1806.01059)
Abstract
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.
Export
BibTeX
@online{Lahoti_arXiv1806.01059, TITLE = {{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making}, AUTHOR = {Lahoti, Preethi and Weikum, Gerhard and Gummadi, Krishna P.}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.01059}, EPRINT = {1806.01059}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.}, }
Endnote
%0 Report %A Lahoti, Preethi %A Weikum, Gerhard %A Gummadi, Krishna P. %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making : %G eng %U http://hdl.handle.net/21.11116/0000-0002-1545-9 %U http://arxiv.org/abs/1806.01059 %D 2018 %X People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. %K Computer Science, Learning, cs.LG,Computer Science, Information Retrieval, cs.IR,Statistics, Machine Learning, stat.ML
[237]
C. Li, Y. Sun, B. He, L. Wang, K. Hui, A. Yates, L. Sun, and J. Xu, “NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018.
Export
BibTeX
@inproceedings{DBLP:conf/emnlp/LiSHWHYSX18, TITLE = {{NPRF}: {A} Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval}, AUTHOR = {Li, Canjia and Sun, Yingfei and He, Ben and Wang, Le and Hui, Kai and Yates, Andrew and Sun, Le and Xu, Jungang}, LANGUAGE = {eng}, ISBN = {978-1-948087-84-1}, URL = {https://aclanthology.info/papers/D18-1478/d18-1478}, PUBLISHER = {ACL}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)}, EDITOR = {Riloff, Ellen and Chiang, David and Hockenmaier, Julia and Tsujii, Jun'ichi}, PAGES = {4482--4491}, ADDRESS = {Brussels, Belgium}, }
Endnote
%0 Conference Proceedings %A Li, Canjia %A Sun, Yingfei %A He, Ben %A Wang, Le %A Hui, Kai %A Yates, Andrew %A Sun, Le %A Xu, Jungang %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0003-11BB-7 %U https://aclanthology.info/papers/D18-1478/d18-1478 %D 2018 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2018-10-31 - 2018-11-04 %C Brussels, Belgium %B The Conference on Empirical Methods in Natural Language Processing %E Riloff, Ellen; Chiang, David; Hockenmaier, Julia; Tsujii, Jun'ichi %P 4482 - 4491 %I ACL %@ 978-1-948087-84-1
[238]
S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian, “RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” 2018. [Online]. Available: http://arxiv.org/abs/1806.07916. (arXiv: 1806.07916)
Abstract
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.
Export
BibTeX
@online{MacAveray_arXiv1806.07916, TITLE = {{RSDD}-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses}, AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1806.07916}, EPRINT = {1806.07916}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Desmet, Bart %A Cohan, Arman %A Soldaini, Luca %A Yates, Andrew %A Zirikly, Ayah %A Goharian, Nazli %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ED9-1 %U http://arxiv.org/abs/1806.07916 %D 2018 %X Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging. %K Computer Science, Computation and Language, cs.CL
[239]
S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian, “RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,” in Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018), New Orleans, LA, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_NAACL_HLT2018, TITLE = {{RSDD}-Time: {T}emporal Annotation of Self-Reported Mental Health Diagnoses}, AUTHOR = {MacAvaney, Sean and Desmet, Bart and Cohan, Arman and Soldaini, Luca and Yates, Andrew and Zirikly, Ayah and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-948087-12-4}, URL = {http://aclweb.org/anthology/W18-0618}, DOI = {10.18653/v1/W18-0618}, PUBLISHER = {ACL}, YEAR = {2018}, BOOKTITLE = {Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2018)}, EDITOR = {Loveys, Kate and Niederhoffer, Kate and Prud'hommeaux, Emily and Resnik, Rebecca and Resnik, Philip}, PAGES = {168--173}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Desmet, Bart %A Cohan, Arman %A Soldaini, Luca %A Yates, Andrew %A Zirikly, Ayah %A Goharian, Nazli %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E8C-8 %U http://aclweb.org/anthology/W18-0618 %R 10.18653/v1/W18-0618 %D 2018 %B Fifth Workshop on Computational Linguistics and Clinical Psychology %Z date of event: 2018-06-05 - 2018-06-05 %C New Orleans, LA, USA %B Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology %E Loveys, Kate; Niederhoffer, Kate; Prud'hommeaux, Emily; Resnik, Rebecca; Resnik, Philip %P 168 - 173 %I ACL %@ 978-1-948087-12-4 %U https://aclanthology.info/papers/W18-0618/w18-0618
[240]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Overcoming Low-Utility Facets for Complex Answer Retrieval,” in SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018), Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_KG4IR2018, TITLE = {Overcoming Low-Utility Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, URL = {http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf; urn:nbn:de:0074-2127-8}, PUBLISHER = {CEUR-WS.org}, YEAR = {2018}, BOOKTITLE = {SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH (ProfS-KG4IR-Data:Search 2018)}, EDITOR = {Dietz, Laura and Koetzen, Laura and Verberne, Suzan}, PAGES = {46--47}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {2127}, ADDRESS = {Ann Arbor, MI, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Overcoming Low-Utility Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5E9C-6 %U http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf %D 2018 %B Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding %Z date of event: 2018-07-12 - 2018-07-12 %C Ann Arbor, MI, USA %B SIGIR 2018 Workshops: ProfS, KG4IR, and DATA:SEARCH %E Dietz, Laura; Koetzen, Laura; Verberne, Suzan %P 46 - 47 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 2127 %U http://ceur-ws.org/Vol-2127/paper1-kg4ir.pdf
[241]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Characterizing Question Facets for Complex Answer Retrieval,” 2018. [Online]. Available: http://arxiv.org/abs/1805.00791. (arXiv: 1805.00791)
Abstract
Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.
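The abstract above describes weighting query terms by facet utility when combining per-term matching scores. A hypothetical, much-simplified illustration of that idea follows; the facet list, function names, and fixed down-weighting scheme are assumptions for exposition, not the paper's learned neural model:

```python
# Sketch: combine per-term match scores into a document score, down-weighting
# terms drawn from generic "structural" facets (e.g. 'History') relative to
# topic-specific ones (e.g. 'Westward expansion'). All names are hypothetical.

STRUCTURAL_FACETS = {"history", "background", "overview"}  # assumed examples


def facet_weight(term, structural_weight=0.5):
    """Return a lower combination weight for structural-facet terms."""
    return structural_weight if term.lower() in STRUCTURAL_FACETS else 1.0


def combine_scores(term_scores):
    """term_scores: {query term: matching score against one document}."""
    total = sum(facet_weight(t) * s for t, s in term_scores.items())
    return total / sum(facet_weight(t) for t in term_scores)
```

With `{"expansion": 0.9, "history": 0.3}` this yields 0.7 rather than the unweighted mean of 0.6, letting the topical term dominate the document score.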
Export
BibTeX
@online{MacAvaney_arXiv1805.00791, TITLE = {Characterizing Question Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1805.00791}, EPRINT = {1805.00791}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Characterizing Question Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ECE-E %U http://arxiv.org/abs/1805.00791 %D 2018 %X Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method. %K Computer Science, Information Retrieval, cs.IR
[242]
S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, “Characterizing Question Facets for Complex Answer Retrieval,” in SIGIR’18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 2018.
Export
BibTeX
@inproceedings{MacAvaney_SIGIR2018, TITLE = {Characterizing Question Facets for Complex Answer Retrieval}, AUTHOR = {MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Soldaini, Luca and Hui, Kai and Goharian, Nazli and Frieder, Ophir}, LANGUAGE = {eng}, ISBN = {978-1-4503-5657-2}, DOI = {10.1145/3209978.3210135}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {SIGIR'18, 41st International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {1205--1208}, ADDRESS = {Ann Arbor, MI, USA}, }
Endnote
%0 Conference Proceedings %A MacAvaney, Sean %A Yates, Andrew %A Cohan, Arman %A Soldaini, Luca %A Hui, Kai %A Goharian, Nazli %A Frieder, Ophir %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations External Organizations %T Characterizing Question Facets for Complex Answer Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0002-5ECA-2 %R 10.1145/3209978.3210135 %D 2018 %B 41st International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2018-07-08 - 2018-07-12 %C Ann Arbor, MI, USA %B SIGIR'18 %P 1205 - 1208 %I ACM %@ 978-1-4503-5657-2
[243]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” in IEEE International Conference on Data Mining (ICDM 2018), Singapore, Singapore, 2018.
Export
BibTeX
@inproceedings{mandros:18:fedora, TITLE = {Discovering Reliable Dependencies from Data: {H}ardness and Improved Algorithms}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-9159-5}, DOI = {10.1109/ICDM.2018.00047}, PUBLISHER = {IEEE}, YEAR = {2018}, BOOKTITLE = {IEEE International Conference on Data Mining (ICDM 2018)}, PAGES = {317--326}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms : %G eng %U http://hdl.handle.net/21.11116/0000-0002-9EA2-5 %R 10.1109/ICDM.2018.00047 %D 2018 %B IEEE International Conference on Data Mining %Z date of event: 2018-11-17 - 2018-11-20 %C Singapore, Singapore %B IEEE International Conference on Data Mining %P 317 - 326 %I IEEE %@ 978-1-5386-9159-5
[244]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms,” 2018. [Online]. Available: http://arxiv.org/abs/1809.05467. (arXiv: 1809.05467)
Abstract
The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.
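The abstract above contrasts complete branch-and-bound search with a greedy heuristic for maximizing a dependency score over feature subsets. A minimal sketch of the greedy side, assuming a plain plug-in fraction of information F(X; Y) = I(X; Y) / H(Y) estimated from counts rather than the paper's reliable, chance-corrected score; function names and the toy data are hypothetical:

```python
# Greedy forward selection of a feature subset maximizing a plug-in
# fraction-of-information score. Illustrative only: omits the paper's
# reliability correction and admissible bounding function.
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def fraction_of_information(rows, features, target):
    """F(X; Y) = (H(Y) - H(Y | X)) / H(Y), from empirical counts."""
    y = [r[target] for r in rows]
    h_y = entropy(y)
    if h_y == 0:
        return 0.0
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[f] for f in features), []).append(r[target])
    h_y_given_x = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return (h_y - h_y_given_x) / h_y


def greedy_select(rows, candidates, target):
    """Add one feature per round while the score strictly improves."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        scored = [(fraction_of_information(rows, selected + [f], target), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        score, f = max(scored)
        if score > best:
            selected, best, improved = selected + [f], score, True
    return selected, best
```

On a toy table where column `a` fully determines `y` and `b` is noise, the greedy search selects `["a"]` with score 1.0 and stops, since adding `b` yields no improvement.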
Export
BibTeX
@online{Mandros_arXiv1809.05467, TITLE = {Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1809.05467}, EPRINT = {1809.05467}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {The reliable fraction of information is an attractive score for