Publications

2018
[1]
S. Degaetano-Ortlieb and J. Strötgen, “Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy,” in Language Technologies for the Challenges of the Digital Age (GSCL 2017), Berlin, Germany, 2018.
Export
BibTeX
@inproceedings{DegaetanoortliebStroetgen2017, TITLE = {Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy}, AUTHOR = {Degaetano-Ortlieb, Stefania and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-3-319-73705-8}, DOI = {10.1007/978-3-319-73706-5_22}, PUBLISHER = {Springer}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {Language Technologies for the Challenges of the Digital Age (GSCL 2017)}, EDITOR = {Rehm, Georg and Declerck, Thierry}, PAGES = {259--275}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10713}, ADDRESS = {Berlin, Germany}, }
Endnote
%0 Conference Proceedings %A Degaetano-Ortlieb, Stefania %A Strötgen, Jannik %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Diachronic Variation of Temporal Expressions in Scientific Writing through the Lens of Relative Entropy : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-A8E8-5 %R 10.1007/978-3-319-73706-5_22 %D 2018 %B Conference of the German Society for Computational Linguistics and Language Technology %Z date of event: 2017-09-13 - 2017-09-14 %C Berlin, Germany %B Language Technologies for the Challenges of the Digital Age %E Rehm, Georg; Declerck, Thierry %P 259 - 275 %I Springer %@ 978-3-319-73705-8 %B Lecture Notes in Artificial Intelligence %N 10713
[2]
P. Ernst, “Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as those on health, are equally important but less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions:
- To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas.
- To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments, achieved by extending our framework to partial patterns and facts and by reasoning over the composability of partial facts.
- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration.
Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.
Export
BibTeX
@phdthesis{Ernstphd2017, TITLE = {Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems}, AUTHOR = {Ernst, Patrick}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-271051}, DOI = {10.22028/D291-27105}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, ABSTRACT = {While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: -- To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. -- To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. -- To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches.}, }
Endnote
%0 Thesis %A Ernst, Patrick %Y Weikum, Gerhard %A referee: Verspoor, Karin %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Biomedical Knowledge Base Construction from Text and its Applications in Knowledge-based Systems : %G eng %U http://hdl.handle.net/21.11116/0000-0001-1864-4 %U urn:nbn:de:bsz:291-scidok-ds-271051 %R 10.22028/D291-27105 %I Universität des Saarlandes %C Saarbrücken %D 2018 %8 20.02.2018 %P 147 p. %V phd %9 phd %X While general-purpose Knowledge Bases (KBs) have gone a long way in compiling comprehensive knowledge about people, events, places, etc., domain-specific KBs, such as on health, are equally important, but are less explored. Consequently, a comprehensive and expressive health KB that spans all aspects of biomedical knowledge is still missing. The main goal of this thesis is to develop principled methods for building such a KB and enabling knowledge-centric applications. We address several challenges and make the following contributions: - To construct a health KB, we devise a largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas. - To consider higher-arity relations, crucial for proper knowledge representation in advanced domains such as health, we generalize the fact-pattern duality paradigm of previous methods. A key novelty is the integration of facts with missing arguments by extending our framework to partial patterns and facts by reasoning over the composability of partial facts. - To demonstrate the benefits of a health KB, we devise systems for entity-aware search and analytics and for entity-relationship-oriented exploration. Extensive experiments and use-case studies demonstrate the viability of the proposed approaches. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26987
[3]
E. Galbrun and P. Miettinen, “Mining Redescriptions with Siren,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 1, 2018.
Export
BibTeX
@article{galbrun17mining, TITLE = {Mining Redescriptions with {Siren}}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1145/3007212}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2018}, JOURNAL = {ACM Transactions on Knowledge Discovery from Data}, VOLUME = {12}, NUMBER = {1}, EID = {6}, }
Endnote
%0 Journal Article %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mining Redescriptions with Siren : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-227B-F %R 10.1145/3007212 %7 2018 %D 2018 %J ACM Transactions on Knowledge Discovery from Data %V 12 %N 1 %Z sequence number: 6 %I ACM %C New York, NY
[4]
E. Gius, N. Reiter, J. Strötgen, and M. Willand, “SANTA: Systematische Analyse Narrativer Texte durch Annotation,” in Kritik der digitalen Vernunft (DHd 2018), Köln, Germany. (Accepted/in press)
Export
BibTeX
@inproceedings{GiusDHd2018, TITLE = {{{SANTA}: {Systematische Analyse Narrativer Texte durch Annotation}}}, AUTHOR = {Gius, Evelyn and Reiter, Nils and Str{\"o}tgen, Jannik and Willand, Marcus}, LANGUAGE = {deu}, URL = {http://dhd2018.uni-koeln.de/}, YEAR = {2018}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Kritik der digitalen Vernunft (DHd 2018)}, ADDRESS = {K{\"o}ln, Germany}, }
Endnote
%0 Conference Proceedings %A Gius, Evelyn %A Reiter, Nils %A Strötgen, Jannik %A Willand, Marcus %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T SANTA: Systematische Analyse Narrativer Texte durch Annotation : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002E-73EC-4 %D 2018 %B 5. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. %Z date of event: 2018-02-26 - 2018-03-02 %C Köln, Germany %B Kritik der digitalen Vernunft
[5]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,” in WSDM’18, 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 2018.
Export
BibTeX
@inproceedings{Hui_WSDM2018, TITLE = {Co-{PACRR}: {A} Context-Aware Neural {IR} Model for Ad-hoc Retrieval}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5581-0}, DOI = {10.1145/3159652.3159689}, PUBLISHER = {ACM}, YEAR = {2018}, DATE = {2018}, BOOKTITLE = {WSDM'18, 11th ACM International Conference on Web Search and Data Mining}, PAGES = {279--287}, ADDRESS = {Marina Del Rey, CA, USA}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6367-D %R 10.1145/3159652.3159689 %D 2018 %B 11th ACM International Conference on Web Search and Data Mining %Z date of event: 2018-02-05 - 2018-02-09 %C Marina Del Rey, CA, USA %B WSDM'18 %P 279 - 287 %I ACM %@ 978-1-4503-5581-0
[6]
J. Kalofolias, E. Galbrun, and P. Miettinen, “From Sets of Good Redescriptions to Good Sets of Redescriptions,” Knowledge and Information Systems, 2018.
Export
BibTeX
@article{kalofolias18from, TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions}, AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISSN = {0219-1377}, DOI = {10.1007/s10115-017-1149-7}, PUBLISHER = {Springer}, ADDRESS = {New York, NY}, YEAR = {2018}, JOURNAL = {Knowledge and Information Systems}, }
Endnote
%0 Journal Article %A Kalofolias, Janis %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Sets of Good Redescriptions to Good Sets of Redescriptions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D1-5 %R 10.1007/s10115-017-1149-7 %7 2018-01-19 %D 2018 %8 19.01.2018 %J Knowledge and Information Systems %I Springer %C New York, NY %@ 0219-1377
[7]
S. Karaev, J. Hook, and P. Miettinen, “Latitude: A Model for Mixed Linear-Tropical Matrix Factorization,” 2018. [Online]. Available: http://arxiv.org/abs/1801.06136. (arXiv: 1801.06136)
Abstract
Nonnegative matrix factorization (NMF) is one of the most frequently used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF and can smoothly alternate between the two. In our model, the data is modeled using latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or as their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.
Export
BibTeX
@online{Karaev2018, TITLE = {Latitude: A Model for Mixed Linear-Tropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Hook, James and Miettinen, Pauli}, URL = {http://arxiv.org/abs/1801.06136}, EPRINT = {1801.06136}, EPRINTTYPE = {arXiv}, YEAR = {2018}, ABSTRACT = {Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone.}, }
Endnote
%0 Report %A Karaev, Sanjar %A Hook, James %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Latitude: A Model for Mixed Linear-Tropical Matrix Factorization : %U http://hdl.handle.net/21.11116/0000-0000-636B-9 %U http://arxiv.org/abs/1801.06136 %D 2018 %X Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it all' interpretation. In this paper we propose a new mixed linear--tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone. %K Computer Science, Learning, cs.LG
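The mixed linear-tropical idea above can be illustrated with a minimal numpy sketch (a toy under stated assumptions, not the paper's actual algorithm: Latitude's latent mixing parameters control the interpretation per factor, whereas the single global weight alpha here is a hypothetical simplification):

import numpy as np

def linear_product(B, C):
    # Ordinary matrix product: sum over k of B[i, k] * C[k, j] (the NMF side).
    return B @ C

def maxtimes_product(B, C):
    # Subtropical (max-times) product: max over k of B[i, k] * C[k, j] (the SMF side).
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def mixed_product(B, C, alpha):
    # Hypothetical convex mix of the two semirings: alpha = 0 is purely
    # linear, alpha = 1 purely subtropical.
    return (1 - alpha) * linear_product(B, C) + alpha * maxtimes_product(B, C)

rng = np.random.default_rng(0)
B, C = rng.random((4, 2)), rng.random((2, 5))
print(mixed_product(B, C, 0.5).shape)  # (4, 5)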
[8]
A. Mishra, “Leveraging Semantic Annotations for Event-focused Search & Summarization,” Universität des Saarlandes, Saarbrücken, 2018.
Abstract
Today, in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
• We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
• We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of the information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
• To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
Export
BibTeX
@phdthesis{Mishraphd2018, TITLE = {Leveraging Semantic Annotations for Event-focused Search \& Summarization}, AUTHOR = {Mishra, Arunav}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-271081}, DOI = {10.22028/D291-27108}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2018}, ABSTRACT = {Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: \mbox{$\bullet$} We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. \mbox{$\bullet$} We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. \mbox{$\bullet$} To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.}, }
Endnote
%0 Thesis %A Mishra, Arunav %Y Berberich, Klaus %A referee: Weikum, Gerhard %A referee: Hauff, Claudia %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Leveraging Semantic Annotations for Event-focused Search & Summarization : %G eng %U http://hdl.handle.net/21.11116/0000-0001-1844-8 %U urn:nbn:de:bsz:291-scidok-ds-271081 %R 10.22028/D291-27108 %I Universität des Saarlandes %C Saarbrücken %D 2018 %8 08.02.2018 %P 252 p. %V phd %9 phd %X Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. • To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26995
[9]
K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “CredEye: A Credibility Lens for Analyzing and Explaining Misinformation,” in WWW’18 Companion, Lyon, France. (Accepted/in press)
Export
BibTeX
@inproceedings{PopatWWW2017, TITLE = {{CredEye}: {A} Credibility Lens for Analyzing and Explaining Misinformation}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1145/3184558.3186967}, PUBLISHER = {ACM}, YEAR = {2018}, PUBLREMARK = {Accepted}, BOOKTITLE = {WWW'18 Companion}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T CredEye: A Credibility Lens for Analyzing and Explaining Misinformation : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B546-5 %R 10.1145/3184558.3186967 %D 2018 %B 27th International Conference on World Wide Web %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B WWW'18 Companion %I ACM
[10]
A. Spitz, J. Strötgen, and M. Gertz, “Predicting Document Creation Times in News Citation Networks,” in WWW’18 Companion, Lyon, France. (Accepted/in press)
Export
BibTeX
@inproceedings{SpitzWWW2017, TITLE = {Predicting Document Creation Times in News Citation Networks}, AUTHOR = {Spitz, Andreas and Str{\"o}tgen, Jannik and Gertz, Michael}, LANGUAGE = {eng}, DOI = {10.1145/3184558.3191633}, PUBLISHER = {ACM}, YEAR = {2018}, PUBLREMARK = {Accepted}, BOOKTITLE = {WWW'18 Companion}, ADDRESS = {Lyon, France}, }
Endnote
%0 Conference Proceedings %A Spitz, Andreas %A Strötgen, Jannik %A Gertz, Michael %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Predicting Document Creation Times in News Citation Networks : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B544-7 %R 10.1145/3184558.3191633 %D 2018 %B 27th International Conference on World Wide Web %Z date of event: 2018-04-23 - 2018-04-27 %C Lyon, France %B WWW'18 Companion %I ACM
[11]
J. Strötgen, R. Andrade, and D. Gupta, “Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations,” in Proceedings of the Joint Conference on Digital Libraries (JCDL 2018), Fort Worth, TX, USA. (Accepted/in press)
Export
BibTeX
@inproceedings{StroetgenJCDL2018, TITLE = {Putting Dates on the Map: {H}arvesting and Analyzing Street Names with Date Mentions and their Explanations}, AUTHOR = {Str{\"o}tgen, Jannik and Andrade, Rosita and Gupta, Dhruv}, LANGUAGE = {eng}, PUBLISHER = {ACM}, YEAR = {2018}, PUBLREMARK = {Accepted}, BOOKTITLE = {Proceedings of the Joint Conference on Digital Libraries (JCDL 2018)}, ADDRESS = {Fort Worth, TX, USA}, }
Endnote
%0 Conference Proceedings %A Strötgen, Jannik %A Andrade, Rosita %A Gupta, Dhruv %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Putting Dates on the Map: Harvesting and Analyzing Street Names with Date Mentions and their Explanations : %G eng %U http://hdl.handle.net/21.11116/0000-0000-B548-3 %D 2018 %B Joint Conference on Digital Libraries %Z date of event: 2018-06-03 - 2018-06-06 %C Fort Worth, TX, USA %B Proceedings of the Joint Conference on Digital Libraries %I ACM
[12]
J. Strötgen, A.-L. Minard, L. Lange, M. Speranza, and B. Magnini, “KRAUTS: A German Temporally Annotated News Corpus,” in Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. (Accepted/in press)
Export
BibTeX
@inproceedings{StroetgenELREC2018, TITLE = {{KRAUTS}: {A German} Temporally Annotated News Corpus}, AUTHOR = {Str{\"o}tgen, Jannik and Minard, Anne-Lyse and Lange, Lukas and Speranza, Manuela and Magnini, Bernardo}, LANGUAGE = {eng}, URL = {http://lrec2018.lrec-conf.org/en/}, YEAR = {2018}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, ADDRESS = {Miyazaki, Japan}, }
Endnote
%0 Conference Proceedings %A Strötgen, Jannik %A Minard, Anne-Lyse %A Lange, Lukas %A Speranza, Manuela %A Magnini, Bernardo %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T KRAUTS: A German Temporally Annotated News Corpus : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-8B8C-E %U http://lrec2018.lrec-conf.org/en/ %D 2018 %B 11th Language Resources and Evaluation Conference %Z date of event: 2018-05-07 - 2018-05-12 %C Miyazaki, Japan %B Eleventh International Conference on Language Resources and Evaluation
2017
[13]
A. Abujabal, R. S. Roy, M. Yahya, and G. Weikum, “QUINT: Interpretable Question Answering over Knowledge Bases,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{AbujabalENMLP2017, TITLE = {{QUINT}: {I}nterpretable Question Answering over Knowledge Bases}, AUTHOR = {Abujabal, Abdalghani and Roy, Rishiraj Saha and Yahya, Mohamed and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-97-5}, URL = {http://aclweb.org/anthology/D17-2011}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {61--66}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Roy, Rishiraj Saha %A Yahya, Mohamed %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T QUINT: Interpretable Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F97C-E %U http://aclweb.org/anthology/D17-2011 %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 61 - 66 %I ACL %@ 978-1-945626-97-5
[14]
A. Abujabal, M. Yahya, M. Riedewald, and G. Weikum, “Automated Template Generation for Question Answering over Knowledge Graphs,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{AbujabalWWW2017, TITLE = {Automated Template Generation for Question Answering over Knowledge Graphs}, AUTHOR = {Abujabal, Abdalghani and Yahya, Mohamed and Riedewald, Mirek and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4913-0}, DOI = {10.1145/3038912.3052583}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web}, PAGES = {1191--1200}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Yahya, Mohamed %A Riedewald, Mirek %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automated Template Generation for Question Answering over Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4F9C-E %R 10.1145/3038912.3052583 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 %P 1191 - 1200 %I ACM %@ 978-1-4503-4913-0
[15]
P. Agarwal, “Time-Aware Named Entity Disambiguation,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Named Entity Disambiguation (NED) is the Natural Language Processing task of linking mentions of named entities in a text to their corresponding entries in a Knowledge Base. It serves as a crucial component in applications such as Semantic Search, Knowledge Base Population, and Opinion Mining. Currently deployed tools for NED are based on sophisticated models that use coherence relations among entities and distributed vectors to represent the entity mentions and their contexts in a document to disambiguate them collectively. Factors that have not been considered yet in this track are the semantics of temporal information about canonical entity forms and their mentions. Even though temporal expressions give a text an inherent structural characteristic (for instance, they can map a topic being discussed to a certain period of known history), such expressions are leveraged no differently than other dictionary words. In this thesis we propose the first time-aware NED model, which extends a state-of-the-art learning-to-rank approach based on joint word-entity embeddings. For this we introduce the concept of temporal signatures, which represent the importance of each entity in a Knowledge Base over a historical timeline. Such signatures for the entities and temporal contexts for the entity mentions are represented in our proposed temporal vector space to model the similarities between them. We evaluated our method on CoNLL-AIDA and TAC 2010, two widely used datasets in the NED track. However, because such datasets are composed of news articles from a short time period, they do not provide an extensive evaluation of our proposed temporal similarity modeling. Therefore, we curated a diachronic dataset, diaNED, characterized by temporally diverse entity mentions in its text collection.
Export
BibTeX
@mastersthesis{Agarwalmaster2017, TITLE = {Time-Aware Named Entity Disambiguation}, AUTHOR = {Agarwal, Prabal}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Named Entity Disambiguation (NED) is a Natural Language Processing task of linking mentions of named entities in a text to their corresponding entries in a Knowledge Base. It serves as a crucial component in applications such as Semantic Search, Knowledge Base Population, and Opinion Mining. Currently deployed tools for NED are based on sophisticated models that use coherence relations among entities and distributed vectors to represent the entity mentions and their contexts in a document to disambiguate them collectively. Factors that have not been considered yet in this track are the semantics of temporal information about canonical entity forms and their mentions. Even though temporal expressions in a text give inherent structural characteristic to it, for instance, it can map a topic being discussed to a certain period of known history, yet such expressions are leveraged no differently than other dictionary words. In this thesis we propose the first time-aware NED model, which extends a state-of-the-art learning to rank approach based on joint word-entity embeddings. For this we introduce the concept of temporal signatures that is used in our work to represent the importance of each entity in a Knowledge Base over a historical time-line. Such signatures for the entities and temporal contexts for the entity mentions are represented in our proposed temporal vector space to model the similarities between them. We evaluated our method on CoNLL-AIDA and TAC 2010, which are two widely used datasets in the NED track. However, because such datasets are composed of news articles from a short time-period, they do not provide extensive evaluation for our proposed temporal similarity modeling. Therefore, we curated a diachronic dataset, diaNED, with the characteristic of temporally diverse entity mentions in its text collection.}, }
Endnote
%0 Thesis %A Agarwal, Prabal %Y Strötgen, Jannik %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Time-Aware Named Entity Disambiguation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-9119-C %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 114 p. %V master %9 master %X Named Entity Disambiguation (NED) is a Natural Language Processing task of linking mentions of named entities in a text to their corresponding entries in a Knowledge Base. It serves as a crucial component in applications such as Semantic Search, Knowledge Base Population, and Opinion Mining. Currently deployed tools for NED are based on sophisticated models that use coherence relations among entities and distributed vectors to represent the entity mentions and their contexts in a document to disambiguate them collectively. Factors that have not been considered yet in this track are the semantics of temporal information about canonical entity forms and their mentions. Even though temporal expressions in a text give inherent structural characteristic to it, for instance, it can map a topic being discussed to a certain period of known history, yet such expressions are leveraged no differently than other dictionary words. In this thesis we propose the first time-aware NED model, which extends a state-of-the-art learning to rank approach based on joint word-entity embeddings. For this we introduce the concept of temporal signatures that is used in our work to represent the importance of each entity in a Knowledge Base over a historical time-line. Such signatures for the entities and temporal contexts for the entity mentions are represented in our proposed temporal vector space to model the similarities between them. We evaluated our method on CoNLL-AIDA and TAC 2010, which are two widely used datasets in the NED track. However, because such datasets are composed of news articles from a short time-period, they do not provide extensive evaluation for our proposed temporal similarity modeling. Therefore, we curated a diachronic dataset, diaNED, with the characteristic of temporally diverse entity mentions in its text collection.
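To make the notion of temporal signatures concrete, here is a minimal sketch under stated assumptions (a hypothetical one-bin-per-year timeline; the thesis represents signatures and mention contexts in a learned temporal vector space rather than as raw histograms): an entity's importance over a historical timeline and a mention's temporal context are both normalised year histograms, compared by cosine similarity.

import numpy as np

YEARS = np.arange(1900, 2021)  # hypothetical timeline binning, one bin per year

def year_histogram(years):
    # Normalised histogram over the timeline; stands in for an entity's
    # temporal signature or for a mention's temporal context.
    hist = np.zeros(len(YEARS))
    for y in years:
        hist[np.searchsorted(YEARS, y)] += 1.0
    total = hist.sum()
    return hist / total if total > 0 else hist

def temporal_similarity(signature, context):
    # Cosine similarity between a signature vector and a context vector.
    denom = np.linalg.norm(signature) * np.linalg.norm(context)
    return float(signature @ context / denom) if denom > 0 else 0.0

entity_sig = year_histogram([1905, 1915, 1921, 1955])  # years tied to the entity
mention_ctx = year_histogram([1905, 1916])             # dates near the mention
print(round(temporal_similarity(entity_sig, mention_ctx), 3))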
[16]
P. Agarwal and J. Strötgen, “Tiwiki: Searching Wikipedia with Temporal Constraints,” in WWW ’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{AgarwalStroetgen2017_TempWeb, TITLE = {Tiwiki: Searching {W}ikipedia with Temporal Constraints}, AUTHOR = {Agarwal, Prabal and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3051112}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW '17 Companion}, PAGES = {1595--1600}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Agarwal, Prabal %A Strötgen, Jannik %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Tiwiki: Searching Wikipedia with Temporal Constraints : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-53AE-9 %R 10.1145/3041021.3051112 %D 2017 %B 26th International Conference on World Wide Web Companion %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW '17 Companion %P 1595 - 1600 %I ACM %@ 978-1-4503-4914-7
[17]
R. Andrade and J. Strötgen, “All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{AndradeWWW2017, TITLE = {All Dates Lead to {R}ome: {E}xtracting and Explaining Temporal References in Street Names}, AUTHOR = {Andrade, Rosita and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054249}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {757--758}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Andrade, Rosita %A Strötgen, Jannik %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T All Dates Lead to Rome: Extracting and Explaining Temporal References in Street Names : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62AE-1 %R 10.1145/3041021.3054249 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 757 - 758 %I ACM %@ 978-1-4503-4914-7
[18]
A. Bhattacharyya and J. Vreeken, “Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” 2017. [Online]. Available: http://arxiv.org/abs/1701.08096. (arXiv: 1701.08096)
Abstract
Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, and discovers meaningful semantics in the form of patterns that identify multiple choices of values.
Export
BibTeX
@online{DBLP:journals/corr/BhattacharyyaV17, TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns}, AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1701.08096}, EPRINT = {1701.08096}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, as well as discovers meaningful semantics in the form of patterns that identify multiple choices of values.}, }
Endnote
%0 Report %A Bhattacharyya, Apratim %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Summarising Event Sequences with Rich Interleaving Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90E4-A %U http://arxiv.org/abs/1701.08096 %D 2017 %X Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose Squish, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show Squish is orders of magnitude faster than the state of the art, results in better models, as well as discovers meaningful semantics in the form of patterns that identify multiple choices of values. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
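The greedy MDL scoring behind this summarisation approach can be sketched generically (a toy two-part code under simplifying assumptions; the paper's encoding additionally prices gaps and interleavings of pattern occurrences):

import math
from collections import Counter

def codelength(symbols):
    # Shannon codelength of a sequence under its empirical distribution.
    counts, n = Counter(symbols), len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

def total_cost(cover, pattern_set):
    # Two-part MDL total: L(model) + L(data | model), where `cover` is the
    # event sequence with pattern occurrences replaced by pattern identifiers.
    model_cost = sum(len(p) + 1 for p in pattern_set)  # crude per-pattern cost
    return model_cost + codelength(cover)

# A greedy search adds the candidate pattern whose use lowers the total the
# most, and stops when no candidate yields a gain.
seq = list("abababcabab")
print(total_cost(seq, []))                                       # no patterns
print(total_cost(["P", "P", "P", "c", "P", "P"], [("a", "b")]))  # with P = ab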
[19]
A. Bhattacharyya and J. Vreeken, “Efficiently Summarising Event Sequences with Rich Interleaving Patterns,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{bhattacharyya:17:squish, TITLE = {Efficiently Summarising Event Sequences with Rich Interleaving Patterns}, AUTHOR = {Bhattacharyya, Apratim and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-497-3}, DOI = {10.1137/1.9781611974973.89}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, PAGES = {795--803}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Bhattacharyya, Apratim %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Summarising Event Sequences with Rich Interleaving Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BDC-D %R 10.1137/1.9781611974973.89 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %P 795 - 803 %I SIAM %@ 978-1-61197-497-3
[20]
A. J. Biega, A. Ghazimatin, H. Ferhatosmanoglu, K. P. Gummadi, and G. Weikum, “Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Biega_CIKM2017, TITLE = {Learning to Un-Rank: {Q}uantifying Search Exposure for Users in Online Communities}, AUTHOR = {Biega, Asia J. and Ghazimatin, Azin and Ferhatosmanoglu, Hakan and Gummadi, Krishna P. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133040}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {267--276}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Ghazimatin, Azin %A Ferhatosmanoglu, Hakan %A Gummadi, Krishna P. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Learning to Un-Rank: Quantifying Search Exposure for Users in Online Communities : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3BA4-5 %R 10.1145/3132847.3133040 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 267 - 276 %I ACM %@ 978-1-4503-4918-5
[21]
A. J. Biega, R. S. Roy, and G. Weikum, “Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling,” in SIGIR’17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 2017.
Export
BibTeX
@inproceedings{BiegaSIGIR2017, TITLE = {Privacy through Solidarity: {A} User-Utility-Preserving Framework to Counter Profiling}, AUTHOR = {Biega, Asia J. and Roy, Rishiraj Saha and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5022-8}, DOI = {10.1145/3077136.3080830}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {SIGIR'17, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {675--684}, ADDRESS = {Shinjuku, Tokyo, Japan}, }
Endnote
%0 Conference Proceedings %A Biega, Asia J. %A Roy, Rishiraj Saha %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Privacy through Solidarity: A User-Utility-Preserving Framework to Counter Profiling : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F901-2 %R 10.1145/3077136.3080830 %D 2017 %B 40th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2017-08-07 - 2017-08-11 %C Shinjuku, Tokyo, Japan %B SIGIR'17 %P 675 - 684 %I ACM %@ 978-1-4503-5022-8
[22]
N. Boldyrev, “Alignment of Multi-Cultural Knowledge Repositories,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.
Export
BibTeX
@phdthesis{BOLDYREVPHD2017, TITLE = {Alignment of Multi-Cultural Knowledge Repositories}, AUTHOR = {Boldyrev, Natalia}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269407}, DOI = {10.22028/D291-26940}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.}, }
Endnote
%0 Thesis %A Boldyrev, Natalia %Y Weikum, Gerhard %A referee: Berberich, Klaus %A referee: Spaniol, Marc %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Alignment of Multi-Cultural Knowledge Repositories : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-87D8-2 %R 10.22028/D291-26940 %U urn:nbn:de:bsz:291-scidok-ds-269407 %I Universität des Saarlandes %C Saarbrücken %D 2017 %8 06.12.2017 %P X, 124 p. %V phd %9 phd %X The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26891
[23]
N. Boldyrev, M. Spaniol, J. Strötgen, and G. Weikum, “SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{BoldyrevWWW2017, TITLE = {{SESAME}: {E}uropean Statistics Explored via Semantic Alignment onto {Wikipedia}}, AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054732}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {177--181}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Boldyrev, Natalia %A Spaniol, Marc %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T SESAME: European Statistics Explored via Semantic Alignment onto Wikipedia : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80B0-0 %R 10.1145/3041021.3054732 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 177 - 181 %I ACM %@ 978-1-4503-4914-7
[24]
M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken, “Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” Data Mining and Knowledge Discovery, vol. 31, no. 5, 2017.
Export
BibTeX
@article{Boley2017, TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery}, AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles}, LANGUAGE = {eng}, DOI = {10.1007/s10618-017-0520-3}, PUBLISHER = {Springer}, ADDRESS = {London}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Data Mining and Knowledge Discovery}, VOLUME = {31}, NUMBER = {5}, PAGES = {1391--1418}, }
Endnote
%0 Journal Article %A Boley, Mario %A Goldsmith, Bryan R. %A Ghiringhelli, Luca M. %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90E1-0 %R 10.1007/s10618-017-0520-3 %7 2017-06-28 %D 2017 %8 28.06.2017 %J Data Mining and Knowledge Discovery %V 31 %N 5 %& 1391 %P 1391 - 1418 %I Springer %C London
[25]
M. Boley, B. R. Goldsmith, L. M. Ghiringhelli, and J. Vreeken, “Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery,” 2017. [Online]. Available: http://arxiv.org/abs/1701.07696. (arXiv: 1701.07696)
Abstract
Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
Export
BibTeX
@online{DBLP:journals/corr/BoleyGGV17, TITLE = {Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery}, AUTHOR = {Boley, Mario and Goldsmith, Bryan R. and Ghiringhelli, Luca M. and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1701.07696}, EPRINT = {1701.07696}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.}, }
Endnote
%0 Report %A Boley, Mario %A Goldsmith, Bryan R. %A Ghiringhelli, Luca M. %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90DB-F %U http://arxiv.org/abs/1701.07696 %D 2017 %X Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Databases, cs.DB
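For illustration, here is a hypothetical member of the objective class analysed above (the exact function and its tight optimistic estimator are the paper's contribution): non-decreasing in subgroup size, based on the subgroup median, and non-increasing in the dispersion around that median, with dispersion measured as the average absolute deviation from the median, the special case the abstract highlights.

import numpy as np

def dispersion_corrected_value(target_values, a=0.5):
    # Size term (non-decreasing) times central tendency minus dispersion.
    y = np.asarray(target_values, dtype=float)
    med = np.median(y)
    amd = np.mean(np.abs(y - med))  # average absolute deviation from median
    return len(y) ** a * (med - amd)

print(round(dispersion_corrected_value([5.1, 5.3, 4.9, 5.0, 12.0]), 3))

Within branch-and-bound search, a candidate subgroup would be pruned using an upper bound (optimistic estimator) of such a value over all of its subsets; computing those bounds efficiently is what the paper shows how to do.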
[26]
K. Budhathoki and J. Vreeken, “MDL for Causal Inference on Discrete Data,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{BudhathokiICDM2017, TITLE = {{MDL} for Causal Inference on Discrete Data}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.87}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {751--756}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T MDL for Causal Inference on Discrete Data : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6458-D %R 10.1109/ICDM.2017.87 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 751 - 756 %I IEEE %@ 978-1-5386-3835-4
[27]
K. Budhathoki and J. Vreeken, “Correlation by Compression,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{budhathoki:17:cbc, TITLE = {Correlation by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.59}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh}, PAGES = {525--533}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Correlation by Compression : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BD8-6 %R 10.1137/1.9781611974973.59 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %E Chawla, Nitesh; Wang, Wei %P 525 - 533 %I SIAM %@ 978-1-611974-87-4
[28]
K. Budhathoki and J. Vreeken, “Causal Inference by Stochastic Complexity,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06776. (arXiv: 1702.06776)
Abstract
The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.
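The two-part scoring behind this idea can be sketched compactly. The helper below computes the multinomial NML normalizing term with the linear-time recurrence of Kontkanen and Myllymäki, and the direction is chosen by comparing total code lengths; all function names are ours, and taking the observed values as the domain is a simplification, so this illustrates the principle rather than reproducing the paper's CISC implementation.

from collections import Counter
from math import comb, log2

def multinomial_complexity(n, K):
    # NML normalizing term C(n, K) of a K-valued multinomial over n
    # samples, via the recurrence C(n, k+2) = C(n, k+1) + (n/k) * C(n, k).
    if n == 0 or K <= 1:
        return 1.0
    c1 = 1.0
    c2 = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    for k in range(1, K - 1):
        c1, c2 = c2, c2 + (n / k) * c1
    return c2

def sc(values):
    # Stochastic complexity in bits: maximum-likelihood code length plus
    # log regret. Simplification: the domain is the set of observed values.
    n, counts = len(values), Counter(values)
    nll = -sum(c * log2(c / n) for c in counts.values())
    return nll + log2(multinomial_complexity(n, len(counts)))

def sc_conditional(ys, xs):
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(sc(group) for group in groups.values())

xs = [0, 1, 2] * 20
ys = [x % 2 for x in xs]  # Y is a coarsening of X
print(sc(xs) + sc_conditional(ys, xs), "bits for X -> Y")
print(sc(ys) + sc_conditional(xs, ys), "bits for Y -> X")

The direction with the shorter total description is inferred as causal; ties are left undecided.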
Export
BibTeX
@online{DBLP:journals/corr/BudhathokiV17, TITLE = {Causal Inference by Stochastic Complexity}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06776}, EPRINT = {1702.06776}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.}, }
Endnote
%0 Report %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference by Stochastic Complexity : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90F2-A %U http://arxiv.org/abs/1702.06776 %D 2017 %X The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as that direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes. %K Computer Science, Learning, cs.LG,Computer Science, Artificial Intelligence, cs.AI
[29]
K. Budhathoki and J. Vreeken, “Causal Inference by Compression,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{budhathoki:16:origo, TITLE = {Causal Inference by Compression}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0015}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {41--50}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference by Compression : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1CC0-6 %R 10.1109/ICDM.2016.0015 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 41 - 50 %I IEEE %@ 978-1-5090-5473-2
[30]
C. X. Chu, N. Tandon, and G. Weikum, “Distilling Task Knowledge from How-To Communities,” in WWW’17, 26th International Conference on World Wide Web, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{Cuong:WWW2017, TITLE = {Distilling Task Knowledge from How-To Communities}, AUTHOR = {Chu, Cuong Xuan and Tandon, Niket and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4913-0}, DOI = {10.1145/3038912.3052715}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17, 26th International Conference on World Wide Web}, PAGES = {805--814}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Chu, Cuong Xuan %A Tandon, Niket %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Distilling Task Knowledge from How-To Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-54BE-E %R 10.1145/3038912.3052715 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 %P 805 - 814 %I ACM %@ 978-1-4503-4913-0
[31]
A. Cohan, S. Young, A. Yates, and N. Goharian, “Triaging Content Severity in Online Mental Health Forums,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.
Export
BibTeX
@article{Cohan2017, TITLE = {Triaging Content Severity in Online Mental Health Forums}, AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, ISSN = {2330-1635}, DOI = {10.1002/asi.23865}, PUBLISHER = {Wiley}, ADDRESS = {Chichester, UK}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Journal of the Association for Information Science and Technology}, VOLUME = {68}, NUMBER = {11}, PAGES = {2675--2689}, }
Endnote
%0 Journal Article %A Cohan, Arman %A Young, Sydney %A Yates, Andrew %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Triaging Content Severity in Online Mental Health Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06B9-8 %R 10.1002/asi.23865 %7 2017-09-25 %D 2017 %8 25.09.2017 %J Journal of the Association for Information Science and Technology %O asis&t %V 68 %N 11 %& 2675 %P 2675 - 2689 %I Wiley %C Chichester, UK %@ false
[32]
A. Cohan, S. Young, A. Yates, and N. Goharian, “Triaging Content Severity in Online Mental Health Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06875. (arXiv: 1702.06875)
Abstract
Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.
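As a rough stand-in for the feature-rich framework described above (lexical, psycholinguistic, contextual, and topic-modeling features), the sketch below trains a four-class linear classifier over word n-grams with scikit-learn; the category names, toy posts, and annotations are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

SEVERITY = ["green", "amber", "red", "crisis"]  # illustrative category names
posts = ["had a lovely walk in the park today",
         "feeling a bit low and tired lately",
         "everything feels hopeless and I cannot cope",
         "I need urgent help right now"]
labels = [0, 1, 2, 3]  # toy annotations, one per severity level

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(posts, labels)
print(SEVERITY[clf.predict(["cannot cope with all of this"])[0]])

A real triaging system would add the psycholinguistic and topic-model features the paper describes and train on expert-annotated forum data.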
Export
BibTeX
@online{Cohan_arXiv2017, TITLE = {Triaging Content Severity in Online Mental Health Forums}, AUTHOR = {Cohan, Arman and Young, Sydney and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06875}, EPRINT = {1702.06875}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.}, }
Endnote
%0 Report %A Cohan, Arman %A Young, Sydney %A Yates, Andrew %A Goharian, Nazli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Triaging Content Severity in Online Mental Health Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06AF-F %U http://arxiv.org/abs/1702.06875 %D 2017 %X Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We present a framework for triaging user content into four severity categories which are defined based on indications of self-harm ideation. Our models are based on a feature-rich classification framework which includes lexical, psycholinguistic, contextual and topic modeling features. Our approaches improve the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Using the proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate a decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need. %K Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI
[33]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “Towards Real-Time Road Traffic Analytics using Telco Big Data,” in BIRTE ’17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics, Munich, Germany, 2017.
Export
BibTeX
@inproceedings{birte17traffictbd, TITLE = {Towards Real-Time Road Traffic Analytics using {Telco Big Data}}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-4503-5425-7}, DOI = {10.1145/3129292.3129296}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {BIRTE '17, Eleventh International Workshop on Real-Time Business Intelligence and Analytics}, EDITOR = {Chatziantoniou, Damianos and Castellanos, Malu and Chrysanthis, Panos K.}, EID = {5}, ADDRESS = {Munich, Germany}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Towards Real-Time Road Traffic Analytics using Telco Big Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-DDB7-A %R 10.1145/3129292.3129296 %D 2017 %B Eleventh International Workshop on Real-Time Business Intelligence and Analytics %Z date of event: 2017-08-28 - 2017-08-28 %C Munich, Germany %B BIRTE '17 %E Chatziantoniou, Damianos; Castellanos, Malu; Chrysanthis, Panos K. %Z sequence number: 5 %I ACM %@ 978-1-4503-5425-7
[34]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “SPATE: Compacting and Exploring Telco Big Data,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.
Export
BibTeX
@inproceedings{icde17-spate-demo, TITLE = {{SPATE}: Compacting and Exploring Telco Big Data}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-5090-6544-8}, DOI = {10.1109/ICDE.2017.203}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering}, PAGES = {1419--1420}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T SPATE: Compacting and Exploring Telco Big Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62BA-5 %R 10.1109/ICDE.2017.203 %D 2017 %B 33rd IEEE International Conference on Data Engineering %Z date of event: 2017-04-19 - 2017-04-22 %C San Diego, CA, USA %B ICDE 2017 %P 1419 - 1420 %I IEEE %@ 978-1-5090-6544-8
[35]
C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, and M. F. Mokbel, “Efficient Exploration of Telco Big Data with Compression and Decaying,” in ICDE 2017, 33rd IEEE International Conference on Data Engineering, San Diego, CA, USA, 2017.
Export
BibTeX
@inproceedings{icde17-spate, TITLE = {Efficient Exploration of Telco Big Data with Compression and Decaying}, AUTHOR = {Costa, Constantinos and Chatzimilioudis, Georgios and Zeinalipour-Yazti, Demetrios and Mokbel, Mohamed F.}, LANGUAGE = {eng}, ISBN = {978-1-5090-6544-8}, DOI = {10.1109/ICDE.2017.175}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICDE 2017, 33rd IEEE International Conference on Data Engineering}, PAGES = {1332--1343}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Costa, Constantinos %A Chatzimilioudis, Georgios %A Zeinalipour-Yazti, Demetrios %A Mokbel, Mohamed F. %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Efficient Exploration of Telco Big Data with Compression and Decaying : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-62B3-4 %R 10.1109/ICDE.2017.175 %D 2017 %B 33rd IEEE International Conference on Data Engineering %Z date of event: 2017-04-19 - 2017-04-22 %C San Diego, CA, USA %B ICDE 2017 %P 1332 - 1343 %I IEEE %@ 978-1-5090-6544-8
[36]
S. Das, K. Berberich, D. Klakow, A. Mishra, and V. Setty, “Estimating Event Focus Time with Distributed Representation of Words,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to answer this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.
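The embedding-based idea lends itself to a minimal sketch: average the query's word vectors and rank candidate year tokens by cosine similarity. gensim's KeyedVectors loader is a real API, but the vector file path and the assumption that plain year tokens such as "1969" occur in the vocabulary are placeholders.

import numpy as np
from gensim.models import KeyedVectors

# Placeholder path; assumes vectors trained on a diachronic news corpus
# in which year tokens appear as ordinary vocabulary items.
wv = KeyedVectors.load_word2vec_format("news-vectors.bin", binary=True)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def focus_time(query, candidate_years):
    terms = [w for w in query.lower().split() if w in wv]
    q = np.mean([wv[w] for w in terms], axis=0)  # mean query vector
    scored = [(y, cosine(q, wv[y])) for y in candidate_years if y in wv]
    return max(scored, key=lambda pair: pair[1])  # best-scoring year

print(focus_time("first moon landing", [str(y) for y in range(1950, 2000)]))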
Export
BibTeX
@mastersthesis{dasmaster17, TITLE = {Estimating Event Focus Time with Distributed Representation of Words}, AUTHOR = {Das, Supratim and Berberich, Klaus and Klakow, Dietrich and Mishra, Arunav and Setty, Vinay}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to answer this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.}, }
Endnote
%0 Thesis %A Das, Supratim %A Berberich, Klaus %A Klakow, Dietrich %A Mishra, Arunav %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Estimating Event Focus Time with Distributed Representation of Words : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-DFF1-7 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 83 p. %V master %9 master %X Time is an important dimension as it aids in disambiguating and understanding newsworthy events that happened in the past. It helps in the chronological ordering of events to understand their causality, evolution, and ramifications. In Information Retrieval, time alongside text is known to improve the quality of search results. So, making use of the temporal dimensionality in text-based analysis is an interesting idea to explore. Considering the importance of time, methods to automatically resolve the temporal foci of events are essential. In this thesis, we try to answer this research question by training our models on two different kinds of corpora and then evaluating them on a set of historical event-queries.
[37]
S. Das, A. Mishra, K. Berberich, and V. Setty, “Estimating Event Focus Time Using Neural Word Embeddings,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Das_CIKM2017, TITLE = {Estimating Event Focus Time Using Neural Word Embeddings}, AUTHOR = {Das, Supratim and Mishra, Arunav and Berberich, Klaus and Setty, Vinay}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133131}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2039--2042}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Das, Supratim %A Mishra, Arunav %A Berberich, Klaus %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Estimating Event Focus Time Using Neural Word Embeddings : %G eng %U http://hdl.handle.net/21.11116/0000-0000-635B-B %R 10.1145/3132847.3133131 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2039 - 2042 %I ACM %@ 978-1-4503-4918-5
[38]
S. Dutta, “Efficient Knowledge Management for Named Entities from Text,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.
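The first contribution, identifying text phrases that refer to the same named entity, can be illustrated in miniature: cluster mention contexts so that two entities sharing the surface form "Jordan" separate. TF-IDF vectors with average-link clustering stand in for the enriched semantic contexts and link validation of the thesis; the example sentences are invented.

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

contexts = [
    "Jordan won six NBA titles with the Bulls",
    "Jordan led the Bulls to another NBA championship",
    "Jordan borders Israel along the Jordan river valley",
    "the kingdom of Jordan lies east of the Jordan river",
]
X = TfidfVectorizer(stop_words="english").fit_transform(contexts).toarray()
# scikit-learn >= 1.2 names this parameter metric (formerly affinity).
clusters = AgglomerativeClustering(n_clusters=2, metric="cosine",
                                   linkage="average").fit_predict(X)
print(clusters)  # the basketball and country mentions land in different clusters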
Export
BibTeX
@phdthesis{duttaphd17, TITLE = {Efficient Knowledge Management for Named Entities from Text}, AUTHOR = {Dutta, Sourav}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-67924}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented.}, }
Endnote
%0 Thesis %A Dutta, Sourav %Y Weikum, Gerhard %A referee: Nejdl, Wolfgang %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficient Knowledge Management for Named Entities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-A793-E %U urn:nbn:de:bsz:291-scidok-67924 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P xv, 134 p. %V phd %9 phd %X The evolution of search from keywords to entities has necessitated the efficient harvesting and management of entity-centric information for constructing knowledge bases catering to various applications such as semantic search, question answering, and information retrieval. The vast amounts of natural language texts available across diverse domains on the Web provide rich sources for discovering facts about named entities such as people, places, and organizations. A key challenge, in this regard, entails the need for precise identification and disambiguation of entities across documents for extraction of attributes/relations and their proper representation in knowledge bases. Additionally, the applicability of such repositories not only involves the quality and accuracy of the stored information, but also storage management and query processing efficiency. This dissertation aims to tackle the above problems by presenting efficient approaches for entity-centric knowledge acquisition from texts and its representation in knowledge repositories. This dissertation presents a robust approach for identifying text phrases pertaining to the same named entity across huge corpora, and their disambiguation to canonical entities present in a knowledge base, by using enriched semantic contexts and link validation encapsulated in a hierarchical clustering framework. This work further presents language and consistency features for classification models to compute the credibility of obtained textual facts, ensuring quality of the extracted information. Finally, an encoding algorithm, using frequent term detection and improved data locality, to represent entities for enhanced knowledge base storage and query performance is presented. %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6792/
[39]
S. Eslami, A. J. Biega, R. S. Roy, and G. Weikum, “Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Eslami_CIKM2017, TITLE = {Privacy of Hidden Profiles: {U}tility-Preserving Profile Removal in Online Forums}, AUTHOR = {Eslami, Sedigheh and Biega, Asia J. and Roy, Rishiraj Saha and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133140}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2063--2066}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Eslami, Sedigheh %A Biega, Asia J. %A Roy, Rishiraj Saha %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Privacy of Hidden Profiles: Utility-Preserving Profile Removal in Online Forums : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3BA2-7 %R 10.1145/3132847.3133140 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2063 - 2066 %I ACM %@ 978-1-4503-4918-5
[40]
S. Eslami, “Utility-preserving Profile Removal in Online Forums,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{EslamiMSc2017, TITLE = {Utility-preserving Profile Removal in Online Forums}, AUTHOR = {Eslami, Sedigheh}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Eslami, Sedigheh %Y Weikum, Gerhard %A referee: Roy, Rishiraj Saha %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Utility-preserving Profile Removal in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-9236-4 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P XII, 66 p. %V master %9 master
[41]
E. Galbrun and P. Miettinen, Redescription Mining. Cham: Springer International, 2017.
Export
BibTeX
@book{galbrun18redescription, TITLE = {Redescription Mining}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-319-72889-6}, DOI = {10.1007/978-3-319-72889-6}, PUBLISHER = {Springer International}, ADDRESS = {Cham}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Book %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Redescription Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D3-1 %R 10.1007/978-3-319-72889-6 %@ 978-3-319-72889-6 %I Springer International %C Cham %D 2017
[42]
E. Galbrun and P. Miettinen, “Analysing Political Opinions Using Redescription Mining,” in 16th IEEE International Conference on Data Mining Workshops (ICDMW 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{galbrun16analysing, TITLE = {Analysing Political Opinions Using Redescription Mining}, AUTHOR = {Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5910-2}, DOI = {10.1109/ICDMW.2016.121}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining Workshops (ICDMW 2016)}, EDITOR = {Domeniconi, Carlotta and Gullo, Francesco and Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {422--427}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Galbrun, Esther %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Analysing Political Opinions Using Redescription Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2247-5 %R 10.1109/ICDMW.2016.121 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining Workshops %E Domeniconi, Carlotta; Gullo, Francesco; Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 422 - 427 %I IEEE %@ 978-1-5090-5910-2
[43]
K. Gashteovski, R. Gemulla, and L. Del Corro, “MinIE: Minimizing Facts in Open Information Extraction,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{DBLP:conf/emnlp/GashteovskiGC17, TITLE = {{MinIE}: {M}inimizing Facts in Open Information Extraction}, AUTHOR = {Gashteovski, Kiril and Gemulla, Rainer and Del Corro, Luciano}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {http://aclanthology.info/papers/D17-1277/d17-1277}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {2620--2630}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Gashteovski, Kiril %A Gemulla, Rainer %A Del Corro, Luciano %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T MinIE: Minimizing Facts in Open Information Extraction : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-30F4-2 %U http://aclanthology.info/papers/D17-1277/d17-1277 %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 2620 - 2630 %I ACL %@ 978-1-945626-83-8 %U http://www.aclweb.org/anthology/D17-1277
[44]
X. Ge, A. Daphalapurkar, M. Shimpi, D. Kohli, K. Pelechrinis, P. K. Chrysanthis, and D. Zeinalipour-Yazti, “Data-driven Serendipity Navigation in Urban Places,” in IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA, 2017.
Export
BibTeX
@inproceedings{icdcs17-serendipity-demo, TITLE = {Data-driven Serendipity Navigation in Urban Places}, AUTHOR = {Ge, Xiaoyi and Daphalapurkar, Ameya and Shimpi, Manali and Kohli, Darpun and Pelechrinis, Konstantinos and Chrysanthis, Panos K. and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-1792-2}, DOI = {10.1109/ICDCS.2017.286}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017)}, EDITOR = {Lee, Kisung and Liu, Ling}, PAGES = {2501--2504}, ADDRESS = {Atlanta, GA, USA}, }
Endnote
%0 Conference Proceedings %A Ge, Xiaoyi %A Daphalapurkar, Ameya %A Shimpi, Manali %A Kohli, Darpun %A Pelechrinis, Konstantinos %A Chrysanthis, Panos K. %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Data-driven Serendipity Navigation in Urban Places : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-082B-7 %R 10.1109/ICDCS.2017.286 %D 2017 %B 37th IEEE International Conference on Distributed Computing Systems %Z date of event: 2017-06-05 - 2017-06-08 %C Atlanta, GA, USA %B IEEE 37th International Conference on Distributed Computing Systems %E Lee, Kisung; Liu, Ling %P 2501 - 2504 %I IEEE %@ 978-1-5386-1792-2
[45]
B. Goldsmith, M. Boley, J. Vreeken, M. Scheffler, and L. Ghiringhelli, “Uncovering Structure-property Relationships of Materials by Subgroup Discovery,” New Journal of Physics, vol. 19, no. 1, 2017.
Export
BibTeX
@article{goldsmith:17:gold, TITLE = {Uncovering Structure-property Relationships of Materials by Subgroup Discovery}, AUTHOR = {Goldsmith, Bryan and Boley, Mario and Vreeken, Jilles and Scheffler, Matthias and Ghiringhelli, Luca}, LANGUAGE = {eng}, ISSN = {1367-2630}, DOI = {10.1088/1367-2630/aa57c2}, PUBLISHER = {IOP Publishing}, ADDRESS = {Bristol}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {New Journal of Physics}, VOLUME = {19}, NUMBER = {1}, EID = {013031}, }
Endnote
%0 Journal Article %A Goldsmith, Bryan %A Boley, Mario %A Vreeken, Jilles %A Scheffler, Matthias %A Ghiringhelli, Luca %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Uncovering Structure-property Relationships of Materials by Subgroup Discovery : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BF5-4 %R 10.1088/1367-2630/aa57c2 %7 2017 %D 2017 %J New Journal of Physics %O New J. Phys. %V 19 %N 1 %Z sequence number: 013031 %I IOP Publishing %C Bristol %@ false %U http://iopscience.iop.org/article/10.1088/1367-2630/aa57c2
[46]
A. Grycner, “Constructing Lexicons of Relational Phrases,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.
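The first contribution, aligning relational phrases with WordNet, can be illustrated with NLTK's WordNet interface: a Lesk-style disambiguation picks a synset for the phrase's head verb, and the synset's hypernyms then supply the hierarchy. nltk.corpus.wordnet and nltk.wsd.lesk are real APIs (the wordnet corpus must be downloaded once); reducing a relational phrase to its head verb is our simplification, not the thesis's method.

from nltk.corpus import wordnet as wn
from nltk.wsd import lesk  # one-time setup: nltk.download("wordnet")

sentence = "she graduated from Saarland University with a PhD".split()
synset = lesk(sentence, "graduated", pos=wn.VERB)  # disambiguate the head verb
print(synset, "-", synset.definition())
print(synset.hypernyms())  # hypernyms impose a hierarchy on the phrase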
Export
BibTeX
@phdthesis{Grynerphd17, TITLE = {Constructing Lexicons of Relational Phrases}, AUTHOR = {Grycner, Adam}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-69101}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus.}, }
Endnote
%0 Thesis %A Grycner, Adam %Y Weikum, Gerhard %A referee: Klakow, Dietrich %A referee: Ponzetto, Simone Paolo %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Constructing Lexicons of Relational Phrases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-933B-1 %U urn:nbn:de:bsz:291-scidok-69101 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 125 p. %V phd %9 phd %X Knowledge Bases are one of the key components of Natural Language Understanding systems. For example, DBpedia, YAGO, and Wikidata capture and organize knowledge about named entities and relations between them, which is often crucial for tasks like Question Answering and Named Entity Disambiguation. While Knowledge Bases have good coverage of prominent entities, they are often limited with respect to relations. The goal of this thesis is to bridge this gap and automatically create lexicons of textual representations of relations, namely relational phrases. The lexicons should contain information about paraphrases, hierarchy, as well as semantic types of arguments of relational phrases. The thesis makes three main contributions. The first contribution addresses disambiguating relational phrases by aligning them with the WordNet dictionary. Moreover, the alignment allows imposing the WordNet hierarchy on the relational phrases. The second contribution proposes a method for graph construction of relations using Probabilistic Graphical Models. In addition, we apply this model to relation paraphrasing. The third contribution presents a method for constructing a lexicon of relational paraphrases with fine-grained semantic typing of arguments. This method is based on information from a multilingual parallel corpus. %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6910/ %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[47]
A. Guimarães, L. Wang, and G. Weikum, “Us and Them: Adversarial Politics on Twitter,” in 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{Guimaraes_ICDMW2017, TITLE = {Us and Them: {A}dversarial Politics on {Twitter}}, AUTHOR = {Guimar{\~a}es, Anna and Wang, Liqiang and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-5386-1480-8}, DOI = {10.1109/ICDMW.2017.119}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining Workshops (ICDMW 2017)}, EDITOR = {Gottumukkala, Raju and Ning, Xia and Dong, Guozhu and Raghavan, Vijay and Aluru, Srinivas and Karypis, George and Miele, Lucio and Wu, Xindong}, PAGES = {872--877}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Guimarães, Anna %A Wang, Liqiang %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Us and Them: Adversarial Politics on Twitter : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B89-4 %R 10.1109/ICDMW.2017.119 %D 2017 %B 17th International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining Workshops %E Gottumukkala, Raju; Ning, Xia; Dong, Guozhu; Raghavan, Vijay; Aluru, Srinivas; Karypis, George; Miele, Lucio; Wu, Xindong %P 872 - 877 %I IEEE %@ 978-1-5386-1480-8
[48]
D. Gupta, K. Berberich, J. Strötgen, and D. Zeinalipour-Yazti, “Generating Semantic Aspects for Queries,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2017-5-001, 2017.
Abstract
Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries.
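A toy rendition of the aspect idea: reduce each pseudo-relevant document for an ambiguous query to its semantic annotations and surface the most frequent annotation combinations as candidate aspects. The query, documents, and annotation tuples below are invented, and scoring by raw support stands in for the semantic-relationship models described above.

from collections import Counter

# Pseudo-relevant documents for the ambiguous query "olympics", each
# reduced to its (temporal, geographic, entity) annotations.
annotated_docs = [
    ("1936", "Berlin", "Jesse_Owens"),
    ("1936", "Berlin", "Olympic_Stadium"),
    ("2008", "Beijing", "Usain_Bolt"),
    ("2008", "Beijing", "Michael_Phelps"),
]
aspects = Counter((time, geo) for time, geo, _ in annotated_docs)
for (time, geo), support in aspects.most_common():
    print(f"aspect: {time} / {geo} (support {support})")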
Export
BibTeX
@techreport{Guptareport2007, TITLE = {Generating Semantic Aspects for Queries}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus and Str{\"o}tgen, Jannik and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISSN = {0946-011X}, NUMBER = {MPI-I-2017-5-001}, INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries.}, TYPE = {Research Report}, }
Endnote
%0 Report %A Gupta, Dhruv %A Berberich, Klaus %A Strötgen, Jannik %A Zeinalipour-Yazti, Demetrios %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Generating Semantic Aspects for Queries : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-07DD-0 %Y Max-Planck-Institut für Informatik %C Saarbrücken %D 2017 %P 39 p. %X Ambiguous information needs expressed in a limited number of keywords often result in long-winded query sessions and many query reformulations. In this work, we tackle ambiguous queries by providing automatically generated semantic aspects that can guide users to satisfying results regarding their information needs. To generate semantic aspects, we use semantic annotations available in the documents and leverage models representing the semantic relationships between annotations of the same type. The aspects in turn provide us a foundation for representing text in a completely structured manner, thereby allowing for a semantically-motivated organization of search results. We evaluate our approach on a testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents, with temporal, geographic, and named entity annotations as example dimensions. Our experimental results show that our general approach is Web-scale ready and finds relevant aspects for highly ambiguous queries. %B Research Report %@ false
[49]
S. Gurajada, “Distributed Querying of Large Labeled Graphs,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Graphs are a vital abstract data type with profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the “Labeled Graph”, where vertices and edges are labeled. An enormous research effort has been invested into the task of managing and querying graphs, yet many challenges remain unsolved. In this thesis, we advance the state of the art for the following query models, and propose a distributed solution to process them in an efficient and scalable manner. • Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices. • Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous execution, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner. • Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or only partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and processor to handle GGP queries efficiently and scalably. To this end, we propose a prototype distributed engine, coined “TriAD” (Triple Asynchronous and Distributed), that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.
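The set reachability query itself is compact to state as code. The baseline sketch below answers it centrally with one BFS per source vertex; the thesis's contribution is, by contrast, a distributed single-round scheme built on precomputed partial reachabilities among boundary vertices, which this naive version does not attempt.

from collections import deque

def set_reachability(adj, sources, targets):
    # All pairs (s, t) in sources x targets with a directed path s -> t;
    # a vertex is considered to reach itself.
    targets, pairs = set(targets), set()
    for s in sources:
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):  # adjacency list: vertex -> successors
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        pairs |= {(s, t) for t in targets & seen}
    return pairs

adj = {"a": ["b"], "b": ["c"], "d": ["c"]}
print(set_reachability(adj, {"a", "d"}, {"c"}))  # {('a', 'c'), ('d', 'c')}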
Export
BibTeX
@phdthesis{guraphd2017, TITLE = {Distributed Querying of Large Labeled Graphs}, AUTHOR = {Gurajada, Sairam}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-67738}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Graphs are a vital abstract data type with profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the {\textquotedblleft}Labeled Graph{\textquotedblright}, where vertices and edges are labeled. An enormous research effort has been invested into the task of managing and querying graphs, yet many challenges remain unsolved. In this thesis, we advance the state of the art for the following query models, and propose a distributed solution to process them in an efficient and scalable manner. \mbox{$\bullet$} Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices. \mbox{$\bullet$} Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous execution, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner. \mbox{$\bullet$} Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or only partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and processor to handle GGP queries efficiently and scalably. To this end, we propose a prototype distributed engine, coined {\textquotedblleft}TriAD{\textquotedblright} (Triple Asynchronous and Distributed), that supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets.}, }
Endnote
%0 Thesis %A Gurajada, Sairam %Y Theobald, Martin %A referee: Weikum, Gerhard %A referee: Özsu, M. Tamer %A referee: Michel, Sebastian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Distributed Querying of Large Labeled Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-8202-E %U urn:nbn:de:bsz:291-scidok-67738 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P x, 167 p. %V phd %9 phd %X The graph is a vital abstract data type that has profound significance in several applications. Because of their versatility, graphs have been adapted into several different forms, and one such adaptation with many practical applications is the “Labeled Graph”, where vertices and edges are labeled. An enormous research effort has been invested in the task of managing and querying graphs, yet many challenges remain unsolved. In this thesis, we advance the state of the art for the following query models, and propose a distributed solution to process them in an efficient and scalable manner. • Set Reachability. We formalize and investigate a generalization of the basic notion of reachability, called set reachability. Set reachability deals with finding all reachable pairs for given source and target sets. We present a non-iterative distributed solution that takes only a single round of communication for any set reachability query. This is achieved by precomputation, replication, and indexing of partial reachabilities among the boundary vertices. • Basic Graph Patterns (BGP). Supported by the majority of query languages, BGP queries are a common mode of querying knowledge graphs, biological datasets, etc. We present a novel distributed architecture that relies on the concepts of asynchronous executions, join-ahead pruning, and a multi-threaded query processing framework to process BGP queries in an efficient and scalable manner. • Generalized Graph Patterns (GGP). These queries combine the semantics of pattern matching and navigational queries, and are popular in scenarios where the schema of an underlying graph is either unknown or partially known. We present a distributed solution with a bimodal indexing layout that individually supports efficient processing of BGP queries and navigational queries. Furthermore, we design a unified query optimizer and processor to handle GGP queries both efficiently and scalably. To this end, we propose a prototype distributed engine, coined “TriAD” (Triple Asynchronous and Distributed), which supports all the aforementioned query models. We also provide a detailed empirical evaluation of TriAD in comparison to several state-of-the-art systems over multiple real-world and synthetic datasets. %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6773/
[50]
K. Hui, “Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
An information retrieval (IR) system assists people in consuming huge amounts of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model, respectively, by reducing the manual cost with automatic methods, by investigating the use of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing, and by proposing a novel judgment mechanism that strikes a compromise between judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by recent advances in deep learning, we develop novel neural IR models that incorporate different patterns such as term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performance through evaluations on different datasets.
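To put the cost of preference judgments in perspective (a back-of-the-envelope calculation, not a result from the thesis): exhaustively judging all pairs among $n$ pooled documents takes $\binom{n}{2}$ comparisons, whereas a comparison-sort strategy such as the merge-based judging of the papers below needs only about $n \log_2 n$; for $n = 100$,

$\binom{100}{2} = \frac{100 \cdot 99}{2} = 4950 \quad\text{vs.}\quad 100 \log_2 100 \approx 664.$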
Export
BibTeX
@phdthesis{HUiphd2017, TITLE = {Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval}, AUTHOR = {Hui, Kai}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269423}, DOI = {10.22028/D291-26942}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {An information retrieval (IR) system assists people in consuming huge amounts of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model, respectively, by reducing the manual cost with automatic methods, by investigating the use of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing, and by proposing a novel judgment mechanism that strikes a compromise between judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by recent advances in deep learning, we develop novel neural IR models that incorporate different patterns such as term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performance through evaluations on different datasets.}, }
Endnote
%0 Thesis %A Hui, Kai %Y Berberich, Klaus %A referee: Weikum, Gerhard %A referee: Dietz, Laura %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automatic Methods for Low-Cost Evaluation and Position-Aware Models for Neural Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-8921-E %U urn:nbn:de:bsz:291-scidok-ds-269423 %R 10.22028/D291-26942 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P xiv, 130 p. %V phd %9 phd %X An information retrieval (IR) system assists people in consuming huge amounts of data, where the evaluation and the construction of such systems are important. However, there exist two difficulties: the overwhelmingly large number of query-document pairs to judge, making IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships between a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we attempt to address both difficulties from the perspectives of IR evaluation and of the retrieval model, respectively, by reducing the manual cost with automatic methods, by investigating the use of crowdsourcing in collecting preference judgments, and by proposing novel neural retrieval models. In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgments in favor of the follow-up prediction for the remaining query-document pairs; furthermore, a language-model based cascade measure framework is developed to evaluate novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we also attempt to make the preference judgments practically usable by empirically investigating different properties of the judgments when collected via crowdsourcing, and by proposing a novel judgment mechanism that strikes a compromise between judgment quality and the number of judgments. Finally, to model different complicated patterns in a single retrieval model, inspired by recent advances in deep learning, we develop novel neural IR models that incorporate different patterns such as term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performance through evaluations on different datasets. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26894
[51]
K. Hui and K. Berberich, “Merge-Tie-Judge: Low-Cost Preference Judgments with Ties,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{HuiICTIR2017b, TITLE = {{Merge-Tie-Judge}: Low-Cost Preference Judgments with Ties}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121095}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {277--280}, ADDRESS = {Amsterdam, The Netherlands}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Merge-Tie-Judge: Low-Cost Preference Judgments with Ties : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-064B-2 %R 10.1145/3121050.3121095 %D 2017 %B 7th International Conference on the Theory of Information Retrieval %Z date of event: 2017-10-01 - 2017-10-04 %C Amsterdam, The Netherlands %B ICTIR'17 %P 277 - 280 %I ACM %@ 978-1-4503-4490-6
[52]
K. Hui, K. Berberich, and I. Mele, “Dealing with Incomplete Judgments in Cascade Measures,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{HuiICTIR2017, TITLE = {Dealing with Incomplete Judgments in Cascade Measures}, AUTHOR = {Hui, Kai and Berberich, Klaus and Mele, Ida}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121064}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {83--90}, ADDRESS = {Amsterdam, The Netherlands}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %A Mele, Ida %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Dealing with Incomplete Judgments in Cascade Measures : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0649-6 %R 10.1145/3121050.3121064 %D 2017 %B 7th International Conference on the Theory of Information Retrieval %Z date of event: 2017-10-01 - 2017-10-04 %C Amsterdam, The Netherlands %B ICTIR'17 %P 83 - 90 %I ACM %@ 978-1-4503-4490-6
[53]
K. Hui and K. Berberich, “Low-Cost Preference Judgment via Ties,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{hui2017short, TITLE = {Low-Cost Preference Judgment via Ties}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_58}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {626--632}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Low-Cost Preference Judgment via Ties : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1F7B-A %R 10.1007/978-3-319-56608-5_58 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 626 - 632 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[54]
K. Hui and K. Berberich, “Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{hui2017full, TITLE = {Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_19}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {239--251}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1F75-5 %R 10.1007/978-3-319-56608-5_19 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 239 - 251 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[55]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “PACRR: A Position-Aware Neural IR Model for Relevance Matching,” 2017. [Online]. Available: http://arxiv.org/abs/1704.03940. (arXiv: 1704.03940)
Abstract
In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks.
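The position-aware input that such models consume can be illustrated in a few lines of numpy (a toy sketch with random stand-in embeddings; the actual PACRR architecture applies convolutional kernels of several n-gram sizes, k-max pooling, and a recurrent layer on top of this matrix):

import numpy as np

def sim_matrix(query_emb, doc_emb):
    """Cosine-similarity matrix: rows = query terms, columns = document terms."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    return q @ d.T

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 50))   # stand-in embeddings for a 3-term query
D = rng.normal(size=(20, 50))  # stand-in embeddings for a 20-term document

S = sim_matrix(Q, D)           # shape (3, 20): unigram matching signals

# Position-dependent signal: strongest 2-gram window per query term, i.e.
# max over pairs of adjacent document positions (a crude stand-in for the
# model's convolution-plus-pooling layers).
bigram = np.maximum(S[:, :-1], S[:, 1:]).max(axis=1)
print(S.shape, bigram)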
Export
BibTeX
@online{DBLP:journals/corr/HuiYBM17, TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.03940}, EPRINT = {1704.03940}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks.}, }
Endnote
%0 Report %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T PACRR: A Position-Aware Neural IR Model for Relevance Matching : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90A8-3 %U http://arxiv.org/abs/1704.03940 %D 2017 %X In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query. While previous works have successfully captured unigram term matches, how to fully employ position-dependent information such as proximity and term dependencies has been insufficiently explored. In this work, we propose a novel neural IR model named PACRR (Position-Aware Convolutional-Recurrent Relevance), aiming at better modeling position-dependent interactions between a query and a document via convolutional layers as well as recurrent layers. Extensive experiments on six years' TREC Web Track data confirm that the proposed model yields better results under different benchmarks. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[56]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “PACRR: A Position-Aware Neural IR Model for Relevance Matching,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{HuiENMLP2017, TITLE = {{PACRR}: A Position-Aware Neural {IR} Model for Relevance Matching}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {https://aclanthology.info/pdf/D/D17/D17-1111.pdf}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {1060--1069}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T PACRR: A Position-Aware Neural IR Model for Relevance Matching : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-063F-D %U https://aclanthology.info/pdf/D/D17/D17-1111.pdf %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 1060 - 1069 %I ACL %@ 978-1-945626-83-8
[57]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.10192. (arXiv: 1706.10192)
Abstract
Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.
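As a toy illustration of factor (iii) alone, one can compute a positional match-density profile over sliding windows (a hypothetical simplification; RE-PACRR encodes these factors with dedicated neural components):

def match_density(query_terms, doc_terms, window=4):
    """Fraction of distinct query terms matched in each sliding window.

    Assumes a non-empty query; returns one score per window position.
    """
    q = set(query_terms)
    return [len(q & set(doc_terms[i:i + window])) / len(q)
            for i in range(max(len(doc_terms) - window + 1, 1))]

doc = "cheap flights to tokyo book cheap hotel deals in tokyo".split()
print(match_density(["cheap", "tokyo"], doc))
# [1.0, 0.5, 1.0, 1.0, 0.5, 0.5, 0.5] -- high density near positions 0-3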
Export
BibTeX
@online{HuiarXiv2017b, TITLE = {{RE-PACRR}: {A} Context and Density-Aware Neural Information Retrieval Model}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1706.10192}, EPRINT = {1706.10192}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.}, }
Endnote
%0 Report %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T RE-PACRR: A Context and Density-Aware Neural Information Retrieval Model : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-064D-D %U http://arxiv.org/abs/1706.10192 %D 2017 %X Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[58]
K. Hui, A. Yates, K. Berberich, and G. de Melo, “Position-Aware Representations for Relevance Matching in Neural Information Retrieval,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{HuiWWW2017, TITLE = {Position-Aware Representations for Relevance Matching in Neural Information Retrieval}, AUTHOR = {Hui, Kai and Yates, Andrew and Berberich, Klaus and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3054258}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {799--800}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Yates, Andrew %A Berberich, Klaus %A de Melo, Gerard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Position-Aware Representations for Relevance Matching in Neural Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90A4-B %R 10.1145/3041021.3054258 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 799 - 800 %I ACM %@ 978-1-4503-4914-7
[59]
R. Jäschke, J. Strötgen, E. Krotova, and F. Fischer, “„Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora,” in DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Bern, Switzerland, 2017.
Export
BibTeX
@inproceedings{JaeschkeEtAl2017_DHD, TITLE = {{{``Der Helmut Kohl unter den Brotaufstrichen'' -- Zur Extraktion vossianischer Antonomasien aus gro{\ss}en Zeitungskorpora}}}, AUTHOR = {J{\"a}schke, Robert and Str{\"o}tgen, Jannik and Krotova, Elena and Fischer, Frank}, LANGUAGE = {deu}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {DHd 2017, 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.}, PAGES = {120--124}, ADDRESS = {Bern, Switzerland}, }
Endnote
%0 Conference Proceedings %A Jäschke, Robert %A Strötgen, Jannik %A Krotova, Elena %A Fischer, Frank %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T „Der Helmut Kohl unter den Brotaufstrichen“ - Zur Extraktion vossianischer Antonomasien aus großen Zeitungskorpora : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002C-4E05-A %D 2017 %B 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. %Z date of event: 2017-02-13 - 2017-02-18 %C Bern, Switzerland %B DHd 2017 %P 120 - 124
[60]
H. Jhamtani, R. S. Roy, N. Chhaya, and E. Nyberg, “Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{JhamtaniECIR2017, TITLE = {Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages}, AUTHOR = {Jhamtani, Harsh and Roy, Rishiraj Saha and Chhaya, Niyati and Nyberg, Eric}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_41}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {506--512}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Jhamtani, Harsh %A Roy, Rishiraj Saha %A Chhaya, Niyati %A Nyberg, Eric %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Leveraging Site Search Logs to Identify Missing Content on Enterprise Webpages : %G eng %U http://hdl.handle.net/21.11116/0000-0000-DB33-0 %R 10.1007/978-3-319-56608-5_41 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 506 - 512 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[61]
J. Kalofolias, E. Galbrun, and P. Miettinen, “From Sets of Good Redescriptions to Good Sets of Redescriptions,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{kalofolias16from, TITLE = {From Sets of Good Redescriptions to Good Sets of Redescriptions}, AUTHOR = {Kalofolias, Janis and Galbrun, Esther and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0032}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, PAGES = {211--220}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Kalofolias, Janis %A Galbrun, Esther %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Sets of Good Redescriptions to Good Sets of Redescriptions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-224D-A %R 10.1109/ICDM.2016.0032 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %P 211 - 220 %I IEEE %@ 978-1-5090-5473-2
[62]
J. Kalofolias, M. Boley, and J. Vreeken, “Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{KalofoliasICDM2017, TITLE = {Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups}, AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.29}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {197--206}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Kalofolias, Janis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups : %G eng %U http://hdl.handle.net/21.11116/0000-0000-63C2-5 %R 10.1109/ICDM.2017.29 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 197 - 206 %I IEEE %@ 978-1-5386-3835-4
[63]
J. Kalofolias, M. Boley, and J. Vreeken, “Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups,” 2017. [Online]. Available: http://arxiv.org/abs/1709.07941. (arXiv: 1709.07941)
Abstract
Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.
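The trade-off can be caricatured with a toy score that multiplies the two criteria (all names below are hypothetical; the paper's objective function and its tight optimistic estimator are considerably more refined):

import numpy as np

def score(target, control, members, alpha=1.0):
    """Toy trade-off score for a candidate subgroup (boolean mask `members`).

    Exceptionality: shift of the subgroup's mean target vs. the global mean.
    Representativeness: how closely the subgroup preserves the global
    proportion of the binary control variable.
    """
    exceptionality = abs(target[members].mean() - target.mean())
    representativeness = 1.0 - abs(control[members].mean() - control.mean())
    return exceptionality * representativeness ** alpha

target = np.array([1.0, 5.0, 5.5, 1.2, 4.8, 0.9])   # numeric target
control = np.array([0, 1, 0, 1, 1, 0])              # binary control variable
members = np.array([False, True, True, False, True, False])
print(score(target, control, members))

A branch-and-bound search as in the paper would enumerate description-language subgroups and prune using an upper bound on this score rather than scoring a fixed mask.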
Export
BibTeX
@online{Kalofolias_arXiv2017, TITLE = {Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups}, AUTHOR = {Kalofolias, Janis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.07941}, EPRINT = {1709.07941}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.}, }
Endnote
%0 Report %A Kalofolias, Janis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0685-D %U http://arxiv.org/abs/1709.07941 %D 2017 %X Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI
[64]
S. Karaev and P. Miettinen, “Algorithms for Approximate Subtropical Matrix Factorization,” 2017. [Online]. Available: http://arxiv.org/abs/1707.08872. (arXiv: 1707.08872)
Abstract
Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.
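The max-times product underlying the subtropical algebra is easy to state in numpy (a minimal sketch; Capricorn and Cancer search for low-rank factors under exactly this product):

import numpy as np

def maxtimes(B, C):
    """Subtropical (max-times) matrix product over nonnegative reals:
    (B x C)[i, j] = max_k B[i, k] * C[k, j] -- max replaces the usual sum."""
    # Broadcast to shape (rows, rank, cols), multiply, reduce with max.
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

B = np.array([[1.0, 0.5],
              [0.2, 2.0]])
C = np.array([[3.0, 0.1],
              [0.4, 1.0]])
print(maxtimes(B, C))  # compare with the standard product B @ C

Because each output entry is produced by a single "winning" rank-1 component, the factors admit the winner-takes-it-all reading described in the abstract.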
Export
BibTeX
@online{Karaev_arXiv2017, TITLE = {Algorithms for Approximate Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1707.08872}, EPRINT = {1707.08872}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.}, }
Endnote
%0 Report %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Approximate Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-065A-F %U http://arxiv.org/abs/1707.08872 %D 2017 %X Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matrix -- we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow "winner-takes-it-all" interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen--Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret. %K Computer Science, Learning, cs.LG %U http://people.mpi-inf.mpg.de/~pmiettin/tropical/
[65]
E. Kuzey, “Populating Knowledge Bases with Temporal Information,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@phdthesis{KuzeyPhd2017, TITLE = {Populating Knowledge Bases with Temporal Information}, AUTHOR = {Kuzey, Erdal}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Kuzey, Erdal %Y Weikum, Gerhard %A referee: de Rijke, Maarten %A referee: Suchanek, Fabian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Populating Knowledge Bases with Temporal Information : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-EAE5-7 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P XIV, 143 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6811/ %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de
[66]
L. Lange, “Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{LangeBcS2017, TITLE = {Time in Newspaper: {A} Large-Scale Analysis of Temporal Expressions in News Corpora}, AUTHOR = {Lange, Lukas}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, TYPE = {Bachelor's thesis}, }
Endnote
%0 Thesis %A Lange, Lukas %Y Strötgen, Jannik %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Time in Newspaper: A Large-Scale Analysis of Temporal Expressions in News Corpora : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5D08-B %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 77 p. %V bachelor %9 bachelor
[67]
F. A. Lisi and D. Stepanova, “Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs,” in Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR, London, UK, 2017.
Export
BibTeX
@inproceedings{LisiRuleML2017, TITLE = {Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs}, AUTHOR = {Lisi, Francesca Alessandra and Stepanova, Daria}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-1875-8}, PUBLISHER = {CEUR-WS.org}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR}, EDITOR = {Bassiliades, Nick and Bikakis, Antonis and Constantini, Stefania and Franconi, Enrico and Giurca, Adrian and Kontchakov, Roman and Patkosi, Theodore and Sadri, Fariba and Van Woensel, William}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1875}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Lisi, Francesca Alessandra %A Stepanova, Daria %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Combining Rule Learning and Nonmonotonic Reasoning for Link Prediction in Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-55FC-8 %D 2017 %B International Joint Conference on Rules and Reasoning %Z date of event: 2017-07-12 - 2017-07-15 %C London, UK %B Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR %E Bassiliades, Nick; Bikakis, Antonis; Constantini, Stefania; Franconi, Enrico; Giurca, Adrian; Kontchakov, Roman; Patkosi, Theodore; Sadri, Fariba; Van Woensel, William %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1875 %@ false %U http://ceur-ws.org/Vol-1875/paper20.pdf
[68]
S. MacAvaney, K. Hui, and A. Yates, “An Approach for Weakly-Supervised Deep Information Retrieval,” 2017. [Online]. Available: http://arxiv.org/abs/1707.00189. (arXiv: 1707.00189)
Abstract
Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
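The filtering idea can be sketched in a few lines (headlines as pseudo-queries, article bodies as documents; the plain term-overlap heuristic below is a stand-in for the paper's unsupervised ranking and interaction-similarity filters, and the thresholds are made up):

def overlap(query, doc):
    """Fraction of (lowercased) query terms that occur in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def filter_pairs(articles, min_pos=0.6, max_neg=0.1):
    """Split (headline, body) pairs into positive and negative training pairs.

    Keep a pair as positive only if the headline clearly matches its body;
    pair headlines with other articles' bodies as negatives only when the
    overlap is low enough that the non-relevant label is plausible.
    """
    positives, negatives = [], []
    for i, (head, body) in enumerate(articles):
        if overlap(head, body) >= min_pos:
            positives.append((head, body))
        for j, (_, other_body) in enumerate(articles):
            if j != i and overlap(head, other_body) <= max_neg:
                negatives.append((head, other_body))
    return positives, negatives

articles = [("stock markets rally", "markets rally as stock prices climb"),
            ("rain expected tuesday", "forecasters expect heavy rain tuesday")]
pos, neg = filter_pairs(articles)
print(len(pos), len(neg))  # 2 2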
Export
BibTeX
@online{MacAvaney_arXiv2017, TITLE = {An Approach for Weakly-Supervised Deep Information Retrieval}, AUTHOR = {MacAvaney, Sean and Hui, Kai and Yates, Andrew}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1707.00189}, EPRINT = {1707.00189}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.}, }
Endnote
%0 Report %A MacAvaney, Sean %A Hui, Kai %A Yates, Andrew %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Approach for Weakly-Supervised Deep Information Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06C5-C %U http://arxiv.org/abs/1707.00189 %D 2017 %X Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that---given a weak training set of pseudo-queries, documents, relevance information---filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[69]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Approximate Functional Dependencies,” in KDD’17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 2017.
Export
BibTeX
@inproceedings{MandrosKDD2017, TITLE = {Discovering Reliable Approximate Functional Dependencies}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-4887-4}, DOI = {10.1145/3097983.3098062}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {KDD'17, 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, PAGES = {355--363}, ADDRESS = {Halifax, NS, Canada}, }
Endnote
%0 Conference Proceedings %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Approximate Functional Dependencies : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-065F-5 %R 10.1145/3097983.3098062 %D 2017 %B 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining %Z date of event: 2017-08-13 - 2017-08-17 %C Halifax, NS, Canada %B KDD'17 %P 355 - 363 %I ACM %@ 978-1-4503-4887-4
[70]
P. Mandros, M. Boley, and J. Vreeken, “Discovering Reliable Approximate Functional Dependencies,” 2017. [Online]. Available: http://arxiv.org/abs/1705.09391. (arXiv: 1705.09391)
Abstract
Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.
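The information-theoretic starting point is the fraction of information $F(X \rightarrow Y) = I(X;Y)/H(Y)$, which equals $1$ exactly when $Y$ is a function of $X$. A naive plug-in estimator looks as follows (the paper's contribution is precisely a bias-corrected, reliable version of such a score, which this sketch omits):

import math
from collections import Counter

def entropy(values):
    """Plug-in Shannon entropy (in bits) of an empirical sample."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def fraction_of_information(x, y):
    """F(X -> Y) = I(X;Y) / H(Y), in [0, 1]; 1 means Y is a function of X.
    The naive estimator is biased upward for small samples -- the very
    problem the paper's reliable score corrects."""
    hy = entropy(y)
    hxy = entropy(list(zip(x, y)))
    return (entropy(x) + hy - hxy) / hy if hy > 0 else 0.0

x = ["a", "a", "b", "b", "c", "c"]
y = [ 1,   1,   2,   2,   3,   3 ]   # exact functional dependence on x
print(fraction_of_information(x, y))  # 1.0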
Export
BibTeX
@online{DBLP:journals/corr/MandrosBV17, TITLE = {Discovering Reliable Approximate Functional Dependencies}, AUTHOR = {Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.09391}, EPRINT = {1705.09391}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.}, }
Endnote
%0 Report %A Mandros, Panagiotis %A Boley, Mario %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering Reliable Approximate Functional Dependencies : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90F8-D %U http://arxiv.org/abs/1705.09391 %D 2017 %X Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity. %K Computer Science, Databases, cs.DB,Computer Science, Artificial Intelligence, cs.AI,Computer Science, Information Theory, cs.IT,Mathematics, Information Theory, math.IT
[71]
A. Marx and J. Vreeken, “Telling Cause from Effect Using MDL-Based Local and Global Regression,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{MarxICDM2017, TITLE = {Telling Cause from Effect Using {MDL}-Based Local and Global Regression}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.40}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {307--316}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Telling Cause from Effect Using MDL-Based Local and Global Regression : %G eng %U http://hdl.handle.net/21.11116/0000-0000-63C4-3 %R 10.1109/ICDM.2017.40 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 307 - 316 %I IEEE %@ 978-1-5386-3835-4
[72]
A. Marx and J. Vreeken, “Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length,” 2017. [Online]. Available: http://arxiv.org/abs/1702.06385. (arXiv: 1702.06385)
Abstract
Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data.
Export
BibTeX
@online{DBLP:journals/corr/MarxV17, TITLE = {Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length}, AUTHOR = {Marx, Alexander and Vreeken, Jilles}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1702.06385}, EPRINT = {1702.06385}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data.}, }
Endnote
%0 Report %A Marx, Alexander %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90EF-3 %U http://arxiv.org/abs/1702.06385 %D 2017 %X Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause--effect pairs over both single and mixed type data. %K Statistics, Machine Learning, stat.ML,Computer Science, Learning, cs.LG
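The inference rule behind the two papers above can be illustrated with a toy two-sided MDL comparison: infer X → Y when L(X) + L(Y|X) < L(Y) + L(X|Y). The sketch below is emphatically not Crack; it stands in a quantile-binned mean predictor and an idealized Gaussian code length for the paper's classification and regression trees, ignores model cost, and uses invented data.

```python
# Toy two-sided MDL comparison (NOT Crack itself): infer X -> Y when
# L(X) + L(Y|X) < L(Y) + L(X|Y). A quantile-binned mean predictor stands
# in for the paper's trees; model cost is ignored.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_bits(residuals):
    """Idealized real-valued code length of residuals under one Gaussian."""
    var = np.var(residuals) + 1e-12
    return 0.5 * len(residuals) * np.log2(2 * np.pi * np.e * var)

def bits_conditional(a, b, bins=20):
    """Bits for b given a: encode residuals of b around per-bin means."""
    edges = np.quantile(a, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, a, side="right") - 1, 0, bins - 1)
    means = np.array([b[idx == k].mean() for k in range(bins)])
    return gaussian_bits(b - means[idx])

def infer_direction(x, y):
    x_to_y = gaussian_bits(x - x.mean()) + bits_conditional(x, y)
    y_to_x = gaussian_bits(y - y.mean()) + bits_conditional(y, x)
    return "X -> Y" if x_to_y < y_to_x else "Y -> X"

x = rng.normal(size=2000)
y = np.tanh(2 * x) + 0.1 * rng.normal(size=2000)  # ground truth: X -> Y
print(infer_direction(x, y))
```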
[73]
F. Meawad, M. H. Gad-Elrab, and E. Hemayed, “Designing Mobile Augmented Reality Experiences Using Friendly Markers,” in 4th International Conference on User Science and Engineering (i-USEr 2016), Melaka, Malaysia, 2017.
Export
BibTeX
@inproceedings{Meawad2017, TITLE = {Designing Mobile Augmented Reality Experiences Using Friendly Markers}, AUTHOR = {Meawad, Fatma and Gad-Elrab, Mohamed H. and Hemayed, Elsayed}, LANGUAGE = {eng}, ISBN = {978-1-5090-263-9}, DOI = {10.1109/IUSER.2016.7857937}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {4th International Conference on User Science and Engineering (i-USEr 2016)}, PAGES = {75--80}, ADDRESS = {Melaka, Malaysia}, }
Endnote
%0 Conference Proceedings %A Meawad, Fatma %A Gad-Elrab, Mohamed H. %A Hemayed, Elsayed %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Designing Mobile Augmented Reality Experiences Using Friendly Markers : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-CF28-A %R 10.1109/IUSER.2016.7857937 %D 2017 %B 4th International Conference on User Science and Engineering %Z date of event: 2016-08-23 - 2016-08-25 %C Melaka, Malaysia %B 4th International Conference on User Science and Engineering %P 75 - 80 %I IEEE %@ 978-1-5090-263-9
[74]
S. Metzger, R. Schenkel, and M. Sydow, “QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation,” Journal of Intelligent Information Systems, vol. 49, no. 3, 2017.
Export
BibTeX
@article{Metzger2017, TITLE = {{QBEES}: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation}, AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin}, LANGUAGE = {eng}, DOI = {10.1007/s10844-017-0443-x}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {Journal of Intelligent Information Systems}, VOLUME = {49}, NUMBER = {3}, PAGES = {333--366}, }
Endnote
%0 Journal Article %A Metzger, Steffen %A Schenkel, Ralf %A Sydow, Marcin %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T QBEES: Query-by-Example Entity Search in Semantic Knowledge Graphs Based on Maximal Aspects, Diversity-awareness and Relaxation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-557B-8 %R 10.1007/s10844-017-0443-x %7 2017 %D 2017 %J Journal of Intelligent Information Systems %V 49 %N 3 %& 333 %P 333 - 366
[75]
S. Metzler, S. Günnemann, and P. Miettinen, “Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{metzler16hyperbolae, TITLE = {Hyperbolae Are No Hyperbole: {Modelling} Communities That Are Not Cliques}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5090-5473-2}, DOI = {10.1109/ICDM.2016.0044}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, PAGES = {330--339}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Metzler, Saskia %A Günnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-225F-F %R 10.1109/ICDM.2016.0044 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %P 330 - 339 %I IEEE %@ 978-1-5090-5473-2
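The title's claim is easy to visualize. On our reading of the model, a community's members are ranked (say, by activity), and ranks i and j are linked roughly when (i + p)(j + q) ≤ θ, so the community's adjacency block is bounded by a hyperbola instead of being a full clique. The parameters below are invented.

```python
# Rough visualization of a hyperbolic community block (our reading of the
# model; p, q, theta are invented): ranks i, j are linked when
# (i + p) * (j + q) <= theta.
import numpy as np

def hyperbolic_block(n, p, q, theta):
    ranks = np.arange(1, n + 1)
    return ((ranks[:, None] + p) * (ranks[None, :] + q) <= theta).astype(int)

block = hyperbolic_block(n=12, p=1.0, q=1.0, theta=60.0)
for row in block:  # dense top-left corner, sparse tail: no clique
    print("".join("#" if v else "." for v in row))
print("fill:", block.mean())  # well below 1.0
```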
[76]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Cardinal Virtues: Extracting Relation Cardinalities from Text,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.
Export
BibTeX
@inproceedings{MirzaACL2017, TITLE = {Cardinal Virtues: {E}xtracting Relation Cardinalities from Text}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-76-0}, DOI = {10.18653/v1/P17-2055}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)}, PAGES = {347--351}, ADDRESS = {Vancouver, Canada}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cardinal Virtues: Extracting Relation Cardinalities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-F9F8-7 %R 10.18653/v1/P17-2055 %D 2017 %B The 55th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2017-07-30 - 2017-08-04 %C Vancouver, Canada %B The 55th Annual Meeting of the Association for Computational Linguistics %P 347 - 351 %I ACL %@ 978-1-945626-76-0 %U http://aclweb.org/anthology/P17-2055
[77]
P. Mirza, S. Razniewski, F. Darari, and G. Weikum, “Cardinal Virtues: Extracting Relation Cardinalities from Text,” 2017. [Online]. Available: http://arxiv.org/abs/1704.04455. (arXiv: 1704.04455)
Abstract
Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations.
Export
BibTeX
@online{Mirza2017, TITLE = {Cardinal Virtues: Extracting Relation Cardinalities from Text}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Darari, Fariz and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.04455}, EPRINT = {1704.04455}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations.}, }
Endnote
%0 Report %A Mirza, Paramita %A Razniewski, Simon %A Darari, Fariz %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cardinal Virtues: Extracting Relation Cardinalities from Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8128-9 %U http://arxiv.org/abs/1704.04455 %D 2017 %X Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap contents about the cardinalities of these relations, for example, how many awards someone has won. We introduce this novel problem of extracting cardinalities and discuss the specific challenges that set it apart from standard IE. We present a distant supervision method using conditional random fields. A preliminary evaluation results in precision between 3% and 55%, depending on the difficulty of relations. %K Computer Science, Computation and Language, cs.CL
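The distant supervision step can be illustrated independently of the CRF: number mentions in text that match a cardinality already known to the KB become positive training labels, everything else stays negative. The sketch below shows only this labelling heuristic; the number lexicon and the sentence are invented, and the paper's sequence model is not included.

```python
# Distant-supervision labelling only (the paper then trains a CRF on such
# data); the lexicon and the example sentence are invented.
import re

WORD2NUM = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def label_tokens(tokens, kb_count):
    """Mark number tokens matching the KB cardinality as positive labels."""
    labels = []
    for tok in tokens:
        value = WORD2NUM.get(tok.lower())
        if value is None and re.fullmatch(r"\d+", tok):
            value = int(tok)
        labels.append("COUNT" if value == kb_count else "O")
    return labels

# Suppose the KB already knows the subject has 4 children.
tokens = "She has four children and two dogs".split()
print(list(zip(tokens, label_tokens(tokens, kb_count=4))))
```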
[78]
A. Mishra and K. Berberich, “How do Order and Proximity Impact the Readability of Event Summaries?,” in Advances in Information Retrieval (ECIR 2017), Aberdeen, UK, 2017.
Export
BibTeX
@inproceedings{DBLP:conf/ecir/MishraB17, TITLE = {How do Order and Proximity Impact the Readability of Event Summaries?}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-56607-8}, DOI = {10.1007/978-3-319-56608-5_17}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2017)}, EDITOR = {Jose, Joemon M. and Hauff, Claudia and Altingovde, Ismail Sengor and Song, Dawei and Albakour, Dyaa and Watt, Stuart and Tait, John}, PAGES = {212--225}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10193}, ADDRESS = {Aberdeen, UK}, }
Endnote
%0 Conference Proceedings %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T How do Order and Proximity Impact the Readability of Event Summaries? : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-20D9-B %R 10.1007/978-3-319-56608-5_17 %D 2017 %B 39th European Conference on Information Retrieval %Z date of event: 2017-04-09 - 2017-04-13 %C Aberdeen, UK %B Advances in Information Retrieval %E Jose, Joemon M.; Hauff, Claudia; Altingovde, Ismail Sengor; Song, Dawei; Albakour, Dyaa; Watt, Stuart; Tait, John %P 212 - 225 %I Springer %@ 978-3-319-56607-8 %B Lecture Notes in Computer Science %N 10193
[79]
P. Mrazovic, B. Eravci, J. L. Larriba-Pey, H. Ferhatosmanoglu, and M. Matskin, “Understanding and Predicting Trends in Urban Freight Transport,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{MrazovicMDM2017, TITLE = {Understanding and Predicting Trends in Urban Freight Transport}, AUTHOR = {Mrazovic, Petar and Eravci, Bahaeddin and Larriba-Pey, Josep L. and Ferhatosmanoglu, Hakan and Matskin, Mihhail}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.26}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {124--133}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Mrazovic, Petar %A Eravci, Bahaeddin %A Larriba-Pey, Josep L. %A Ferhatosmanoglu, Hakan %A Matskin, Mihhail %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Understanding and Predicting Trends in Urban Freight Transport : %G eng %U http://hdl.handle.net/21.11116/0000-0000-DB41-0 %R 10.1109/MDM.2017.26 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 124 - 133 %I IEEE %@ 978-1-5386-3932-0
[80]
S. Mukherjee, “Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.
Export
BibTeX
@phdthesis{Mukherjeephd17, TITLE = {Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities}, AUTHOR = {Mukherjee, Subhabrata}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-69269}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.}, }
Endnote
%0 Thesis %A Mukherjee, Subhabrata %Y Weikum, Gerhard %A referee: Han, Jiawei %A referee: Günnemann, Stephan %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-A648-0 %U urn:nbn:de:bsz:291-scidok-69269 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 166 p. %V phd %9 phd %X One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information. %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2017/6926/
[81]
S. Mukherjee and G. Weikum, “People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02667. (arXiv: 1705.02667)
Abstract
Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.
Export
BibTeX
@online{Mukerjee_arXiv1705.02667, TITLE = {People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities}, AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02667}, EPRINT = {1705.02667}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80F7-0 %U http://arxiv.org/abs/1705.02667 %D 2017 %X Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
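The joint estimation idea, stripped of the paper's CRF machinery, can be imitated with a toy fixed point: article credibility is a trust-weighted average of user ratings, and user trust shrinks with disagreement from the current credibility estimates. This is a much-simplified stand-in, not the paper's model; users, articles, and ratings below are invented.

```python
# Toy joint fixed point in the spirit of the paper (NOT its CRF model):
# credibility = trust-weighted mean rating; trust = inverse disagreement.
ratings = {  # (user, article) -> rating in [0, 1], invented
    ("alice", "a1"): 0.9, ("alice", "a2"): 0.2,
    ("bob",   "a1"): 0.8, ("bob",   "a2"): 0.3,
    ("troll", "a1"): 0.1, ("troll", "a2"): 0.95,
}
users = sorted({u for u, _ in ratings})
arts = sorted({a for _, a in ratings})
trust = {u: 1.0 for u in users}

for _ in range(20):
    cred = {a: sum(trust[u] * r for (u, b), r in ratings.items() if b == a)
               / sum(trust[u] for (u, b) in ratings if b == a)
            for a in arts}
    trust = {}
    for u in users:
        errs = [(r - cred[b]) ** 2 for (v, b), r in ratings.items() if v == u]
        trust[u] = 1.0 / (1e-3 + sum(errs) / len(errs))
    top = max(trust.values())
    trust = {u: t / top for u, t in trust.items()}

print({a: round(c, 2) for a, c in cred.items()})   # troll barely moves these
print({u: round(t, 2) for u, t in trust.items()})  # troll ends up distrusted
```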
[82]
S. Mukherjee, G. Weikum, and C. Danescu-Niculescu-Mizil, “People on Drugs: Credibility of User Statements in Health Communities,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02522. (arXiv: 1705.02522)
Abstract
Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.
Export
BibTeX
@online{Mukherjee_arXiv2017, TITLE = {People on Drugs: Credibility of User Statements in Health Communities}, AUTHOR = {Mukherjee, Subhabrata and Weikum, Gerhard and Danescu-Niculescu-Mizil, Cristian}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02522}, EPRINT = {1705.02522}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Weikum, Gerhard %A Danescu-Niculescu-Mizil, Cristian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T People on Drugs: Credibility of User Statements in Health Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80FE-2 %U http://arxiv.org/abs/1705.02522 %D 2017 %X Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[83]
S. Mukherjee, H. Lamba, and G. Weikum, “Item Recommendation with Evolving User Preferences and Experience,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02519. (arXiv: 1705.02519)
Abstract
Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.
Export
BibTeX
@online{Mukherjee2017d, TITLE = {Item Recommendation with Evolving User Preferences and Experience}, AUTHOR = {Mukherjee, Subhabrata and Lamba, Hemank and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02519}, DOI = {10.1109/ICDM.2015.111}, EPRINT = {1705.02519}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Lamba, Hemank %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Item Recommendation with Evolving User Preferences and Experience : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8103-C %R 10.1109/ICDM.2015.111 %U http://arxiv.org/abs/1705.02519 %D 2017 %X Current recommender systems exploit user and item similarities by collaborative filtering. Some advanced methods also consider the temporal evolution of item ratings as a global background process. However, all prior methods disregard the individual evolution of a user's experience level and how this is expressed in the user's writing in a review community. In this paper, we model the joint evolution of user experience, interest in specific item facets, writing style, and rating behavior. This way we can generate individual recommendations that take into account the user's maturity level (e.g., recommending art movies rather than blockbusters for a cinematography expert). As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style. We develop a generative HMM-LDA model to trace user evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time -- with solely user reviews and ratings as observables over time. The facets of a user's interest are drawn from a Latent Dirichlet Allocation (LDA) model derived from her reviews, as a function of her (again latent) experience level. In experiments with five real-world datasets, we show that our model improves the rating prediction over state-of-the-art baselines, by a substantial margin. We also show, in a use-case study, that our model performs well in the assessment of user experience levels. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
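The HMM half of the HMM-LDA model above can be shown in isolation: hidden states are experience levels that tend only to move upward, and a standard forward pass turns a user's observed review signals into a posterior over her current level. All probabilities below are invented, and the LDA half (facets drawn conditioned on experience) is omitted.

```python
# Forward pass of a small left-to-right HMM over experience levels; the
# transition, emission, and start probabilities are invented.
import numpy as np

pi = np.array([0.8, 0.15, 0.05])        # start mostly as a novice
A = np.array([[0.7, 0.25, 0.05],        # experience never decreases
              [0.0, 0.80, 0.20],
              [0.0, 0.00, 1.00]])
B = np.array([[0.6, 0.3, 0.1],          # P(observed review signal | level)
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Posterior over the experience level after the last observation."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha / alpha.sum()

print(forward([0, 0, 1, 2, 2]).round(3))  # mass shifts to higher levels
```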
[84]
S. Mukherjee, S. Guennemann, and G. Weikum, “Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02669. (arXiv: 1705.02669)
Abstract
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.
Export
BibTeX
@online{Mukherjee2017, TITLE = {Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion}, AUTHOR = {Mukherjee, Subhabrata and Guennemann, Stephan and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02669}, DOI = {10.1145/2939672.2939780}, EPRINT = {1705.02669}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Guennemann, Stephan %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Personalized Item Recommendation with Continuous Experience Evolution of Users using Brownian Motion : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80BE-3 %R 10.1145/2939672.2939780 %U http://arxiv.org/abs/1705.02669 %D 2017 %X Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has shown that recommender systems benefit from explicit consideration of user experience. However, prior work assumes a fixed number of discrete experience levels, whereas in reality users gain experience and mature continuously over time. This paper presents a new model that captures the continuous evolution of user experience, and the resulting language model in reviews and other posts. Our model is unsupervised and combines principles of Geometric Brownian Motion, Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal progression of user experience and language model respectively. We develop practical algorithms for estimating the model parameters from data and for inference with our model (e.g., to recommend items). Extensive experiments with five real-world datasets show that our model not only fits data better than discrete-model baselines, but also outperforms state-of-the-art methods for predicting item ratings. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
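For the continuous counterpart, the Geometric Brownian Motion named in the abstract is the standard process simulated below; one trajectory shows the kind of smooth, continuously maturing experience curve the model assumes. The drift and volatility values are invented.

```python
# One Geometric Brownian Motion trajectory (mu, sigma, horizon invented):
# E_{t+dt} = E_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z).
import numpy as np

rng = np.random.default_rng(1)

def gbm_path(e0=1.0, mu=0.05, sigma=0.1, dt=1.0, steps=12):
    z = rng.normal(size=steps)
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return e0 * np.exp(np.cumsum(increments))

experience = gbm_path()
print(experience.round(2))  # smooth, upward-drifting experience level
```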
[85]
S. Mukherjee, K. Popat, and G. Weikum, “Exploring Latent Semantic Factors to Find Useful Product Reviews,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{MukherjeeSDM2017, TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews}, AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-61197-497-3}, DOI = {10.1137/1.9781611974973.54}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, PAGES = {480--488}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Mukherjee, Subhabrata %A Popat, Kashyap %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploring Latent Semantic Factors to Find Useful Product Reviews : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4CD4-6 %R 10.1137/1.9781611974973.54 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %P 480 - 488 %I SIAM %@ 978-1-61197-497-3
[86]
S. Mukherjee, K. Popat, and G. Weikum, “Exploring Latent Semantic Factors to Find Useful Product Reviews,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02518. (arXiv: 1705.02518)
Abstract
Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.
Export
BibTeX
@online{Mukjherjee2017e, TITLE = {Exploring Latent Semantic Factors to Find Useful Product Reviews}, AUTHOR = {Mukherjee, Subhabrata and Popat, Kashyap and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02518}, EPRINT = {1705.02518}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Popat, Kashyap %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploring Latent Semantic Factors to Find Useful Product Reviews : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-811C-5 %U http://arxiv.org/abs/1705.02518 %D 2017 %X Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers. In contrast to previous works in this domain exploiting a variety of syntactic and community-level features, we delve deep into the semantics of reviews as to what makes them useful, providing interpretable explanation for the same. We identify a set of consistency and semantic factors, all from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across all communities and domains. We explore review semantics in terms of several latent factors like the expertise of its author, his judgment about the fine-grained facets of the underlying product, and his writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
[87]
S. Mukherjee, S. Dutta, and G. Weikum, “Credible Review Detection with Limited Information using Consistency Analysis,” 2017. [Online]. Available: http://arxiv.org/abs/1705.02668. (arXiv: 1705.02668)
Abstract
Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.
Export
BibTeX
@online{Mukherjee2017b, TITLE = {Credible Review Detection with Limited Information using Consistency Analysis}, AUTHOR = {Mukherjee, Subhabrata and Dutta, Sourav and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1705.02668}, EPRINT = {1705.02668}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.}, }
Endnote
%0 Report %A Mukherjee, Subhabrata %A Dutta, Sourav %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Credible Review Detection with Limited Information using Consistency Analysis : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80C1-A %U http://arxiv.org/abs/1705.02668 %D 2017 %X Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines. %K Computer Science, Artificial Intelligence, cs.AI,Computer Science, Computation and Language, cs.CL,Computer Science, Information Retrieval, cs.IR,cs.SI,Statistics, Machine Learning, stat.ML
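One of the consistency signals can be caricatured without any latent model: does a review's star rating agree with the polarity of its text? The paper instead derives such features from latent topic models over texts, ratings, and timestamps; the lexicon-based stand-in below, with invented word lists and reviews, only conveys the intuition.

```python
# Caricature of one rating-text consistency signal (the paper derives its
# features from latent topic models instead); lexicon and data invented.
import re

POS = {"great", "love", "excellent", "perfect"}
NEG = {"broken", "terrible", "waste", "refund"}

def text_polarity(text):
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POS for w in words)
    neg = sum(w in NEG for w in words)
    return 0.0 if pos == neg else (1.0 if pos > neg else -1.0)

def rating_text_consistency(stars, text):
    """Rating scaled to [-1, 1] times text polarity; low values look fishy."""
    return ((stars - 3) / 2.0) * text_polarity(text)

print(rating_text_consistency(5, "Great product, love it"))  # 1.0, consistent
print(rating_text_consistency(5, "Terrible, total waste"))   # -1.0, suspicious
```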
[88]
S. Neumann, R. Gemulla, and P. Miettinen, “What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank,” in 16th IEEE International Conference on Data Mining (ICDM 2016), Barcelona, Spain, 2017.
Export
BibTeX
@inproceedings{neumann16what, TITLE = {What You Will Gain By Rounding: {Theory} and Algorithms for Rounding Rank}, AUTHOR = {Neumann, Stefan and Gemulla, Rainer and Miettinen, Pauli}, LANGUAGE = {eng}, DOI = {10.1109/ICDM.2016.147}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {16th IEEE International Conference on Data Mining (ICDM 2016)}, EDITOR = {Bonchi, Francesco and Domingo-Ferrer, Josep and Baeza-Yates, Ricardo and Zhou, Zhi-Hua and Wu, Xindong}, PAGES = {380--389}, ADDRESS = {Barcelona, Spain}, }
Endnote
%0 Conference Proceedings %A Neumann, Stefan %A Gemulla, Rainer %A Miettinen, Pauli %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2265-0 %R 10.1109/ICDM.2016.147 %D 2017 %8 02.02.2017 %B 16th International Conference on Data Mining %Z date of event: 2016-12-12 - 2016-12-15 %C Barcelona, Spain %B 16th IEEE International Conference on Data Mining %E Bonchi, Francesco; Domingo-Ferrer, Josep; Baeza-Yates, Ricardo; Zhou, Zhi-Hua; Wu, Xindong %P 380 - 389 %I IEEE
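As we read the definition, a binary matrix A has rounding rank at most k if some real matrix of rank k rounds elementwise to A. The sketch below checks a candidate rank-1 factorization (with invented numbers) against the binary matrix it rounds to; that matrix has ordinary rank 2, which is exactly the gain the title refers to.

```python
# Rounding-rank check, per our reading of the definition: does a rank-k
# real matrix B round elementwise (threshold 1/2) to the binary matrix A?
import numpy as np

def rounds_to(B, A, threshold=0.5):
    return np.array_equal((B >= threshold).astype(int), A)

u = np.array([[0.9], [0.7], [0.3]])  # invented rank-1 factors
v = np.array([[0.9, 0.8, 0.6]])
B = u @ v                            # rank 1
A = (B >= 0.5).astype(int)           # the banded binary matrix B rounds to

print(A)
print("linear rank of A:", np.linalg.matrix_rank(A))  # 2
print("rounding rank of A <= 1:", rounds_to(B, A))    # True
```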
[89]
S. Neumann and P. Miettinen, “Reductions for Frequency-Based Data Mining Problems,” in 17th IEEE International Conference on Data Mining (ICDM 2017), New Orleans, LA, USA, 2017.
Export
BibTeX
@inproceedings{neumann17reductions, TITLE = {Reductions for Frequency-Based Data Mining Problems}, AUTHOR = {Neumann, Stefan and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-5386-3835-4}, DOI = {10.1109/ICDM.2017.128}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {17th IEEE International Conference on Data Mining (ICDM 2017)}, PAGES = {997--1002}, ADDRESS = {New Orleans, LA, USA}, }
Endnote
%0 Conference Proceedings %A Neumann, Stefan %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Reductions for Frequency-Based Data Mining Problems : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90CE-F %R 10.1109/ICDM.2017.128 %D 2017 %B 17th IEEE International Conference on Data Mining %Z date of event: 2017-11-18 - 2017-11-21 %C New Orleans, LA, USA %B 17th IEEE International Conference on Data Mining %P 997 - 1002 %I IEEE %@ 978-1-5386-3835-4
[90]
S. Neumann and P. Miettinen, “Reductions for Frequency-Based Data Mining Problems,” 2017. [Online]. Available: http://arxiv.org/abs/1709.00900. (arXiv: 1709.00900)
Abstract
Studying the computational complexity of problems is one of the -- if not the -- fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.
Export
BibTeX
@online{Neumann_arXiv2017, TITLE = {Reductions for Frequency-Based Data Mining Problems}, AUTHOR = {Neumann, Stefan and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.00900}, EPRINT = {1709.00900}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Studying the computational complexity of problems is one of the -- if not the -- fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.}, }
Endnote
%0 Report %A Neumann, Stefan %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Reductions for Frequency-Based Data Mining Problems : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0654-C %U http://arxiv.org/abs/1709.00900 %D 2017 %X Studying the computational complexity of problems is one of the -- if not the -- fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems. %K Computer Science, Computational Complexity, cs.CC
[91]
D. B. Nguyen, “Joint Models for Information and Knowledge Extraction,” Universität des Saarlandes, Saarbrücken, 2017.
Abstract
Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality due to non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.
Export
BibTeX
@phdthesis{Nguyenphd2017, TITLE = {Joint Models for Information and Knowledge Extraction}, AUTHOR = {Nguyen, Dat Ba}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-ds-269433}, DOI = {10.22028/D291-26943}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: \mbox{$\bullet$} High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. \mbox{$\bullet$} Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality due to non-canonicalized and noisy output. These limitations need to be overcome. \mbox{$\bullet$} On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.}, }
Endnote
%0 Thesis %A Nguyen, Dat Ba %Y Weikum, Gerhard %A referee: Theobald, Martin %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Joint Models for Information and Knowledge Extraction : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-890F-9 %U urn:nbn:de:bsz:291-scidok-ds-269433 %R 10.22028/D291-26943 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P 89 p. %V phd %9 phd %X Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are performed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality due to non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26895
[92]
D. B. Nguyen, M. Theobald, and G. Weikum, “J-REED: Joint Relation Extraction and Entity Disambiguation,” in CIKM’17, 26th ACM International Conference on Information and Knowledge Management, Singapore, Singapore, 2017.
Export
BibTeX
@inproceedings{Nguyen_CIKM2017, TITLE = {J-{REED}: {Joint Relation Extraction and Entity Disambiguation}}, AUTHOR = {Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4918-5}, DOI = {10.1145/3132847.3133090}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {CIKM'17, 26th ACM International Conference on Information and Knowledge Management}, PAGES = {2227--2230}, ADDRESS = {Singapore, Singapore}, }
Endnote
%0 Conference Proceedings %A Nguyen, Dat Ba %A Theobald, Martin %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T J-REED: Joint Relation Extraction and Entity Disambiguation : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B9D-E %R 10.1145/3132847.3133090 %D 2017 %B 26th ACM International Conference on Information and Knowledge Management %Z date of event: 2017-11-06 - 2017-11-10 %C Singapore, Singapore %B CIKM'17 %P 2227 - 2230 %I ACM %@ 978-1-4503-4918-5
[93]
A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti, “ACCES: Offline Accuracy Estimation for Fingerprint-based Localization,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{mdm17-spate-demo, TITLE = {{ACCES}: Offline Accuracy Estimation for Fingerprint-based Localization}, AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.61}, PUBLISHER = {IEEE Computer Society}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {358--359}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Nikitin, Artyom %A Laoudias, Christos %A Chatzimilioudis, Georgios %A Karras, Panagiotis %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ACCES: Offline Accuracy Estimation for Fingerprint-based Localization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-082D-3 %R 10.1109/MDM.2017.61 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 358 - 359 %I IEEE Computer Society %@ 978-1-5386-3932-0
[94]
A. Nikitin, C. Laoudias, G. Chatzimilioudis, P. Karras, and D. Zeinalipour-Yazti, “Indoor Localization Accuracy Estimation from Fingerprint Data,” in 18th IEEE International Conference on Mobile Data Management (MDM 2017), Daejeon, South Korea, 2017.
Export
BibTeX
@inproceedings{mdm17-spate, TITLE = {Indoor Localization Accuracy Estimation from Fingerprint Data}, AUTHOR = {Nikitin, Artyom and Laoudias, Christos and Chatzimilioudis, Georgios and Karras, Panagiotis and Zeinalipour-Yazti, Demetrios}, LANGUAGE = {eng}, ISBN = {978-1-5386-3932-0}, DOI = {10.1109/MDM.2017.34}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {18th IEEE International Conference on Mobile Data Management (MDM 2017)}, PAGES = {196--205}, ADDRESS = {Daejeon, South Korea}, }
Endnote
%0 Conference Proceedings %A Nikitin, Artyom %A Laoudias, Christos %A Chatzimilioudis, Georgios %A Karras, Panagiotis %A Zeinalipour-Yazti, Demetrios %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Indoor Localization Accuracy Estimation from Fingerprint Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-0832-6 %R 10.1109/MDM.2017.34 %D 2017 %B 18th IEEE International Conference on Mobile Data Management %Z date of event: 2017-05-29 - 2017-06-01 %C Daejeon, South Korea %B 18th IEEE International Conference on Mobile Data Management %P 196 - 205 %I IEEE %@ 978-1-5386-3932-0
[95]
S. Paramonov, D. Stepanova, and P. Miettinen, “Hybrid ASP-based Approach to Pattern Mining,” in Rules and Reasoning (RuleML+RR 2017), London, UK, 2017.
Export
BibTeX
@inproceedings{StepanovaRR2017, TITLE = {Hybrid {ASP}-based Approach to Pattern Mining}, AUTHOR = {Paramonov, Sergey and Stepanova, Daria and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-319-61251-5}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Rules and Reasoning (RuleML+RR 2017)}, PAGES = {199--214}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10364}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Paramonov, Sergey %A Stepanova, Daria %A Miettinen, Pauli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hybrid ASP-based Approach to Pattern Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8450-8 %D 2017 %B International Joint Conference on Rules and Reasoning %Z date of event: 2017-07-12 - 2017-07-15 %C London, UK %B Rules and Reasoning %P 199 - 214 %I Springer %@ 978-3-319-61251-5 %B Lecture Notes in Computer Science %V 10364
[96]
T. Pellissier Tanon, D. Stepanova, S. Razniewski, P. Mirza, and G. Weikum, “Completeness-Aware Rule Learning from Knowledge Graphs,” in The Semantic Web -- ISWC 2017, Vienna, Austria, 2017.
Export
BibTeX
@inproceedings{StepanovaISWC2017, TITLE = {Completeness-Aware Rule Learning from Knowledge Graphs}, AUTHOR = {Pellissier Tanon, Thomas and Stepanova, Daria and Razniewski, Simon and Mirza, Paramita and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-68287-7}, DOI = {10.1007/978-3-319-68288-4_30}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Semantic Web -- ISWC 2017}, EDITOR = {d'Amato, Claudia and Fernandez, Miriam and Tamma, Valentina and Lecue, Freddy and Cudr{\'e}-Mauroux, Philippe and Sequeda, Juan and Lange, Christoph and Heflin, Jeff}, PAGES = {507--525}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {10587}, ADDRESS = {Vienna, Austria}, }
Endnote
%0 Conference Proceedings %A Pellissier Tanon, Thomas %A Stepanova, Daria %A Razniewski, Simon %A Mirza, Paramita %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Completeness-Aware Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-55D9-3 %R 10.1007/978-3-319-68288-4_30 %D 2017 %B 16th International Semantic Web Conference %Z date of event: 2017-10-21 - 2017-10-25 %C Vienna, Austria %B The Semantic Web -- ISWC 2017 %E d'Amato, Claudia; Fernandez, Miriam; Tamma, Valentina; Lecue, Freddy; Cudré-Mauroux, Philippe; Sequeda, Juan; Lange, Christoph; Heflin, Jeff %P 507 - 525 %I Springer %@ 978-3-319-68287-7 %B Lecture Notes in Computer Science %N 10587 %U https://iswc2017.ai.wu.ac.at/wp-content/uploads/papers/MainProceedings/324.pdf
[97]
R. Pienta, M. Kahng, Z. Lin, J. Vreeken, P. Talukdar, J. Abello, G. Parameswaran, and D. H. Chau, “FACETS: Adaptive Local Exploration of Large Graphs,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{pienta:17:facets, TITLE = {{FACETS}: {A}daptive Local Exploration of Large Graphs}, AUTHOR = {Pienta, Robert and Kahng, Minsuk and Lin, Zhiyuan and Vreeken, Jilles and Talukdar, Partha and Abello, James and Parameswaran, Ganesh and Chau, Duen Horng}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.67}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh and Wang, Wei}, PAGES = {597--605}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Pienta, Robert %A Kahng, Minsuk %A Lin, Zhiyuan %A Vreeken, Jilles %A Talukdar, Partha %A Abello, James %A Parameswaran, Ganesh %A Chau, Duen Horng %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations External Organizations %T FACETS: Adaptive Local Exploration of Large Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4BEA-D %R 10.1137/1.9781611974973.67 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %E Chawla, Nitesh; Wang, Wei %P 597 - 605 %I SIAM %@ 978-1-611974-87-4
[98]
E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum, “On Measuring Bias in Online Information,” ACM SIGMOD Record, vol. 46, no. 4, 2017.
Abstract
Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.
Export
BibTeX
@article{Pitoura2017a, TITLE = {On Measuring Bias in Online Information}, AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1145/3186549.3186553}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.}, JOURNAL = {ACM SIGMOD Record}, VOLUME = {46}, NUMBER = {4}, PAGES = {16--21}, }
Endnote
%0 Journal Article %A Pitoura, Evaggelia %A Tsaparas, Panayiotis %A Flouris, Giorgos %A Fundulaki, Irini %A Papadakos, Panagiotis %A Abiteboul, Serge %A Weikum, Gerhard %+ External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T On Measuring Bias in Online Information : %G eng %U http://hdl.handle.net/21.11116/0000-0000-EA0F-9 %R 10.1145/3186549.3186553 %7 2017 %D 2017 %X Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems. %K Computer Science, Databases, cs.DB,Computer Science, Computers and Society, cs.CY %J ACM SIGMOD Record %V 46 %N 4 %& 16 %P 16 - 21 %I ACM %C New York, NY
[99]
E. Pitoura, P. Tsaparas, G. Flouris, I. Fundulaki, P. Papadakos, S. Abiteboul, and G. Weikum, “On Measuring Bias in Online Information,” 2017. [Online]. Available: http://arxiv.org/abs/1704.05730. (arXiv: 1704.05730)
Abstract
Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.
Export
BibTeX
@online{Pitoura2017, TITLE = {On Measuring Bias in Online Information}, AUTHOR = {Pitoura, Evaggelia and Tsaparas, Panayiotis and Flouris, Giorgos and Fundulaki, Irini and Papadakos, Panagiotis and Abiteboul, Serge and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1704.05730}, EPRINT = {1704.05730}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.}, }
Endnote
%0 Report %A Pitoura, Evaggelia %A Tsaparas, Panayiotis %A Flouris, Giorgos %A Fundulaki, Irini %A Papadakos, Panagiotis %A Abiteboul, Serge %A Weikum, Gerhard %+ External Organizations External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T On Measuring Bias in Online Information : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8123-4 %U http://arxiv.org/abs/1704.05730 %D 2017 %X Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems. %K Computer Science, Databases, cs.DB,Computer Science, Computers and Society, cs.CY
[100]
K. Popat, “Assessing the Credibility of Claims on the Web,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{PopatWWW2017b, TITLE = {Assessing the Credibility of Claims on the {Web}}, AUTHOR = {Popat, Kashyap}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3053379}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {735--739}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Assessing the Credibility of Claims on the Web : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-90CC-2 %R 10.1145/3041021.3053379 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 735 - 739 %I ACM %@ 978-1-4503-4914-7
[101]
K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{PopatWWW2017, TITLE = {Where the Truth Lies: {E}xplaining the Credibility of Emerging Claims on the {W}eb and Social Media}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3055133}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {1003--1012}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4CD8-D %R 10.1145/3041021.3055133 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 1003 - 1012 %I ACM %@ 978-1-4503-4914-7
[102]
Y. Ran, B. He, K. Hui, J. Xu, and L. Sun, “A Document-Based Neural Relevance Model for Effective Clinical Decision Support,” in 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017), Kansas City, MO, USA, 2017.
Export
BibTeX
@inproceedings{RanBIBM2017, TITLE = {A Document-Based Neural Relevance Model for Effective Clinical Decision Support}, AUTHOR = {Ran, Yanhua and He, Ben and Hui, Kai and Xu, Jungang and Sun, Le}, LANGUAGE = {eng}, ISBN = {978-1-5090-3050-7}, DOI = {10.1109/BIBM.2017.8217757}, PUBLISHER = {IEEE}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)}, EDITOR = {Hu, Xiaohua and Shyu, Chi-Ren and Bromberg, Yana and Gao, Jean and Gong, Yang and Korkin, Dmitry and Yoo, Illhoi and Zheng, Jane Huiru}, PAGES = {798--804}, ADDRESS = {Kansas City, MO, USA}, }
Endnote
%0 Conference Proceedings %A Ran, Yanhua %A He, Ben %A Hui, Kai %A Xu, Jungang %A Sun, Le %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T A Document-Based Neural Relevance Model for Effective Clinical Decision Support : %G eng %U http://hdl.handle.net/21.11116/0000-0000-EA3D-5 %R 10.1109/BIBM.2017.8217757 %D 2017 %B IEEE International Conference on Bioinformatics and Biomedicine %Z date of event: 2017-11-13 - 2017-11-16 %C Kansas City, MO, USA %B 2017 IEEE International Conference on Bioinformatics and Biomedicine %E Hu, Xiaohua; Shyu, Chi-Ren; Bromberg, Yana; Gao, Jean; Gong, Yang; Korkin, Dmitry; Yoo, Illhoi; Zheng, Jane Huiru %P 798 - 804 %I IEEE %@ 978-1-5090-3050-7
[103]
S. Razniewski, V. Balaraman, and W. Nutt, “Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties,” in Advanced Data Mining and Applications (ADMA 2017), Singapore, 2017.
Export
BibTeX
@inproceedings{Razniewski_ADMA2017, TITLE = {Doctoral Advisor or Medical Condition: {T}owards Entity-Specific Rankings of Knowledge Base Properties}, AUTHOR = {Razniewski, Simon and Balaraman, Vevake and Nutt, Werner}, LANGUAGE = {eng}, ISBN = {978-3-319-69178-7}, DOI = {10.1007/978-3-319-69179-4_37}, PUBLISHER = {Springer}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Advanced Data Mining and Applications (ADMA 2017)}, EDITOR = {Cong, Gao and Peng, Wen-Chih and Zhang, Wei Emma and Li, Chengliang and Sun, Aixin}, PAGES = {526--540}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10604}, ADDRESS = {Singapore}, }
Endnote
%0 Conference Proceedings %A Razniewski, Simon %A Balaraman, Vevake %A Nutt, Werner %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-2C05-A %R 10.1007/978-3-319-69179-4_37 %D 2017 %B 13th International Conference on Advanced Data Mining and Applications %Z date of event: 2017-11-05 - 2017-11-06 %C Singapore %B Advanced Data Mining and Applications %E Cong, Gao; Peng, Wen-Chih; Zhang, Wei Emma; Li, Chengliang; Sun, Aixin %P 526 - 540 %I Springer %@ 978-3-319-69178-7 %B Lecture Notes in Artificial Intelligence %N 10604
[104]
N. Reiter, E. Gius, J. Strötgen, and M. Willand, “A Shared Task for a Shared Goal: Systematic Annotation of Literary Texts,” in Digital Humanities 2017 (DH 2017), Montréal, Canada. (Accepted/in press)
Export
BibTeX
@inproceedings{StroetgenDH2017, TITLE = {A Shared Task for a Shared Goal: {S}ystematic Annotation of Literary Texts}, AUTHOR = {Reiter, Nils and Gius, Evelyn and Str{\"o}tgen, Jannik and Willand, Marcus}, LANGUAGE = {eng}, YEAR = {2017}, PUBLREMARK = {Accepted}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Digital Humanities 2017 (DH 2017)}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Reiter, Nils %A Gius, Evelyn %A Strötgen, Jannik %A Willand, Marcus %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T A Shared Task for a Shared Goal: Systematic Annotation of Literary Texts : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-7BDC-3 %D 2017 %B Digital Humanities %Z date of event: 2017-08-08 - 2017-08-11 %C Montréal, Canada %B Digital Humanities 2017
[105]
R. Bertens, J. Vreeken, and A. Siebes, “Efficiently Discovering Unexpected Pattern-Co-Occurrences,” in Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017), Houston, TX, USA, 2017.
Export
BibTeX
@inproceedings{RoelSDM2017, TITLE = {Efficiently Discovering Unexpected Pattern-Co-Occurrences}, AUTHOR = {Bertens, Roel and Vreeken, Jilles and Siebes, Arno}, LANGUAGE = {eng}, ISBN = {978-1-611974-87-4}, DOI = {10.1137/1.9781611974973.15}, PUBLISHER = {SIAM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Proceedings of the Seventeenth SIAM International Conference on Data Mining (SDM 2017)}, EDITOR = {Chawla, Nitesh and Wang, Wei}, PAGES = {126--134}, ADDRESS = {Houston, TX, USA}, }
Endnote
%0 Conference Proceedings %A Bertens, Roel %A Vreeken, Jilles %A Siebes, Arno %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Efficiently Discovering Unexpected Pattern-Co-Occurrences : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-066E-3 %R 10.1137/1.9781611974973.15 %D 2017 %B 17th SIAM International Conference on Data Mining %Z date of event: 2017-04-27 - 2017-04-29 %C Houston, TX, USA %B Proceedings of the Seventeenth SIAM International Conference on Data Mining %E Chawla, Nitesh; Wang, Wei %P 126 - 134 %I SIAM %@ 978-1-611974-87-4
[106]
A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. Pal, H. Larochelle, A. Courville, and B. Schiele, “Movie Description,” International Journal of Computer Vision, vol. 123, no. 1, 2017.
Abstract
Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.
Export
BibTeX
@article{RohrbachMovie, TITLE = {Movie Description}, AUTHOR = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Christopher and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt}, LANGUAGE = {eng}, DOI = {10.1007/s11263-016-0987-1}, PUBLISHER = {Springer}, ADDRESS = {London}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, ABSTRACT = {Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.}, JOURNAL = {International Journal of Computer Vision}, VOLUME = {123}, NUMBER = {1}, PAGES = {94--120}, }
Endnote
%0 Journal Article %A Rohrbach, Anna %A Torabi, Atousa %A Rohrbach, Marcus %A Tandon, Niket %A Pal, Christopher %A Larochelle, Hugo %A Courville, Aaron %A Schiele, Bernt %+ Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society %T Movie Description : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-FD03-C %R 10.1007/s11263-016-0987-1 %7 2017-01-25 %D 2017 %X Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015. %K Computer Science, Computer Vision and Pattern Recognition, cs.CV,Computer Science, Computation and Language, cs.CL %J International Journal of Computer Vision %O IJCV %V 123 %N 1 %& 94 %P 94 - 120 %I Springer %C London
[107]
V. Setty, A. Anand, A. Mishra, and A. Anand, “Modeling Event Importance for Ranking Daily News Events,” in WSDM’17, 10th ACM International Conference on Web Search and Data Mining, Cambridge, UK, 2017.
Export
BibTeX
@inproceedings{Setii2017, TITLE = {Modeling Event Importance for Ranking Daily News Events}, AUTHOR = {Setty, Vinay and Anand, Abhijit and Mishra, Arunav and Anand, Avishek}, LANGUAGE = {eng}, ISBN = {978-1-4503-4675-7}, DOI = {10.1145/3018661.3018728}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WSDM'17, 10th ACM International Conference on Web Search and Data Mining}, PAGES = {231--240}, ADDRESS = {Cambridge, UK}, }
Endnote
%0 Conference Proceedings %A Setty, Vinay %A Anand, Abhijit %A Mishra, Arunav %A Anand, Avishek %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Modeling Event Importance for Ranking Daily News Events : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-26D5-9 %R 10.1145/3018661.3018728 %D 2017 %B 10th ACM International Conference on Web Search and Data Mining %Z date of event: 2017-02-06 - 2017-02-10 %C Cambridge, UK %B WSDM'17 %P 231 - 240 %I ACM %@ 978-1-4503-4675-7
[108]
D. Seyler, T. Dembelova, L. Del Corro, J. Hoffart, and G. Weikum, “KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition,” 2017. [Online]. Available: http://arxiv.org/abs/1709.03544. (arXiv: 1709.03544)
Abstract
KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish), performing amongst state-of-the-art systems in all of them.
Export
BibTeX
@online{Seyler_arXiv2017, TITLE = {{KnowNER}: Incremental Multilingual {Knowledge} in {Named Entity Recognition}}, AUTHOR = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.03544}, EPRINT = {1709.03544}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish), performing amongst state-of-the-art systems in all of them.}, }
Endnote
%0 Report %A Seyler, Dominic %A Dembelova, Tatiana %A Del Corro, Luciano %A Hoffart, Johannes %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0693-D %U http://arxiv.org/abs/1709.03544 %D 2017 %X KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge-base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy and compare KnowNER with state-of-the-art NER approaches across three languages (i.e., English, German and Spanish), performing amongst state-of-the-art systems in all of them. %K Computer Science, Computation and Language, cs.CL
[109]
D. Seyler, M. Yahya, and K. Berberich, “Knowledge Questions from Knowledge Graphs,” in ICTIR’17, 7th International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, 2017.
Export
BibTeX
@inproceedings{SeylerICTIR2017, TITLE = {Knowledge Questions from Knowledge Graphs}, AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4490-6}, DOI = {10.1145/3121050.3121073}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {ICTIR'17, 7th International Conference on the Theory of Information Retrieval}, PAGES = {11--18}, ADDRESS = {Amsterdam, The Netherlands}, }
Endnote
%0 Conference Proceedings %A Seyler, Dominic %A Yahya, Mohamed %A Berberich, Klaus %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Knowledge Questions from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-0647-A %R 10.1145/3121050.3121073 %D 2017 %B 7th International Conference on the Theory of Information Retrieval %Z date of event: 2017-10-01 - 2017-10-04 %C Amsterdam, The Netherlands %B ICTIR'17 %P 11 - 18 %I ACM %@ 978-1-4503-4490-6
[110]
L. Soldaini, A. Yates, and N. Goharian, “Learning to Reformulate Long Queries for Clinical Decision Support,” Journal of the Association for Information Science and Technology, vol. 68, no. 11, 2017.
Export
BibTeX
@article{Soldaini2017, TITLE = {Learning to Reformulate Long Queries for Clinical Decision Support}, AUTHOR = {Soldaini, Luca and Yates, Andrew and Goharian, Nazli}, LANGUAGE = {eng}, ISSN = {2330-1635}, DOI = {10.1002/asi.23924}, PUBLISHER = {Wiley}, ADDRESS = {Chichester, UK}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, JOURNAL = {Journal of the Association for Information Science and Technology}, VOLUME = {68}, NUMBER = {11}, PAGES = {2602--2619}, }
Endnote
%0 Journal Article %A Soldaini, Luca %A Yates, Andrew %A Goharian, Nazli %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Learning to Reformulate Long Queries for Clinical Decision Support : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-2723-C %R 10.1002/asi.23924 %7 2017-09-14 %D 2017 %8 14.09.2017 %J Journal of the Association for Information Science and Technology %O asis&t %V 68 %N 11 %& 2602 %P 2602 - 2619 %I Wiley %C Chichester, UK %@ false
[111]
J. Stoyanovich, B. Howe, S. Abiteboul, G. Miklau, A. Sahuguet, and G. Weikum, “Fides: Towards a Platform for Responsible Data Science,” in 29th International Conference on Scientific and Statistical Database Management (SSDBM 2017), Chicago, IL, USA, 2017.
Export
BibTeX
@inproceedings{StoyanovichSSDBM2017, TITLE = {Fides: {T}owards a Platform for Responsible Data Science}, AUTHOR = {Stoyanovich, Julia and Howe, Bill and Abiteboul, Serge and Miklau, Gerome and Sahuguet, Arnaud and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-5282-6}, DOI = {10.1145/3085504.3085530}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {29th International Conference on Scientific and Statistical Database Management (SSDBM 2017)}, EID = {26}, ADDRESS = {Chicago, IL, USA}, }
Endnote
%0 Conference Proceedings %A Stoyanovich, Julia %A Howe, Bill %A Abiteboul, Serge %A Miklau, Gerome %A Sahuguet, Arnaud %A Weikum, Gerhard %+ External Organizations External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Fides: Towards a Platform for Responsible Data Science : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-80BA-B %R 10.1145/3085504.3085530 %D 2017 %B 29th International Conference on Scientific and Statistical Database Management %Z date of event: 2017-06-27 - 2017-06-29 %C Chicago, IL, USA %B 29th International Conference on Scientific and Statistical Database Management %Z sequence number: 26 %I ACM %@ 978-1-4503-5282-6
[112]
N. Tandon, G. de Melo, and G. Weikum, “WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation,” in The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 2017.
Export
BibTeX
@inproceedings{TandonACL2017, TITLE = {{WebChild} 2.0: {F}ine-Grained Commonsense Knowledge Distillation}, AUTHOR = {Tandon, Niket and de Melo, Gerard and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-76-0}, DOI = {10.18653/v1/P17-4020}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)}, PAGES = {115--120}, ADDRESS = {Vancouver, Canada}, }
Endnote
%0 Conference Proceedings %A Tandon, Niket %A de Melo, Gerard %A Weikum, Gerhard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T WebChild 2.0: Fine-Grained Commonsense Knowledge Distillation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-FAC3-A %R 10.18653/v1/P17-4020 %D 2017 %B The 55th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2017-07-30 - 2017-08-04 %C Vancouver, Canada %B The 55th Annual Meeting of the Association for Computational Linguistics %P 115 - 120 %I ACL %@ 978-1-945626-76-0
[113]
C. Teflioudi and R. Gemulla, “Exact and Approximate Maximum Inner Product Search with LEMP,” ACM Transactions on Database Systems, vol. 42, no. 1, 2017.
Export
BibTeX
@article{Teflioudi:2016:EAM:3015779.2996452, TITLE = {Exact and Approximate Maximum Inner Product Search with {LEMP}}, AUTHOR = {Teflioudi, Christina and Gemulla, Rainer}, LANGUAGE = {eng}, ISSN = {0362-5915}, DOI = {10.1145/2996452}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, JOURNAL = {ACM Transactions on Database Systems}, VOLUME = {42}, NUMBER = {1}, EID = {5}, }
Endnote
%0 Journal Article %A Teflioudi, Christina %A Gemulla, Rainer %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Exact and Approximate Maximum Inner Product Search with LEMP : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-349C-B %R 10.1145/2996452 %7 2016 %D 2017 %J ACM Transactions on Database Systems %O TODS %V 42 %N 1 %Z sequence number: 5 %I ACM %C New York, NY %@ false
[114]
E. N. Toosi, “A New Efficient and Scalable Algorithm for Boolean Matrix Factorization,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{ToosiMsc2017, TITLE = {A New Efficient and Scalable Algorithm for {Boolean} Matrix Factorization}, AUTHOR = {Toosi, Ehsan Nadjaran}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Toosi, Ehsan Nadjaran %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T A New Efficient and Scalable Algorithm for Boolean Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-90D5-E %I Universität des Saarlandes %C Saarbrücken %D 2017 %P X, 70 p. %V master %9 master
[115]
H. D. Tran, D. Stepanova, M. Gad-Elrab, F. A. Lisi, and G. Weikum, “Towards Nonmonotonic Relational Learning from Knowledge Graphs,” in Inductive Logic Programming (ILP 2016), London, UK, 2017.
Export
BibTeX
@inproceedings{TranILP2016, TITLE = {Towards Nonmonotonic Relational Learning from Knowledge Graphs}, AUTHOR = {Tran, Hai Dang and Stepanova, Daria and Gad-Elrab, Mohamed and Lisi, Francesca A. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-63341-1}, DOI = {10.1007/978-3-319-63342-8_8}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {Inductive Logic Programming (ILP 2016)}, EDITOR = {Cussens, James and Russo, Alessandra}, PAGES = {94--107}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {10326}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Tran, Hai Dang %A Stepanova, Daria %A Gad-Elrab, Mohamed %A Lisi, Francesca A. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Towards Nonmonotonic Relational Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2DB1-E %R 10.1007/978-3-319-63342-8_8 %D 2017 %B 26th International Conference on Inductive Logic Programming %Z date of event: 2016-09-04 - 2016-09-06 %C London, UK %B Inductive Logic Programming %E Cussens, James; Russo, Alessandra %P 94 - 107 %I Springer %@ 978-3-319-63341-1 %B Lecture Notes in Artificial Intelligence %N 10326
[116]
H. D. Tran, “An Approach to Nonmonotonic Relational Learning from Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{TranMSc2017, TITLE = {An Approach to Nonmonotonic Relational Learning from Knowledge Graphs}, AUTHOR = {Tran, Hai Dang}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Tran, Hai Dang %Y Stepanova, Daria %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Approach to Nonmonotonic Relational Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-845A-3 %I Universität des Saarlandes %C Saarbrücken %D 2017 %P XV, 48 p. %V master %9 master
[117]
G. Weikum, “What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe,” in WWW’17 Companion, Perth, Australia, 2017.
Export
BibTeX
@inproceedings{WeikumWWW2017, TITLE = {What Computers Should Know, Shouldn{\textquoteright}t Know, and Shouldn{\textquoteright}t Believe}, AUTHOR = {Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4914-7}, DOI = {10.1145/3041021.3051120}, PUBLISHER = {ACM}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {WWW'17 Companion}, PAGES = {1559--1560}, ADDRESS = {Perth, Australia}, }
Endnote
%0 Conference Proceedings %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T What Computers Should Know, Shouldn’t Know, and Shouldn’t Believe : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-7DA0-5 %R 10.1145/3041021.3051120 %D 2017 %B 26th International Conference on World Wide Web %Z date of event: 2017-04-03 - 2017-04-07 %C Perth, Australia %B WWW'17 Companion %P 1559 - 1560 %I ACM %@ 978-1-4503-4914-7
[118]
A. Yates, A. Cohan, and N. Goharian, “Depression and Self-Harm Risk Assessment in Online Forums,” 2017. [Online]. Available: http://arxiv.org/abs/1709.01848. (arXiv: 1709.01848)
Abstract
Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.
Export
BibTeX
@online{Yates_arXiv2017b, TITLE = {Depression and Self-Harm Risk Assessment in Online Forums}, AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1709.01848}, EPRINT = {1709.01848}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.}, }
Endnote
%0 Report %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Depression and Self-Harm Risk Assessment in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06C8-6 %U http://arxiv.org/abs/1709.01848 %D 2017 %X Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset. %K Computer Science, Computation and Language, cs.CL
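A note on the task setup in [118]: the user-level problem is to classify a user as having a self-reported diagnosis or being a matched control, from language alone. The minimal Python sketch below illustrates only this setup with a hypothetical bag-of-words baseline over toy strings; it is not the paper's neural framework, and the RSDD data is not reproduced here.

# Hypothetical baseline for the user-level task in [118]: classify users
# from their concatenated posts. Toy data only; not the paper's model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

users = ["i have been feeling down for months and cannot focus",
         "great game last night, what a finish in overtime"]
labels = [1, 0]  # 1 = self-reported diagnosis, 0 = matched control

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(users, labels)
print(clf.predict(["cannot sleep and nothing feels worth doing"]))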
[119]
A. Yates, A. Cohan, and N. Goharian, “Depression and Self-Harm Risk Assessment in Online Forums,” in The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), Copenhagen, Denmark, 2017.
Export
BibTeX
@inproceedings{YatesENMLP2017, TITLE = {Depression and Self-Harm Risk Assessment in Online Forums}, AUTHOR = {Yates, Andrew and Cohan, Arman and Goharian, Nazli}, LANGUAGE = {eng}, ISBN = {978-1-945626-83-8}, URL = {https://aclanthology.info/pdf/D/D17/D17-1321.pdf}, PUBLISHER = {ACL}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, BOOKTITLE = {The Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)}, PAGES = {2958--2968}, ADDRESS = {Copenhagen, Denmark}, }
Endnote
%0 Conference Proceedings %A Yates, Andrew %A Cohan, Arman %A Goharian, Nazli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Depression and Self-Harm Risk Assessment in Online Forums : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06A0-D %U https://aclanthology.info/pdf/D/D17/D17-1321.pdf %D 2017 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2017-09-09 - 2017-09-11 %C Copenhagen, Denmark %B The Conference on Empirical Methods in Natural Language Processing %P 2958 - 2968 %I ACL %@ 978-1-945626-83-8
[120]
A. Yates and K. Hui, “DE-PACRR: Exploring Layers Inside the PACRR Model,” 2017. [Online]. Available: http://arxiv.org/abs/1706.08746. (arXiv: 1706.08746)
Abstract
Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.
Export
BibTeX
@online{Yates_arXiv2017, TITLE = {{DE}-{PACRR}: Exploring Layers Inside the {PACRR} Model}, AUTHOR = {Yates, Andrew and Hui, Kai}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1706.08746}, EPRINT = {1706.08746}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable.}, }
Endnote
%0 Report %A Yates, Andrew %A Hui, Kai %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DE-PACRR: Exploring Layers Inside the PACRR Model : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002E-06BE-D %U http://arxiv.org/abs/1706.08746 %D 2017 %X Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval. However, deep models have a reputation for being black boxes, and the roles of a neural IR model's components may not be obvious at first glance. In this work, we attempt to shed light on the inner workings of a recently proposed neural IR model, namely the PACRR model, by visualizing the output of intermediate layers and by investigating the relationship between intermediate weights and the ultimate relevance score produced. We highlight several insights, hoping that such insights will be generally applicable. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
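The inspection technique sketched in the abstract of [120] — visualizing the output of a model's intermediate layers — can be illustrated with forward hooks. The toy model below is an assumption for illustration; it stands in for, but is not, PACRR.

# Capture intermediate activations of a (toy) neural ranking model with
# PyTorch forward hooks, as one might for layer-by-layer visualization.
import torch
import torch.nn as nn

model = nn.Sequential(               # hypothetical stand-in, not PACRR
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 8 * 8, 1),         # final relevance score
)

activations = {}
def save(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for i, layer in enumerate(model):
    layer.register_forward_hook(save(f"layer{i}"))

sim_matrix = torch.rand(1, 1, 8, 8)  # toy query-document similarity matrix
score = model(sim_matrix)
print({k: tuple(v.shape) for k, v in activations.items()}, score.item())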
[121]
Y. Zhang, M. Humbert, B. Surma, P. Manoharan, J. Vreeken, and M. Backes, “CTRL+Z: Recovering Anonymized Social Graphs,” 2017. [Online]. Available: http://arxiv.org/abs/1711.05441. (arXiv: 1711.05441)
Abstract
Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.
Export
BibTeX
@online{Zhang1711.05441, TITLE = {{CTRL}+Z: Recovering Anonymized Social Graphs}, AUTHOR = {Zhang, Yang and Humbert, Mathias and Surma, Bartlomiej and Manoharan, Praveen and Vreeken, Jilles and Backes, Michael}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1711.05441}, EPRINT = {1711.05441}, EPRINTTYPE = {arXiv}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods.}, }
Endnote
%0 Report %A Zhang, Yang %A Humbert, Mathias %A Surma, Bartlomiej %A Manoharan, Praveen %A Vreeken, Jilles %A Backes, Michael %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T CTRL+Z: Recovering Anonymized Social Graphs : %G eng %U http://hdl.handle.net/21.11116/0000-0000-6463-0 %U http://arxiv.org/abs/1711.05441 %D 2017 %X Social graphs derived from online social interactions contain a wealth of information that is nowadays extensively used by both industry and academia. However, due to the sensitivity of information contained in such social graphs, they need to be properly anonymized before release. Most of the graph anonymization techniques that have been proposed to sanitize social graph data rely on the perturbation of the original graph's structure, more specifically of its edge set. In this paper, we identify a fundamental weakness of these edge-based anonymization mechanisms and exploit it to recover most of the original graph structure. First, we propose a method to quantify an edge's plausibility in a given graph by relying on graph embedding. Our experiments on three real-life social network datasets under two widely known graph anonymization mechanisms demonstrate that this method can very effectively detect fake edges with AUC values above 0.95 in most cases. Second, by relying on Gaussian mixture models and maximum a posteriori probability estimation, we derive an optimal decision rule to detect whether an edge is fake based on the observed graph data. We further demonstrate that this approach concretely jeopardizes the privacy guarantees provided by the considered graph anonymization mechanisms. To mitigate this vulnerability, we propose a method to generate fake edges as plausible as possible given the graph structure and incorporate it into the existing anonymization mechanisms. Our evaluation demonstrates that the enhanced mechanisms not only decrease the chances of graph recovery (with AUC dropping by up to 35%), but also provide even better graph utility than existing anonymization methods. %K Computer Science, Cryptography and Security, cs.CR,cs.SI
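The detection step described in the abstract of [121] — score each edge's plausibility, fit a two-component Gaussian mixture over the scores, and apply a maximum-a-posteriori decision rule — can be sketched as follows. The scores here are synthetic stand-ins; the paper obtains them via graph embedding.

# MAP decision rule over a two-component Gaussian mixture of edge-plausibility
# scores: edges assigned to the low-mean component are flagged as fake.
# Synthetic scores for illustration only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
true_scores = rng.normal(0.8, 0.05, 200)   # plausible (original) edges
fake_scores = rng.normal(0.3, 0.10, 50)    # perturbation-added edges
scores = np.concatenate([true_scores, fake_scores]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
low = int(np.argmin(gmm.means_.ravel()))   # low-mean component = "fake"
is_fake = gmm.predict_proba(scores)[:, low] > 0.5   # MAP rule
print("edges flagged as fake:", int(is_fake.sum()))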
[122]
D. Ziegler, A. Abujabal, R. S. Roy, and G. Weikum, “Efficiency-aware Answering of Compositional Questions using Answer Type Prediction,” in The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, 2017.
Export
BibTeX
@inproceedings{ZieglerIJCNLP2017, TITLE = {Efficiency-aware Answering of Compositional Questions using Answer Type Prediction}, AUTHOR = {Ziegler, David and Abujabal, Abdalghani and Roy, Rishiraj Saha and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-948087-01-8}, URL = {http://aclweb.org/anthology/I17-2038}, PUBLISHER = {Asian Federation of Natural Language Processing}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)}, PAGES = {222--227}, ADDRESS = {Taipei, Taiwan}, }
Endnote
%0 Conference Proceedings %A Ziegler, David %A Abujabal, Abdalghani %A Roy, Rishiraj Saha %A Weikum, Gerhard %+ Algorithms and Complexity, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Efficiency-aware Answering of Compositional Questions using Answer Type Prediction : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B5F-5 %U http://aclweb.org/anthology/I17-2038 %D 2017 %B 8th International Joint Conference on Natural Language Processing %Z date of event: 2017-11-27 - 2017-12-01 %C Taipei, Taiwan %B The 8th International Joint Conference on Natural Language Processing %P 222 - 227 %I Asian Federation of Natural Language Processing %@ 978-1-948087-01-8
[123]
D. Ziegler, “Answer Type Prediction for Question Answering over Knowledge Bases,” Universität des Saarlandes, Saarbrücken, 2017.
Export
BibTeX
@mastersthesis{ZieglerMSc2017, TITLE = {Answer Type Prediction for Question Answering over Knowledge Bases}, AUTHOR = {Ziegler, David}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2017}, MARGINALMARK = {$\bullet$}, DATE = {2017}, }
Endnote
%0 Thesis %A Ziegler, David %Y Abujabal, Abdalghani %A referee: Roy, Rishiraj Saha %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Answer Type Prediction for Question Answering over Knowledge Bases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002D-8F38-A %I Universität des Saarlandes %C Saarbrücken %D 2017 %P X, 48 p. %V master %9 master
2016
[124]
S. Abiteboul, G. Miklau, J. Stoyanovich, and G. Weikum, Eds., Data, Responsibly (Dagstuhl Seminar 16291), Dagstuhl Reports, vol. 6, no. 7. Schloss Dagstuhl, 2016.
Export
BibTeX
@proceedings{AbiteboulDagstuhl2016, TITLE = {Data, Responsibly (Dagstuhl Seminar 16291)}, EDITOR = {Abiteboul, Serge and Miklau, Gerome and Stoyanovich, Julia and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2192-5283}, URL = {urn:nbn:de:0030-drops-67644}, DOI = {10.4230/DagRep.6.7.42}, PUBLISHER = {Schloss Dagstuhl}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, PAGES = {30 p.}, SERIES = {Dagstuhl Reports}, VOLUME = {6}, ISSUE = {7}, ADDRESS = {Wadern, Germany}, }
Endnote
%0 Conference Proceedings %E Abiteboul, Serge %E Miklau, Gerome %E Stoyanovich, Julia %E Weikum, Gerhard %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Data, Responsibly : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-500A-2 %R 10.4230/DagRep.6.7.42 %U urn:nbn:de:0030-drops-67644 %I Schloss Dagstuhl %D 2016 %B Dagstuhl Seminar 16291 "Data, Responsibly" %Z date of event: 2016-07-17 - 2016-07-22 %D 2016 %C Wadern, Germany %P 30 p. %K Data responsibly, Big data, Machine bias, Data analysis, Data management, Data mining, Fairness, Diversity, Accountability, Transparency, Personal %S Dagstuhl Reports %V 6 %P 42 - 71 %@ false %U http://drops.dagstuhl.de/opus/volltexte/2016/6764/
[125]
K. Athukorala, D. Głowacka, G. Jacucci, A. Oulasvirta, and J. Vreeken, “Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks,” Journal of the Association for Information Science and Technology, vol. 67, no. 11, 2016.
Export
BibTeX
@article{VreekenSearch2015, TITLE = {Is Exploratory Search Different? {A} Comparison of Information Search Behavior for Exploratory and Lookup Tasks}, AUTHOR = {Athukorala, Kumaripaba and G{\l}owacka, Dorota and Jacucci, Giulio and Oulasvirta, Antti and Vreeken, Jilles}, LANGUAGE = {eng}, ISSN = {2330-1643}, DOI = {10.1002/asi.23617}, PUBLISHER = {Wiley}, ADDRESS = {Chichester}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Journal of the Association for Information Science and Technology}, VOLUME = {67}, NUMBER = {11}, PAGES = {2635--2651}, }
Endnote
%0 Journal Article %A Athukorala, Kumaripaba %A Głowacka, Dorota %A Jacucci, Giulio %A Oulasvirta, Antti %A Vreeken, Jilles %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Is Exploratory Search Different? A Comparison of Information Search Behavior for Exploratory and Lookup Tasks : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0028-E6A7-D %R 10.1002/asi.23617 %7 2015-10-22 %D 2016 %J Journal of the Association for Information Science and Technology %V 67 %N 11 %& 2635 %P 2635 - 2651 %I Wiley %C Chichester %@ false
[126]
A. H. Baradaranshahroudi, “Fast Computation of Highest Correlated Segments in Multivariate Time-Series,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{BaradaranshahroudiMSc2016, TITLE = {Fast Computation of Highest Correlated Segments in Multivariate Time-Series}, AUTHOR = {Baradaranshahroudi, Amir Hossein}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Baradaranshahroudi, Amir Hossein %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Fast Computation of Highest Correlated Segments in Multivariate Time-Series : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5FB1-1 %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
[127]
B. Berendt, B. Bringmann, E. Fromont, G. Garriga, P. Miettinen, N. Tatti, and V. Tresp, Eds., Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part III. Springer, 2016.
Export
BibTeX
@proceedings{ProceedingsECML2016III, TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)}, EDITOR = {Berendt, Bettina and Bringmann, Bj{\"o}rn and Fromont, Elisa and Garriga, Gemma and Miettinen, Pauli and Tatti, Nikolai and Tresp, Volker}, LANGUAGE = {eng}, ISBN = {978-3-319-46130-4}, DOI = {10.1007/978-3-319-46131-1}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, PAGES = {XXII, 307 p.}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9853}, ADDRESS = {Riva del Garda, Italy}, }
Endnote
%0 Conference Proceedings %E Berendt, Bettina %E Bringmann, Björn %E Fromont, Elisa %E Garriga, Gemma %E Miettinen, Pauli %E Tatti, Nikolai %E Tresp, Volker %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part III %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A68E-5 %R 10.1007/978-3-319-46131-1 %@ 978-3-319-46130-4 %I Springer %D 2016 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2016-09-19 - 2016-09-23 %D 2016 %C Riva del Garda, Italy %P XXII, 307 p. %S Lecture Notes in Artificial Intelligence %V 9853
[128]
R. Bertens, J. Vreeken, and A. Siebes, “Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{BertensKDD2016, TITLE = {Keeping it Short and Simple: {S}ummarising Complex Event Sequences with Multivariate Patterns}, AUTHOR = {Bertens, Roel and Vreeken, Jilles and Siebes, Arno}, LANGUAGE = {eng}, ISBN = {978-1-4503-4232-2}, DOI = {10.1145/2939672.2939761}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, PAGES = {735--744}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Bertens, Roel %A Vreeken, Jilles %A Siebes, Arno %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A92D-B %R 10.1145/2939672.2939761 %D 2016 %B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining %Z date of event: 2016-08-13 - 2016-08-17 %C San Francisco, CA, USA %B KDD'16 %P 735 - 744 %I ACM %@ 978-1-4503-4232-2
[129]
A. Bhattacharyya, “Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{BhattacharyyaMSc2016, TITLE = {Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns}, AUTHOR = {Bhattacharyya, Apratim}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Bhattacharyya, Apratim %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Squish: Efficiently Summarising Sequences with Rich and Interleaving Patterns : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F37-4 %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
[130]
J. A. Biega, K. P. Gummadi, I. Mele, D. Milchevski, C. Tryfonopoulos, and G. Weikum, “R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.
Export
BibTeX
@inproceedings{BiegaSIGIR2016, TITLE = {R-Susceptibility: {A}n {IR}-Centric Approach to Assessing Privacy Risks for Users in Online Communities}, AUTHOR = {Biega, Joanna Asia and Gummadi, Krishna P. and Mele, Ida and Milchevski, Dragan and Tryfonopoulos, Christos and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4069-4}, DOI = {10.1145/2911451.2911533}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {365--374}, ADDRESS = {Pisa, Italy}, }
Endnote
%0 Conference Proceedings %A Biega, Joanna Asia %A Gummadi, Krishna P. %A Mele, Ida %A Milchevski, Dragan %A Tryfonopoulos, Christos %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A921-3 %R 10.1145/2911451.2911533 %D 2016 %B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2016-07-17 - 2016-07-21 %C Pisa, Italy %B SIGIR'16 %P 365 - 374 %I ACM %@ 978-1-4503-4069-4
[131]
T. Bögel, E. Gius, J. Jacke, and J. Strötgen, “From Order to Order Switch: Mediating between Complexity and Reproducibility in the Context of Automated Literary Annotation,” in Digital Humanities 2016 (DH 2016), Krakow, Poland, 2016.
Export
BibTeX
@inproceedings{BoegelDH2016, TITLE = {From Order to Order Switch: {M}ediating between Complexity and Reproducibility in the Context of Automated Literary Annotation}, AUTHOR = {B{\"o}gel, Thomas and Gius, Evelyn and Jacke, Janina and Str{\"o}tgen, Jannik}, LANGUAGE = {eng}, URL = {http://dh2016.adho.org/abstracts/275}, PUBLISHER = {Jagiellonian University \& Pedagogical University}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Digital Humanities 2016 (DH 2016)}, PAGES = {379--382}, ADDRESS = {Krakow, Poland}, }
Endnote
%0 Conference Proceedings %A Bögel, Thomas %A Gius, Evelyn %A Jacke, Janina %A Strötgen, Jannik %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T From Order to Order Switch: Mediating between Complexity and Reproducibility in the Context of Automated Literary Annotation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-0E96-0 %D 2016 %B Digital Humanities %Z date of event: 2016-07-11 - 2016-07-16 %C Krakow, Poland %B Digital Humanities 2016 %P 379 - 382 %I Jagiellonian University & Pedagogical University %U http://dh2016.adho.org/abstracts/275
[132]
N. Boldyrev, M. Spaniol, and G. Weikum, “ACROSS: A Framework for Multi-Cultural Interlinking of Web Taxonomies,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.
Export
BibTeX
@inproceedings{BoldryevWebSci2016, TITLE = {{ACROSS}: {A} Framework for Multi-Cultural Interlinking of {W}eb Taxonomies}, AUTHOR = {Boldyrev, Natalia and Spaniol, Marc and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4208-7}, DOI = {10.1145/2908131.2908164}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WebSci'16, ACM Web Science Conference}, PAGES = {127--136}, ADDRESS = {Hannover, Germany}, }
Endnote
%0 Conference Proceedings %A Boldyrev, Natalia %A Spaniol, Marc %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ACROSS: A Framework for Multi-Cultural Interlinking of Web Taxonomies : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-01B6-E %R 10.1145/2908131.2908164 %D 2016 %B ACM Web Science Conference %Z date of event: 2016-05-22 - 2016-05-25 %C Hannover, Germany %B WebSci'16 %P 127 - 136 %I ACM %@ 978-1-4503-4208-7
[133]
P. Chau, J. Vreeken, M. van Leeuwen, D. Shahaf, and C. Faloutsos, Eds., Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics. IDEA’16, 2016.
Export
BibTeX
@proceedings{ChauIDEA2016, TITLE = {Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics (IDEA 2016)}, EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Shahaf, Dafna and Faloutsos, Christos}, LANGUAGE = {eng}, PUBLISHER = {IDEA'16}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, PAGES = {137 p.}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %E Chau, Polo %E Vreeken, Jilles %E van Leeuwen, Matthijs %E Shahaf, Dafna %E Faloutsos, Christos %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2439-4 %I IDEA'16 %D 2016 %B ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics %Z date of event: 2016-08-14 - 2016-08-14 %D 2016 %C San Francisco, CA, USA %P 137 p.
[134]
J. Chen, N. Tandon, C. D. Hariman, and G. de Melo, “WebBrain: Joint Neural Learning of Large-Scale Commonsense Knowledge,” in The Semantic Web -- ISWC 2016, Kobe, Japan, 2016.
Export
BibTeX
@inproceedings{ChenISWC2016, TITLE = {{WebBrain}: {J}oint Neural Learning of Large-Scale Commonsense Knowledge}, AUTHOR = {Chen, Jiaqiang and Tandon, Niket and Hariman, Charles Darwis and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-3-319-46522-7}, DOI = {10.1007/978-3-319-46523-4_7}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {The Semantic Web -- ISWC 2016}, EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda}, PAGES = {102--118}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9981}, ADDRESS = {Kobe, Japan}, }
Endnote
%0 Conference Proceedings %A Chen, Jiaqiang %A Tandon, Niket %A Hariman, Charles Darwis %A de Melo, Gerard %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T WebBrain: Joint Neural Learning of Large-Scale Commonsense Knowledge : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2DED-A %R 10.1007/978-3-319-46523-4_7 %D 2016 %B 15th International Semantic Web Conference %Z date of event: 2016-10-17 - 2016-10-21 %C Kobe, Japan %B The Semantic Web -- ISWC 2016 %E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda %P 102 - 118 %I Springer %@ 978-3-319-46522-7 %B Lecture Notes in Computer Science %N 9981
[135]
C. X. Chu, “Mining How-to Task Knowledge from Online Communities,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{ChuMSc2016, TITLE = {Mining How-to Task Knowledge from Online Communities}, AUTHOR = {Chu, Cuong Xuan}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Chu, Cuong Xuan %Y Weikum, Gerhard %A referee: Vreeken, Jilles %A referee: Tandon, Niket %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mining How-to Task Knowledge from Online Communities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-491D-B %I Universität des Saarlandes %C Saarbrücken %D 2016 %P 66 p. %V master %9 master
[136]
L. Del Corro, “Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@phdthesis{delcorrophd15, TITLE = {Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text}, AUTHOR = {Del Corro, Luciano}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Del Corro, Luciano %Y Gemulla, Rainer %A referee: Ponzetto, Simone Paolo %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Methods for Open Information Extraction and Sense Disambiguation on Natural Language Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-B3DB-3 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P xiv, 101 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/volltexte/2016/6346/
[137]
G. de Melo and N. Tandon, “Seeing is Believing: The Quest for Multimodal Knowledge,” ACM SIGWEB Newsletter, no. Spring, 2016.
Export
BibTeX
@article{DemeloTandon:SIGWEB2016, TITLE = {Seeing is Believing: {T}he Quest for Multimodal Knowledge}, AUTHOR = {de Melo, Gerard and Tandon, Niket}, LANGUAGE = {eng}, DOI = {10.1145/2903513.2903517}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, JOURNAL = {ACM SIGWEB Newsletter}, NUMBER = {Spring}, EID = {4}, }
Endnote
%0 Journal Article %A de Melo, Gerard %A Tandon, Niket %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Seeing is Believing: The Quest for Multimodal Knowledge : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-54BB-3 %R 10.1145/2903513.2903517 %7 2016 %D 2016 %J ACM SIGWEB Newsletter %N Spring %Z sequence number: 4 %I ACM %C New York, NY
[138]
L. Derczynski, J. Strötgen, D. Maynard, M. A. Greenwood, and M. Jung, “GATE-Time: Extraction of Temporal Expressions and Events,” in Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016.
Export
BibTeX
@inproceedings{DERCZYNSKI16.915, TITLE = {{GATE}-Time: {E}xtraction of Temporal Expressions and Events}, AUTHOR = {Derczynski, Leon and Str{\"o}tgen, Jannik and Maynard, Diana and Greenwood, Mark A. and Jung, Manuel}, LANGUAGE = {eng}, ISBN = {978-2-9517408-9-1}, PUBLISHER = {European Language Resources Association (ELRA)}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Tenth International Conference on Language Resources and Evaluation (LREC 2016)}, EDITOR = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Goggi, Sara and Grobelnik, Marko and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asunci{\'o}n and Odijk, Jan and Piperidis, Stelios}, PAGES = {3702--3708}, EID = {915}, ADDRESS = {Portoro{\v z}, Slovenia}, }
Endnote
%0 Conference Proceedings %A Derczynski, Leon %A Strötgen, Jannik %A Maynard, Diana %A Greenwood, Mark A. %A Jung, Manuel %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T GATE-Time: Extraction of Temporal Expressions and Events : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-4139-8 %D 2016 %B 10th Language Resources and Evaluation Conference %Z date of event: 2016-05-23 - 2016-05-28 %C Portorož, Slovenia %B Tenth International Conference on Language Resources and Evaluation %E Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asunción; Odijk, Jan; Piperidis, Stelios %P 3702 - 3708 %Z sequence number: 915 %I European Language Resources Association (ELRA) %@ 978-2-9517408-9-1
[139]
H. Dombrowski, “Boolean Tensor Decomposition based on the Walk’n’Merge Algorithm,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{DombrowskiMaster2016, TITLE = {Boolean Tensor Decomposition based on the Walk'n'Merge Algorithm}, AUTHOR = {Dombrowski, Helge}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Dombrowski, Helge %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Boolean Tensor Decomposition based on the Walk'n'Merge Algorithm : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2280-1 %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
[140]
X. Du, O. Emebo, A. Varde, N. Tandon, S. N. Chowdhury, and G. Weikum, “Air Quality Assessment from Social Media and Structured Data: Pollutants and Health Impacts in Urban Planning,” in Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW 2016), Helsinki, Finland, 2016.
Export
BibTeX
@inproceedings{DuICDEW2016, TITLE = {Air Quality Assessment from Social Media and Structured Data: {P}ollutants and Health Impacts in Urban Planning}, AUTHOR = {Du, Xu and Emebo, Onyeka and Varde, Aparna and Tandon, Niket and Chowdhury, Sreyasi Nag and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.1109/ICDEW.2016.7495616}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW 2016)}, PAGES = {54--59}, ADDRESS = {Helsinki, Finland}, }
Endnote
%0 Conference Proceedings %A Du, Xu %A Emebo, Onyeka %A Varde, Aparna %A Tandon, Niket %A Chowdhury, Sreyasi Nag %A Weikum, Gerhard %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Air Quality Assessment from Social Media and Structured Data: Pollutants and Health Impacts in Urban Planning : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-01AE-2 %R 10.1109/ICDEW.2016.7495616 %D 2016 %B IEEE 32nd International Conference on Data Engineering Workshops %Z date of event: 2016-05-16 - 2016-05-20 %C Helsinki, Finland %B Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering Workshops %P 54 - 59 %I IEEE
[141]
P. Ernst, A. Siu, D. Milchevski, J. Hoffart, and G. Weikum, “DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences,” in Proceedings of ACL-2016 System Demonstrations, Berlin, Germany, 2016.
Export
BibTeX
@inproceedings{ernst-EtAl:2016:P16-4, TITLE = {{DeepLife:} {A}n Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences}, AUTHOR = {Ernst, Patrick and Siu, Amy and Milchevski, Dragan and Hoffart, Johannes and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-0}, DOI = {10.18653/v1/P16-4004}, PUBLISHER = {ACL}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of ACL-2016 System Demonstrations}, EDITOR = {Pradhan, Sameer and Apidianaki, Marianna}, PAGES = {19--24}, ADDRESS = {Berlin, Germany}, }
Endnote
%0 Conference Proceedings %A Ernst, Patrick %A Siu, Amy %A Milchevski, Dragan %A Hoffart, Johannes %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-24CA-F %R 10.18653/v1/P16-4004 %D 2016 %B The 54th Annual Meeting of the Association for Computational Linguistics %Z date of event: 2016-08-07 - 2016-08-12 %C Berlin, Germany %B Proceedings of ACL-2016 System Demonstrations %E Pradhan, Sameer; Apidianaki, Marianna %P 19 - 24 %I ACL %@ 978-1-945626-0
[142]
P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken, Eds., Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part I. Springer, 2016.
Export
BibTeX
@proceedings{ProceedingsECML2016I, TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)}, EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-319-46127-4}, DOI = {10.1007/978-3-319-46128-1}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, PAGES = {XXXVI, 817 p.}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9851}, ADDRESS = {Riva del Garda, Italy}, }
Endnote
%0 Conference Proceedings %E Frasconi, Paolo %E Landwehr, Niels %E Manco, Giuseppe %E Vreeken, Jilles %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part I %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A68A-D %R 10.1007/978-3-319-46128-1 %@ 978-3-319-46127-4 %I Springer %D 2016 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2016-09-19 - 2016-09-23 %D 2016 %C Riva del Garda, Italy %P XXXVI, 817 p. %S Lecture Notes in Artificial Intelligence %V 9851
[143]
P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken, Eds., Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part II. Springer, 2016.
Export
BibTeX
@proceedings{ProceedingsECML2016, TITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)}, EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles}, LANGUAGE = {eng}, DOI = {10.1007/978-3-319-46227-1}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, PAGES = {XXVIII, 825 p.}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9852}, ADDRESS = {Riva del Garda, Italy}, }
Endnote
%0 Conference Proceedings %E Frasconi, Paolo %E Landwehr, Niels %E Manco, Giuseppe %E Vreeken, Jilles %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Machine Learning and Knowledge Discovery in Databases : European Conference, ECML PKDD 2016 ; Riva del Garda, Italy, September 19-23, 2016 ; Proceedings, Part II %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A688-2 %R 10.1007/978-3-319-46227-1 %I Springer %D 2016 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2016-09-19 - 2016-09-23 %D 2016 %C Riva del Garda, Italy %P XXVIII, 825 p. %S Lecture Notes in Artificial Intelligence %V 9852
[144]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “Exception-Enriched Rule Learning from Knowledge Graphs,” in KI 2016: Advances in Artificial Intelligence, Klagenfurt, Austria, 2016.
Export
BibTeX
@inproceedings{Gad-ElrabKI2016, TITLE = {Exception-Enriched Rule Learning from Knowledge Graphs}, AUTHOR = {Gad-Elrab, Mohamed H. and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-46072-7}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {KI 2016: Advances in Artificial Intelligence}, EDITOR = {Friedrich, Gerhard and Helmert, Malte and Wotawa, Franz}, PAGES = {211--217}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9904}, ADDRESS = {Klagenfurt, Austria}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed H. %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exception-Enriched Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-22E9-A %D 2016 %B 39th Annual German Conference on AI %Z date of event: 2016-09-26 - 2016-09-30 %C Klagenfurt, Austria %B KI 2016: Advances in Artificial Intelligence %E Friedrich, Gerhard; Helmert, Malte; Wotawa, Franz %P 211 - 217 %I Springer %@ 978-3-319-46072-7 %B Lecture Notes in Artificial Intelligence %N 9904
[145]
M. H. Gad-Elrab, D. Stepanova, J. Urbani, and G. Weikum, “Exception-Enriched Rule Learning from Knowledge Graphs,” in The Semantic Web -- ISWC 2016, Kobe, Japan, 2016.
Export
BibTeX
@inproceedings{Gad-ElrabISWC2016, TITLE = {Exception-Enriched Rule Learning from Knowledge Graphs}, AUTHOR = {Gad-Elrab, Mohamed H. and Stepanova, Daria and Urbani, Jacopo and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-46522-7}, DOI = {10.1007/978-3-319-46523-4_15}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {The Semantic Web -- ISWC 2016}, EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda}, PAGES = {234--251}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9981}, ADDRESS = {Kobe, Japan}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed H. %A Stepanova, Daria %A Urbani, Jacopo %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exception-Enriched Rule Learning from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A91F-B %R 10.1007/978-3-319-46523-4_15 %D 2016 %B 15th International Semantic Web Conference %Z date of event: 2016-10-17 - 2016-10-21 %C Kobe, Japan %B The Semantic Web -- ISWC 2016 %E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda %P 234 - 251 %I Springer %@ 978-3-319-46522-7 %B Lecture Notes in Computer Science %N 9981
[146]
M. Gandhi, “Towards Summarising Large Transaction Databases,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{GandhiMSc2016, TITLE = {Towards Summarising Large Transaction Databases}, AUTHOR = {Gandhi, Manan}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Gandhi, Manan %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Towards Summarising Large Transaction Databases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F61-4 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P X, 55 p. %V master %9 master
[147]
K. Grosse, “An Approach for Ontological Pattern-based Summarization,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{GrosseMSc2016, TITLE = {An Approach for Ontological Pattern-based Summarization}, AUTHOR = {Grosse, Kathrin}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Grosse, Kathrin %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Approach for Ontological Pattern-based Summarization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F5F-C %I Universität des Saarlandes %C Saarbrücken %D 2016 %P X, 84 p. %V master %9 master
[148]
A. Grycner and G. Weikum, “POLY: Mining Relational Paraphrases from Multilingual Sentences,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, TX, USA, 2016.
Export
BibTeX
@inproceedings{GrycnerENMLP2016, TITLE = {{POLY}: {M}ining Relational Paraphrases from Multilingual Sentences}, AUTHOR = {Grycner, Adam and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-25-8}, URL = {https://aclweb.org/anthology/D16-1236}, PUBLISHER = {ACL}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)}, PAGES = {2183--2192}, ADDRESS = {Austin, TX, USA}, }
Endnote
%0 Conference Proceedings %A Grycner, Adam %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T POLY: Mining Relational Paraphrases from Multilingual Sentences : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-158D-0 %U https://aclweb.org/anthology/D16-1236 %D 2016 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2016-11-01 - 2016-11-05 %C Austin, TX, USA %B Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing %P 2183 - 2192 %I ACL %@ 978-1-945626-25-8
[149]
D. Gupta, “Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics,” 2016. [Online]. Available: http://arxiv.org/abs/1603.00260. (arXiv: 1603.00260)
Abstract
In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.
Export
BibTeX
@online{Gupta1603.00260, TITLE = {Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics}, AUTHOR = {Gupta, Dhruv}, URL = {http://arxiv.org/abs/1603.00260}, DOI = {10.1145/2835776.2855083}, EPRINT = {1603.00260}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.}, }
Endnote
%0 Report %A Gupta, Dhruv %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search and Analytics : %U http://hdl.handle.net/11858/00-001M-0000-002C-2224-4 %R 10.1145/2835776.2855083 %U http://arxiv.org/abs/1603.00260 %D 2016 %X In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics. %K Computer Science, Information Retrieval, cs.IR,Computer Science, Computation and Language, cs.CL
[150]
D. Gupta, “Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics,” in WSDM’16, 9th ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{GuptaWSDM2016, TITLE = {Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search \& Analytics}, AUTHOR = {Gupta, Dhruv}, LANGUAGE = {eng}, ISBN = {978-1-4503-3716-8}, DOI = {10.1145/2835776.2855083}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WSDM'16, 9th ACM International Conference on Web Search and Data Mining}, PAGES = {705--705}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-7526-7 %R 10.1145/2835776.2855083 %D 2016 %B 9th ACM International Conference on Web Search and Data Mining %Z date of event: 2016-02-22 - 2016-02-25 %C San Francisco, CA, USA %B WSDM'16 %P 705 - 705 %I ACM %@ 978-1-4503-3716-8
[151]
D. Gupta and K. Berberich, “Diversifying Search Results Using Time: An Information Retrieval Method for Historians,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.
Export
BibTeX
@inproceedings{GuptaECIR2016, TITLE = {Diversifying Search Results Using Time: An Information Retrieval Method for Historians}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-30670-4}, DOI = {10.1007/978-3-319-30671-1_69}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)}, EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestri, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria}, PAGES = {789--795}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9626}, ADDRESS = {Padova, Italy}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Diversifying Search Results Using Time: An Information Retrieval Method for Historians : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-7514-F %R 10.1007/978-3-319-30671-1_69 %D 2016 %B 38th European Conference on Information Retrieval %Z date of event: 2016-03-20 - 2016-03-23 %C Padova, Italy %B Advances in Information Retrieval %E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestri, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria %P 789 - 795 %I Springer %@ 978-3-319-30670-4 %B Lecture Notes in Computer Science %N 9626
[152]
D. Gupta and K. Berberich, “A Probabilistic Framework for Time-Sensitive Search,” in Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan, 2016.
Export
BibTeX
@inproceedings{GuptaNTCIR12, TITLE = {A Probabilistic Framework for Time-Sensitive Search}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-4-86049-071-3}, URL = {http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/NTCIR/toc_ntcir.html}, PUBLISHER = {National Institute of Informatics}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies}, EDITOR = {Kando, Noriko and Kishida, Kazuaki and Kato, Makoto P. and Yamamoto, Shuhei}, PAGES = {225--232}, ADDRESS = {Tokyo, Japan}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T A Probabilistic Framework for Time-Sensitive Search : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2238-7 %U http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/NTCIR/toc_ntcir.html %D 2016 %B 12th NTCIR Conference on Evaluation of Information Access Technologies %Z date of event: 2016-06-07 - 2016-06-10 %C Tokyo, Japan %B Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies %E Kando, Noriko; Kishida, Kazuaki; Kato, Makoto P.; Yamamoto, Shuhei %P 225 - 232 %I National Institute of Informatics %@ 978-4-86049-071-3
[153]
D. Gupta and K. Berberich, “Diversifying Search Results Using Time,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2016-5-001, 2016.
Abstract
Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates a priori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method is indeed able to present search results diversified along time.
Export
BibTeX
@techreport{GuptaReport2016-5-001, TITLE = {Diversifying Search Results Using Time}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISSN = {0946-011X}, NUMBER = {MPI-I-2016-5-001}, INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates apriori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method indeed is able to present search results diversified along time.}, TYPE = {Research Report}, }
Endnote
%0 Report %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Diversifying Search Results Using Time : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-0AA4-C %Y Max-Planck-Institut für Informatik %C Saarbrücken %D 2016 %P 51 p. %X Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates apriori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources we show that our method indeed is able to present search results diversified along time. %B Research Report %@ false
[154]
D. Gupta, J. Strötgen, and K. Berberich, “DIGITALHISTORIAN: Search & Analytics Using Annotations,” in HistoInformatics 2016, The 3rd HistoInformatics Workshop on Computational History, Krakow, Poland, 2016.
Export
BibTeX
@inproceedings{Gupta, TITLE = {{DIGITALHISTORIAN}: {S}earch \& Analytics Using Annotations}, AUTHOR = {Gupta, Dhruv and Str{\"o}tgen, Jannik and Berberich, Klaus}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-1632-7}, PUBLISHER = {CEUR-WS.org}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {HistoInformatics 2016, The 3rd HistoInformatics Workshop on Computational History}, EDITOR = {D{\"u}ring, Marten and Jatowt, Adam and Preiser-Kapeller, Johannes and van den Bosch, Antal}, PAGES = {5--10}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1632}, ADDRESS = {Krakow, Poland}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Strötgen, Jannik %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T DIGITALHISTORIAN: Search & Analytics Using Annotations : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-0885-2 %D 2016 %B The 3rd HistoInformatics Workshop on Computational History %Z date of event: 2016-07-11 - 2016-07-11 %C Krakow, Poland %B HistoInformatics 2016 %E Düring, Marten; Jatowt, Adam; Preiser-Kapeller, Johannes; van den Bosch, Antal %P 5 - 10 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1632 %@ false %U http://ceur-ws.org/Vol-1632/paper_1.pdf
[155]
D. Gupta, J. Strötgen, and K. Berberich, “EventMiner: Mining Events from Annotated Documents,” in ICTIR’2016, ACM International Conference on the Theory of Information Retrieval, Newark, DE, USA, 2016.
Export
BibTeX
@inproceedings{GuptaICTIR2016, TITLE = {{EventMiner}: {M}ining Events from Annotated Documents}, AUTHOR = {Gupta, Dhruv and Str{\"o}tgen, Jannik and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4497-5}, DOI = {10.1145/2970398.2970411}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {ICTIR'2016, ACM International Conference on the Theory of Information Retrieval}, PAGES = {261--270}, ADDRESS = {Newark, DE, USA}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Strötgen, Jannik %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T EventMiner: Mining Events from Annotated Documents : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-B262-0 %R 10.1145/2970398.2970411 %D 2016 %B ACM International Conference on the Theory of Information Retrieval %Z date of event: 2016-09-12 - 2016-09-16 %C Newark, DE, USA %B ICTIR'2016 %P 261 - 270 %I ACM %@ 978-1-4503-4497-5
[156]
S. Gurajada and M. Theobald, “Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1,” 2016. [Online]. Available: http://arxiv.org/abs/1609.05293. (arXiv: 1609.05293)
Abstract
We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of such queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.
Export
BibTeX
@online{Gurajada1609.05293, TITLE = {Distributed Processing of Generalized Graph-Pattern Queries in {SPARQL} 1.1}, AUTHOR = {Gurajada, Sairam and Theobald, Martin}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1609.05293}, EPRINT = {1609.05293}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of this kind of queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.}, }
Endnote
%0 Report %A Gurajada, Sairam %A Theobald, Martin %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1 : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2212-C %U http://arxiv.org/abs/1609.05293 %D 2016 %X We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of this kind of queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface. %K Computer Science, Databases, cs.DB
[157]
S. Gurajada and M. Theobald, “Distributed Set Reachability,” in SIGMOD’16, ACM SIGMOD International Conference on Management of Data, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{GurajadaSIGMOD2016, TITLE = {Distributed Set Reachability}, AUTHOR = {Gurajada, Sairam and Theobald, Martin}, LANGUAGE = {eng}, ISBN = {978-1-4503-3531-7}, DOI = {10.1145/2882903.2915226}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {SIGMOD'16, ACM SIGMOD International Conference on Management of Data}, PAGES = {1247--1261}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Gurajada, Sairam %A Theobald, Martin %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Distributed Set Reachability : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-220F-5 %R 10.1145/2882903.2915226 %D 2016 %B ACM SIGMOD International Conference on Management of Data %Z date of event: 2016-06-26 - 2016-07-01 %C San Francisco, CA, USA %B SIGMOD'16 %P 1247 - 1261 %I ACM %@ 978-1-4503-3531-7
[158]
M. Halbe, “Skim: Alternative Candidate Selections for Slim through Sketching,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{HalbeBcS2016, TITLE = {Skim: Alternative Candidate Selections for Slim through Sketching}, AUTHOR = {Halbe, Magnus}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, TYPE = {Bachelor's thesis}, }
Endnote
%0 Thesis %A Halbe, Magnus %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Skim: Alternative Candidate Selections for Slim through Sketching : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F44-6 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P X, 52 p. %V bachelor %9 bachelor
[159]
Y. He, K. Chakrabarti, T. Cheng, and T. Tylenda, “Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora,” in WWW’16, 25th International Conference on World Wide Web, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{He_WWW2016, TITLE = {Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora}, AUTHOR = {He, Yeye and Chakrabarti, Kaushik and Cheng, Tao and Tylenda, Tomasz}, LANGUAGE = {eng}, ISBN = {978-1-4503-4143-1}, DOI = {10.1145/2872427.2874816}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16, 25th International Conference on World Wide Web}, PAGES = {1429--1439}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A He, Yeye %A Chakrabarti, Kaushik %A Cheng, Tao %A Tylenda, Tomasz %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automatic Discovery of Attribute Synonyms Using Query Logs and Table Corpora : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-312D-5 %R 10.1145/2872427.2874816 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 %P 1429 - 1439 %I ACM %@ 978-1-4503-4143-1
[160]
J. Hoffart, D. Milchevski, G. Weikum, A. Anand, and J. Singh, “The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities,” in WWW’16 Companion, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{HoffartWWW2016, TITLE = {The Knowledge Awakens: {K}eeping Knowledge Bases Fresh with Emerging Entities}, AUTHOR = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard and Anand, Avishek and Singh, Jaspreet}, LANGUAGE = {eng}, ISBN = {978-1-4503-4144-8}, DOI = {10.1145/2872518.2890537}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16 Companion}, PAGES = {203--206}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Hoffart, Johannes %A Milchevski, Dragan %A Weikum, Gerhard %A Anand, Avishek %A Singh, Jaspreet %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-01BB-4 %R 10.1145/2872518.2890537 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 Companion %P 203 - 206 %I ACM %@ 978-1-4503-4144-8
[161]
K. Hui and K. Berberich, “Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations,” in WWW’16 Companion, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{HuiWWW2016, TITLE = {Cluster Hypothesis in Low-Cost {IR} Evaluation with Different Document Representations}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4144-8}, DOI = {10.1145/2872518.2889370}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16 Companion}, PAGES = {47--48}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-08E3-C %R 10.1145/2872518.2889370 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 Companion %P 47 - 48 %I ACM %@ 978-1-4503-4144-8
[162]
Y. Ibrahim, M. Riedewald, and G. Weikum, “Making Sense of Entities and Quantities in Web Tables,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.
Export
BibTeX
@inproceedings{Ibrahim:CIKM2016, TITLE = {Making Sense of Entities and Quantities in {Web} Tables}, AUTHOR = {Ibrahim, Yusra and Riedewald, Mirek and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4073-1}, DOI = {10.1145/2983323.2983772}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management}, PAGES = {1703--1712}, ADDRESS = {Indianapolis, IN, USA}, }
Endnote
%0 Conference Proceedings %A Ibrahim, Yusra %A Riedewald, Mirek %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Making Sense of Entities and Quantities in Web Tables : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4852-E %R 10.1145/2983323.2983772 %D 2016 %B 25th ACM Conference on Information and Knowledge Management %Z date of event: 2016-10-24 - 2016-10-28 %C Indianapolis, IN, USA %B CIKM'16 %P 1703 - 1712 %I ACM %@ 978-1-4503-4073-1
[163]
J. Kalofolias, “Maximum Entropy Models for Redescription Mining,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{KalofoliasMSc2016, TITLE = {Maximum Entropy Models for Redescription Mining}, AUTHOR = {Kalofolias, Janis}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Kalofolias, Janis %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Maximum Entropy Models for Redescription Mining : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-54C0-6 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P III, 51 p. %V master %9 master
[164]
S. Karaev and P. Miettinen, “Cancer: Another Algorithm for Subtropical Matrix Factorization,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016. (Data Mining and Knowledge Discovery Best Student Paper Award)
Export
BibTeX
@inproceedings{KaraevECML2016, TITLE = {Cancer: {A}nother Algorithm for Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-3-319-46226-4}, DOI = {10.1007/978-3-319-46227-1_36}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)}, EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles}, PAGES = {576--592}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9852}, ADDRESS = {Riva del Garda, Italy}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cancer: Another Algorithm for Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A926-A %R 10.1007/978-3-319-46227-1_36 %D 2016 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2016-09-19 - 2016-09-23 %C Riva del Garda, Italy %B Machine Learning and Knowledge Discovery in Databases %E Frasconi, Paolo; Landwehr, Niels; Manco, Giuseppe; Vreeken, Jilles %P 576 - 592 %I Springer %@ 978-3-319-46226-4 %B Lecture Notes in Artificial Intelligence %N 9852
[165]
S. Karaev and P. Miettinen, “Capricorn: An Algorithm for Subtropical Matrix Factorization,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.
Export
BibTeX
@inproceedings{karaev16capricorn, TITLE = {Capricorn: {An} Algorithm for Subtropical Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-61197-434-8}, DOI = {10.1137/1.9781611974348.79}, PUBLISHER = {SIAM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)}, EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner}, PAGES = {702--710}, ADDRESS = {Miami, FL, USA}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Capricorn: An Algorithm for Subtropical Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-542F-3 %R 10.1137/1.9781611974348.79 %D 2016 %B 16th SIAM International Conference on Data Mining %Z date of event: 2016-05-05 - 2016-05-07 %C Miami, FL, USA %B Proceedings of the Sixteenth SIAM International Conference on Data Mining %E Chawla Venkatasubramanian, Sanjay; Meira, Wagner %P 702 - 710 %I SIAM %@ 978-1-61197-434-8
[166]
M. Krötzsch and G. Weikum, “Editorial,” Journal of Web Semantics, vol. 37/38, 2016.
Export
BibTeX
@article{Kroetzsch2016, TITLE = {Editorial}, AUTHOR = {Kr{\"o}tzsch, Markus and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1570-8268}, DOI = {10.1016/j.websem.2016.04.002}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Journal of Web Semantics}, VOLUME = {37/38}, PAGES = {53--54}, }
Endnote
%0 Journal Article %A Krötzsch, Markus %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Editorial : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-EB8D-B %R 10.1016/j.websem.2016.04.002 %7 2016 %D 2016 %J Journal of Web Semantics %O Web Semantics: Science, Services and Agents on the World Wide Web %V 37/38 %& 53 %P 53 - 54 %I Elsevier %C Amsterdam %@ false
[167]
E. Kuzey, J. Strötgen, V. Setty, and G. Weikum, “Temponym Tagging: Temporal Scopes for Textual Phrases,” in WWW’16 Companion, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{Kuzey:2016:TTT:2872518.2889289, TITLE = {Temponym Tagging: {T}emporal Scopes for Textual Phrases}, AUTHOR = {Kuzey, Erdal and Str{\"o}tgen, Jannik and Setty, Vinay and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4144-8}, DOI = {10.1145/2872518.2889289}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16 Companion}, PAGES = {841--842}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Kuzey, Erdal %A Strötgen, Jannik %A Setty, Vinay %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Temponym Tagging: Temporal Scopes for Textual Phrases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-4134-1 %R 10.1145/2872518.2889289 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 Companion %P 841 - 842 %I ACM %@ 978-1-4503-4144-8
[168]
E. Kuzey, V. Setty, J. Strötgen, and G. Weikum, “As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes,” in WWW’16, 25th International Conference on World Wide Web, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{Kuzey_WWW2016, TITLE = {As Time Goes By: {C}omprehensive Tagging of Textual Phrases with Temporal Scopes}, AUTHOR = {Kuzey, Erdal and Setty, Vinay and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4143-1}, DOI = {10.1145/2872427.2883055}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16, 25th International Conference on World Wide Web}, PAGES = {915--925}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Kuzey, Erdal %A Setty, Vinay %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T As Time Goes By: Comprehensive Tagging of Textual Phrases with Temporal Scopes : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-310D-D %R 10.1145/2872427.2883055 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 %P 915 - 925 %I ACM %@ 978-1-4503-4143-1
[169]
S. Metzler, S. Günnemann, and P. Miettinen, “Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques,” 2016. [Online]. Available: http://arxiv.org/abs/1602.04650. (arXiv: 1602.04650)
Abstract
Cliques (or quasi-cliques) are frequently used to model communities: a set of nodes where each pair is (equally) likely to be connected. However, when observing real-world communities, we see that most communities have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that capture this phenomenon. Our models explain the structure of the communities differently, but we also prove that they are identical in their expressive power. Our models fit to real-world data much better than traditional block models, and allow for more in-depth understanding of the structure of the data.
Export
BibTeX
@online{Metzler_arXiv2016, TITLE = {Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques}, AUTHOR = {Metzler, Saskia and G{\"u}nnemann, Stephan and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1602.04650}, EPRINT = {1602.04650}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Cliques (or quasi-cliques) are frequently used to model communities: a set of nodes where each pair is (equally) likely to be connected. However, when observing real-world communities, we see that most communities have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that capture this phenomenon. Our models explain the structure of the communities differently, but we also prove that they are identical in their expressive power. Our models fit to real-world data much better than traditional block models, and allow for more in-depth understanding of the structure of the data.}, }
Endnote
%0 Report %A Metzler, Saskia %A Günnemann, Stephan %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-08E5-8 %U http://arxiv.org/abs/1602.04650 %D 2016 %X Cliques (or quasi-cliques) are frequently used to model communities: a set of nodes where each pair is (equally) likely to be connected. However, when observing real-world communities, we see that most communities have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that capture this phenomenon. Our models explain the structure of the communities differently, but we also prove that they are identical in their expressive power. Our models fit to real-world data much better than traditional block models, and allow for more in-depth understanding of the structure of the data. %K cs.SI, Physics, Physics and Society, physics.soc-ph
[170]
P. Mirza, S. Razniewski, and W. Nutt, “Expanding Wikidata’s Parenthood Information by 178%, or How To Mine Relation Cardinalities,” in Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC-P&D 2016), Kobe, Japan, 2016.
Export
BibTeX
@inproceedings{DBLP:conf/semweb/MirzaRN16, TITLE = {Expanding {W}ikidata's Parenthood Information by 178{\%}, or How To Mine Relation Cardinalities}, AUTHOR = {Mirza, Paramita and Razniewski, Simon and Nutt, Werner}, LANGUAGE = {eng}, URL = {urn:nbn:de:0074-1690-5}, PUBLISHER = {CEUR-WS.org}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the ISWC 2016 Posters \& Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC-P\&D 2016)}, EDITOR = {Kawamura, Takahiro and Paulheim, Heiko}, EID = {4}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1690}, ADDRESS = {Kobe, Japan}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Razniewski, Simon %A Nutt, Werner %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Expanding Wikidata's Parenthood Information by 178%, or How To Mine Relation Cardinalities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-23C1-9 %D 2016 %B ISWC 2016 Posters & Demonstrations Track %Z date of event: 2016-10-19 - 2016-10-19 %C Kobe, Japan %B Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference %E Kawamura, Takahiro; Paulheim, Heiko %Z sequence number: 4 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1690
[171]
P. Mirza and S. Tonelli, “CATENA: CAusal and TEmporal relation extraction from NAtural language texts,” in Proceedings of COLING 2016: Technical Papers, Osaka, Japan, 2016.
Export
BibTeX
@inproceedings{mirza-tonelli:2016:COLING1, TITLE = {{CATENA}: {CAusal} and {TEmporal} relation extraction from {NAtural} language texts}, AUTHOR = {Mirza, Paramita and Tonelli, Sara}, LANGUAGE = {eng}, ISBN = {978-4-87974-702-0}, PUBLISHER = {ACL}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of COLING 2016: Technical Papers}, PAGES = {64--75}, ADDRESS = {Osaka, Japan}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Tonelli, Sara %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T CATENA: CAusal and TEmporal relation extraction from NAtural language texts : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-23B8-0 %D 2016 %B The 26th International Conference on Computational Linguistics %Z date of event: 2016-12-11 - 2016-12-16 %C Osaka, Japan %B Proceedings of COLING 2016: Technical Papers %P 64 - 75 %I ACL %@ 978-4-87974-702-0
[172]
P. Mirza and S. Tonelli, “On the Contribution of Word Embeddings to Temporal Relation Classification,” in Proceedings of COLING 2016: Technical Papers, Osaka, Japan, 2016.
Export
BibTeX
@inproceedings{mirza-tonelli:2016:COLING2, TITLE = {On the Contribution of Word Embeddings to Temporal Relation Classification}, AUTHOR = {Mirza, Paramita and Tonelli, Sara}, LANGUAGE = {eng}, ISBN = {978-4-87974-702-0}, PUBLISHER = {ACL}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of COLING 2016: Technical Papers}, PAGES = {2818--2828}, ADDRESS = {Osaka, Japan}, }
Endnote
%0 Conference Proceedings %A Mirza, Paramita %A Tonelli, Sara %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T On the Contribution of Word Embeddings to Temporal Relation Classification : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-23BB-A %D 2016 %B The 26th International Conference on Computational Linguistics %Z date of event: 2016-12-11 - 2016-12-16 %C Osaka, Japan %B Proceedings of COLING 2016: Technical Papers %P 2818 - 2828 %I ACL %@ 978-4-87974-702-0
[173]
A. Mishra and K. Berberich, “Leveraging Semantic Annotations to Link Wikipedia and News Archives,” Max-Planck-Institut für Informatik, Saarbrücken, MPI-I-2016-5-002, 2016.
Abstract
The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.
Export
BibTeX
@techreport{MishraBerberich16, TITLE = {Leveraging Semantic Annotations to Link Wikipedia and News Archives}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISSN = {0946-011X}, NUMBER = {MPI-I-2016-5-002}, INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.}, TYPE = {Research Reports}, }
Endnote
%0 Report %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Leveraging Semantic Annotations to Link Wikipedia and News Archives : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-5FF0-A %Y Max-Planck-Institut für Informatik %C Saarbrücken %D 2016 %P 21 p. %X The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal to retrieve a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best. %B Research Reports %@ false
[174]
A. Mishra and K. Berberich, “Leveraging Semantic Annotations to Link Wikipedia and News Archives,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.
Export
BibTeX
@inproceedings{MishraECIR2016, TITLE = {Leveraging Semantic Annotations to Link {W}ikipedia and News Archives}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-30670-4}, DOI = {10.1007/978-3-319-30671-1_3}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)}, EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestri, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria}, PAGES = {30--42}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9626}, ADDRESS = {Padova, Italy}, }
Endnote
%0 Conference Proceedings %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Leveraging Semantic Annotations to Link Wikipedia and News Archives : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-48DC-F %R 10.1007/978-3-319-30671-1_3 %D 2016 %B 38th European Conference on Information Retrieval %Z date of event: 2016-03-20 - 2016-03-23 %C Padova, Italy %B Advances in Information Retrieval %E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestri, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria %P 30 - 42 %I Springer %@ 978-3-319-30670-4 %B Lecture Notes in Computer Science %N 9626
[175]
A. Mishra and K. Berberich, “Estimating Time Models for News Article Excerpts,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.
Export
BibTeX
@inproceedings{DBLP:conf/cikm/MishraB16, TITLE = {Estimating Time Models for News Article Excerpts}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4073-1}, DOI = {10.1145/2983323.2983802}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management}, PAGES = {781--790}, ADDRESS = {Indianapolis, IN, USA}, }
Endnote
%0 Conference Proceedings %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Estimating Time Models for News Article Excerpts : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-20CF-3 %R 10.1145/2983323.2983802 %D 2016 %B 25th ACM Conference on Information and Knowledge Management %Z date of event: 2016-10-24 - 2016-10-28 %C Indianapolis, IN, USA %B CIKM'16 %P 781 - 790 %I ACM %@ 978-1-4503-4073-1
[176]
A. Mishra and K. Berberich, “Event Digest: A Holistic View on Past Events,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.
Export
BibTeX
@inproceedings{MishraSIGIR2016, TITLE = {Event Digest: {A} Holistic View on Past Events}, AUTHOR = {Mishra, Arunav and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-4069-4}, DOI = {10.1145/2911451.2911526}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {493--502}, ADDRESS = {Pisa, Italy}, }
Endnote
%0 Conference Proceedings %A Mishra, Arunav %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Event Digest: A Holistic View on Past Events : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-0895-D %R 10.1145/2911451.2911526 %D 2016 %B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2016-07-17 - 2016-07-21 %C Pisa, Italy %B SIGIR'16 %P 493 - 502 %I ACM %@ 978-1-4503-4069-4
[177]
S. Mukherjee, S. Günnemann, and G. Weikum, “Continuous Experience-aware Language Model,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{MukherjeeKDD2016, TITLE = {Continuous Experience-aware Language Model}, AUTHOR = {Mukherjee, Subhabrata and G{\"u}nnemann, Stephan and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4232-2}, DOI = {10.1145/2939672.2939780}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, PAGES = {1075--1084}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Mukherjee, Subhabrata %A Günnemann, Stephan %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Continuous Experience-aware Language Model : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A678-6 %R 10.1145/2939672.2939780 %D 2016 %B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining %Z date of event: 2016-08-13 - 2016-08-17 %C San Francisco, CA, USA %B KDD'16 %P 1075 - 1084 %I ACM %@ 978-1-4503-4232-2
[178]
S. Mukherjee, S. Dutta, and G. Weikum, “Credible Review Detection with Limited Information Using Consistency Features,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Riva del Garda, Italy, 2016.
Export
BibTeX
@inproceedings{MukherjeeECML2016, TITLE = {Credible Review Detection with Limited Information Using Consistency Features}, AUTHOR = {Mukherjee, Subhabrata and Dutta, Sourav and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-46226-4}, DOI = {10.1007/978-3-319-46227-1_13}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016)}, EDITOR = {Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles}, PAGES = {195--213}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9852}, ADDRESS = {Riva del Garda, Italy}, }
Endnote
%0 Conference Proceedings %A Mukherjee, Subhabrata %A Dutta, Sourav %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Credible Review Detection with Limited Information Using Consistency Features : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A67C-D %R 10.1007/978-3-319-46227-1_13 %D 2016 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2016-09-19 - 2016-09-23 %C Riva del Garda, Italy %B Machine Learning and Knowledge Discovery in Databases %E Frasconi, Paolo; Landwehr, Niels; Manco, Giuseppe; Vreeken, Jilles %P 195 - 213 %I Springer %@ 978-3-319-46226-4 %B Lecture Notes in Artificial Intelligence %N 9852
[179]
N. Mukuze and P. Miettinen, “Interactive Constrained Boolean Matrix Factorization,” in Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2016), San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{mukuze16interactive, TITLE = {Interactive Constrained {B}oolean Matrix Factorization}, AUTHOR = {Mukuze, Nelson and Miettinen, Pauli}, LANGUAGE = {eng}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2016)}, EDITOR = {Chau, Duen Horng and Vreeken, Jilles and van Leeuwen, Matthijs and Shahaf, Dafna and Faloutsos, Christos}, PAGES = {96--104}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Mukuze, Nelson %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Interactive Constrained Boolean Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-226C-2 %D 2016 %B ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics %Z date of event: 2016-08-14 - 2016-08-14 %C San Francisco, CA, USA %B Proceedings of the ACM SIGKDD 2016 Full-day Workshop on Interactive Data Exploration and Analytics %E Chau, Duen Horng; Vreeken, Jilles; van Leeuwen, Matthijs; Shahaf, Dafna; Faloutsos, Christos %P 96 - 104
[180]
N. Mukuze, “Interactive Boolean Matrix Factorization,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{MukuzeMSc2016, TITLE = {Interactive Boolean Matrix Factorization}, AUTHOR = {Mukuze, Nelson}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Mukuze, Nelson %Y Miettinen, Pauli %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Interactive Boolean Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-54C8-5 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P III, 68 p. %V master %9 master
[181]
S. Nag Chowdhury, “Commonsense for Making Sense of Data,” in Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016), New Delhi, India, 2016.
Export
BibTeX
@inproceedings{NagChowdhuryVLDB2016, TITLE = {Commonsense for Making Sense of Data}, AUTHOR = {Nag Chowdhury, Sreyasi}, LANGUAGE = {eng}, URL = {urn:nbn:de:0074-1671-7}, PUBLISHER = {CEUR-WS.org}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016)}, EDITOR = {Grust, Torsten and Karlapalem, Kamal and Pavlo, Andy}, EID = {8}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1671}, ADDRESS = {New Delhi, India}, }
Endnote
%0 Conference Proceedings %A Nag Chowdhury, Sreyasi %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense for Making Sense of Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-22E4-3 %U urn:nbn:de:0074-1671-7 %D 2016 %B VLDB 2016 PhD Workshop %Z date of event: 2016-09-09 - 2016-09-09 %C New Delhi, India %B Proceedings of the VLDB 2016 PhD Workshop co-located with the 42nd International Conference on Very Large Databases (VLDB 2016) %E Grust, Torsten; Karlapalem, Kamal; Pavlo, Andy %Z sequence number: 8 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1671
[182]
S. Nag Chowdhury, N. Tandon, and G. Weikum, “Know2Look: Commonsense Knowledge for Visual Search,” in AKBC 2016, 5th Workshop on Automated Knowledge Base Construction, San Diego, CA, USA, 2016.
Export
BibTeX
@inproceedings{DBLP:conf/akbc/ChowdhuryTW16, TITLE = {{Know2Look}: {C}ommonsense Knowledge for Visual Search}, AUTHOR = {Nag Chowdhury, Sreyasi and Tandon, Niket and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://www.akbc.ws/2016/papers/11_Paper.pdf}, PUBLISHER = {AKBC Board}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {AKBC 2016, 5th Workshop on Automated Knowledge Base Construction}, PAGES = {57--62}, ADDRESS = {San Diego, CA, USA}, }
Endnote
%0 Conference Proceedings %A Nag Chowdhury, Sreyasi %A Tandon, Niket %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Know2Look: Commonsense Knowledge for Visual Search : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A633-2 %U http://www.akbc.ws/2016/papers/11_Paper.pdf %D 2016 %B 5th Workshop on Automated Knowledge Base Construction %Z date of event: 2016-06-17 - 2016-06-17 %C San Diego, CA, USA %B AKBC 2016 %P 57 - 62 %I AKBC Board
[183]
D. B. Nguyen, A. Abujabal, N. K. Tran, M. Theobald, and G. Weikum, “Query-Driven On-The-Fly Knowledge Base Construction,” Proceedings of the VLDB Endowment (Proc. VLDB 2018), vol. 11, no. 1, 2016.
Export
BibTeX
@article{escidoc:2530450, TITLE = {Query-Driven On-The-Fly Knowledge Base Construction}, AUTHOR = {Nguyen, Dat Ba and Abujabal, Abdalghani and Tran, Nam Khanh and Theobald, Martin and Weikum, Gerhard}, LANGUAGE = {eng}, DOI = {10.14778/3136610.3136616}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)}, VOLUME = {11}, NUMBER = {1}, PAGES = {66--79}, BOOKTITLE = {Proceedings of the 44th International Conference on Very Large Data Bases (VLDB 2018)}, EDITOR = {Bhowmick, Sourav and Torres, Ricardo}, }
Endnote
%0 Journal Article %A Nguyen, Dat Ba %A Abujabal, Abdalghani %A Tran, Nam Khanh %A Theobald, Martin %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Query-Driven On-The-Fly Knowledge Base Construction : %G eng %U http://hdl.handle.net/21.11116/0000-0000-3B51-3 %R 10.14778/3136610.3136616 %7 2016 %D 2016 %J Proceedings of the VLDB Endowment %O PVLDB %V 11 %N 1 %& 66 %P 66 - 79 %I ACM %C New York, NY %B Proceedings of the 44th International Conference on Very Large Data Bases %O VLDB 2018, Rio de Janeiro, Brazil, August 27-31, 2018
[184]
D. B. Nguyen, M. Theobald, and G. Weikum, “J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features,” Transactions of the Association for Computational Linguistics, vol. 4, 2016.
Export
BibTeX
@article{Nguyen2016, TITLE = {{J}-{NERD}: {J}oint {N}amed {E}ntity {R}ecognition and {D}isambiguation with Rich Linguistic Features}, AUTHOR = {Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2307-387X}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, JOURNAL = {Transactions of the Association for Computational Linguistics}, VOLUME = {4}, PAGES = {215--229}, }
Endnote
%0 Journal Article %A Nguyen, Dat Ba %A Theobald, Martin %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-0199-1 %7 2016 %D 2016 %J Transactions of the Association for Computational Linguistics %O TACL %V 4 %& 215 %P 215 - 229 %@ false %U https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/698
[185]
H.-V. Nguyen and J. Vreeken, “Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.
Export
BibTeX
@inproceedings{VreekenSDM2016, TITLE = {Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series}, AUTHOR = {Nguyen, Hoang-Vu and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-434-8}, DOI = {10.1137/1.9781611974348.93}, PUBLISHER = {SIAM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)}, EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner}, PAGES = {828--836}, ADDRESS = {Miami, FL, USA}, }
Endnote
%0 Conference Proceedings %A Nguyen, Hoang-Vu %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A937-4 %R 10.1137/1.9781611974348.93 %D 2016 %B 16th SIAM International Conference on Data Mining %Z date of event: 2016-05-05 - 2016-05-07 %C Miami, FL, USA %B Proceedings of the Sixteenth SIAM International Conference on Data Mining %E Chawla Venkatasubramanian, Sanjay; Meira, Wagner %P 828 - 836 %I SIAM %@ 978-1-61197-434-8
[186]
H.-V. Nguyen and J. Vreeken, “Flexibly Mining Better Subgroups,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.
Export
BibTeX
@inproceedings{NguyenSDM2016, TITLE = {Flexibly Mining Better Subgroups}, AUTHOR = {Nguyen, Hoang-Vu and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-434-8}, DOI = {10.1137/1.9781611974348.66}, PUBLISHER = {SIAM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)}, EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner}, PAGES = {585--593}, ADDRESS = {Miami, FL, USA}, }
Endnote
%0 Conference Proceedings %A Nguyen, Hoang-Vu %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Flexibly Mining Better Subgroups : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A933-C %R 10.1137/1.9781611974348.66 %D 2016 %B 16th SIAM International Conference on Data Mining %Z date of event: 2016-05-05 - 2016-05-07 %C Miami, FL, USA %B Proceedings of the Sixteenth SIAM International Conference on Data Mining %E Chawla Venkatasubramanian, Sanjay; Meira, Wagner %P 585 - 593 %I SIAM %@ 978-1-61197-434-8
[187]
H.-V. Nguyen, P. Mandros, and J. Vreeken, “Universal Dependency Analysis,” in Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016), Miami, FL, USA, 2016.
Export
BibTeX
@inproceedings{MandrosSDM2016, TITLE = {Universal Dependency Analysis}, AUTHOR = {Nguyen, Hoang-Vu and Mandros, Panagiotis and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-434-8}, DOI = {10.1137/1.9781611974348.89}, PUBLISHER = {SIAM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the Sixteenth SIAM International Conference on Data Mining (SDM 2016)}, EDITOR = {Chawla Venkatasubramanian, Sanjay and Meira, Wagner}, PAGES = {792--800}, ADDRESS = {Miami, FL, USA}, }
Endnote
%0 Conference Proceedings %A Nguyen, Hoang-Vu %A Mandros, Panagiotis %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Universal Dependency Analysis : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A935-8 %R 10.1137/1.9781611974348.89 %D 2016 %B 16th SIAM International Conference on Data Mining %Z date of event: 2016-05-05 - 2016-05-07 %C Miami, FL, USA %B Proceedings of the Sixteenth SIAM International Conference on Data Mining %E Chawla Venkatasubramanian, Sanjay; Meira, Wagner %P 792 - 800 %I SIAM %@ 978-1-61197-434-8
[188]
K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum, “Credibility Assessment of Textual Claims on the Web,” in CIKM’16, 25th ACM International Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.
Export
BibTeX
@inproceedings{PopatCIKM2016, TITLE = {Credibility Assessment of Textual Claims on the {Web}}, AUTHOR = {Popat, Kashyap and Mukherjee, Subhabrata and Str{\"o}tgen, Jannik and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4073-1}, DOI = {10.1145/2983323.2983661}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {CIKM'16, 25th ACM International Conference on Information and Knowledge Management}, PAGES = {2173--2178}, ADDRESS = {Indianapolis, IN, USA}, }
Endnote
%0 Conference Proceedings %A Popat, Kashyap %A Mukherjee, Subhabrata %A Strötgen, Jannik %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Credibility Assessment of Textual Claims on the Web : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-B260-3 %R 10.1145/2983323.2983661 %D 2016 %B 25th ACM International Conference on Information and Knowledge Management %Z date of event: 2016-10-24 - 2016-10-28 %C Indianapolis, IN, USA %B CIKM'16 %P 2173 - 2178 %I ACM %@ 978-1-4503-4073-1
[189]
T. Rebele, F. Suchanek, J. Hoffart, J. Biega, E. Kuzey, and G. Weikum, “YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames,” in The Semantic Web -- ISWC 2016, Kobe, Japan, 2016.
Export
BibTeX
@inproceedings{RebeleISWC2016, TITLE = {{YAGO}: A Multilingual Knowledge Base from {W}ikipedia, {W}ordnet, and {G}eonames}, AUTHOR = {Rebele, Thomas and Suchanek, Fabian and Hoffart, Johannes and Biega, Joanna and Kuzey, Erdal and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-3-319-46546-3}, DOI = {10.1007/978-3-319-46547-0_19}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {The Semantic Web -- ISWC 2016}, EDITOR = {Groth, Paul and Simperl, Elena and Gray, Alasdair and Sabou, Marta and Kr{\"o}tzsch, Markus and Lecue, Freddy and Fl{\"o}ck, Fabian and Gil, Yolanda}, PAGES = {177--185}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9982}, ADDRESS = {Kobe, Japan}, }
Endnote
%0 Conference Proceedings %A Rebele, Thomas %A Suchanek, Fabian %A Hoffart, Johannes %A Biega, Joanna %A Kuzey, Erdal %A Weikum, Gerhard %+ Télécom ParisTech Télécom ParisTech Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A69A-9 %R 10.1007/978-3-319-46547-0_19 %D 2016 %B 15th International Semantic Web Conference %Z date of event: 2016-10-17 - 2016-10-21 %C Kobe, Japan %B The Semantic Web -- ISWC 2016 %E Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda %P 177 - 185 %I Springer %@ 978-3-319-46546-3 %B Lecture Notes in Computer Science %N 9982
[190]
R. S. Roy, S. Agarwal, N. Ganguly, and M. Choudhury, “Syntactic Complexity of Web Search Queries through the Lenses of Language Models, Networks and Users,” Information Processing and Management, vol. 52, no. 5, 2016.
Export
BibTeX
@article{Roy2016, TITLE = {Syntactic complexity of {W}eb search queries through the lenses of language models, networks and users}, AUTHOR = {Roy, Rishiraj Saha and Agarwal, Smith and Ganguly, Niloy and Choudhury, Monojit}, LANGUAGE = {eng}, ISSN = {0306-4573}, DOI = {10.1016/j.ipm.2016.04.002}, PUBLISHER = {Elsevier}, ADDRESS = {Amsterdam}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Information Processing and Management}, VOLUME = {52}, NUMBER = {5}, PAGES = {923--948}, }
Endnote
%0 Journal Article %A Roy, Rishiraj Saha %A Agarwal, Smith %A Ganguly, Niloy %A Choudhury, Monojit %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Syntactic Complexity of Web Search Queries through the Lenses of Language Models, Networks and Users : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-7EBE-B %R 10.1016/j.ipm.2016.04.002 %7 2016 %D 2016 %J Information Processing and Management %V 52 %N 5 %& 923 %P 923 - 948 %I Elsevier %C Amsterdam %@ false
[191]
R. S. Roy, A. Suresh, N. Ganguly, and M. Choudhury, “Improving Document Ranking for Long Queries with Nested Query Segmentation,” in Advances in Information Retrieval (ECIR 2016), Padova, Italy, 2016.
Export
BibTeX
@inproceedings{RoyECIR2016, TITLE = {Improving Document Ranking for Long Queries with Nested Query Segmentation}, AUTHOR = {Roy, Rishiraj Saha and Suresh, Anusha and Ganguly, Niloy and Choudhury, Monojit}, LANGUAGE = {eng}, ISBN = {978-3-319-30670-4}, DOI = {10.1007/978-3-319-30671-1_67}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2016)}, EDITOR = {Ferro, Nicola and Crestani, Fabio and Moens, Marie-Francine and Mothe, Josiane and Silvestri, Fabrizio and Di Nunzio, Giorgio Maria and Hauff, Claudia and Silvello, Gianmaria}, PAGES = {775--781}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9626}, ADDRESS = {Padova, Italy}, }
Endnote
%0 Conference Proceedings %A Roy, Rishiraj Saha %A Suresh, Anusha %A Ganguly, Niloy %A Choudhury, Monojit %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Improving Document Ranking for Long Queries with Nested Query Segmentation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-48DF-9 %R 10.1007/978-3-319-30671-1_67 %D 2016 %B 38th European Conference on Information Retrieval %Z date of event: 2016-03-20 - 2016-03-23 %C Padova, Italy %B Advances in Information Retrieval %E Ferro, Nicola; Crestani, Fabio; Moens, Marie-Francine; Mothe, Josiane; Silvestri, Fabrizio; Di Nunzio, Giorgio Maria; Hauff, Claudia; Silvello, Gianmaria %P 775 - 781 %I Springer %@ 978-3-319-30670-4 %B Lecture Notes in Computer Science %N 9626
[192]
P. Rozenshtein, A. Gionis, B. A. Prakash, and J. Vreeken, “Reconstructing an Epidemic Over Time,” in KDD’16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{RozenshteinKDD2016, TITLE = {Reconstructing an Epidemic Over Time}, AUTHOR = {Rozenshtein, Polina and Gionis, Aristides and Prakash, B. Aditya and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-4503-4232-2}, DOI = {10.1145/2939672.2939865}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {KDD'16, 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, PAGES = {1835--1844}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Rozenshtein, Polina %A Gionis, Aristides %A Prakash, B. Aditya %A Vreeken, Jilles %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Reconstructing an Epidemic Over Time : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A92F-7 %R 10.1145/2939672.2939865 %D 2016 %B 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining %Z date of event: 2016-08-13 - 2016-08-17 %C San Francisco, CA, USA %B KDD'16 %P 1835 - 1844 %I ACM %@ 978-1-4503-4232-2
[193]
M. Salyaeva, “Summarising and Recommending with Skipisodes,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{SalyaevaMSc2016, TITLE = {Summarising and Recommending with Skipisodes}, AUTHOR = {Salyaeva, Margarita}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Salyaeva, Margarita %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Summarising and Recommending with Skipisodes : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F46-2 %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
[194]
A. Schmidt, J. Hoffart, D. Milchevski, and G. Weikum, “Context-Sensitive Auto-Completion for Searching with Entities and Categories,” in SIGIR’16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 2016.
Export
BibTeX
@inproceedings{SchmidtIGIR2016, TITLE = {Context-Sensitive Auto-Completion for Searching with Entities and Categories}, AUTHOR = {Schmidt, Andreas and Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4069-4}, DOI = {10.1145/2911451.2911461}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {SIGIR'16, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval}, PAGES = {1097--1100}, ADDRESS = {Pisa, Italy}, }
Endnote
%0 Conference Proceedings %A Schmidt, Andreas %A Hoffart, Johannes %A Milchevski, Dragan %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Context-Sensitive Auto-Completion for Searching with Entities and Categories : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A924-E %R 10.1145/2911451.2911461 %D 2016 %B 39th International ACM SIGIR Conference on Research and Development in Information Retrieval %Z date of event: 2016-07-17 - 2016-07-21 %C Pisa, Italy %B SIGIR'16 %P 1097 - 1100 %I ACM %@ 978-1-4503-4069-4
[195]
S. Seufert, P. Ernst, S. J. Bedathur, S. K. Kondreddi, K. Berberich, and G. Weikum, “Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs,” in WWW’16 Companion, Montréal, Canada, 2016.
Export
BibTeX
@inproceedings{SeufertWWW2016, TITLE = {Instant {E}spresso: {I}nteractive Analysis of Relationships in Knowledge Graphs}, AUTHOR = {Seufert, Stephan and Ernst, Patrick and Bedathur, Srikanta J. and Kondreddi, Sarath Kumar and Berberich, Klaus and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4144-8}, DOI = {10.1145/2872518.2890528}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WWW'16 Companion}, PAGES = {251--254}, ADDRESS = {Montr{\'e}al, Canada}, }
Endnote
%0 Conference Proceedings %A Seufert, Stephan %A Ernst, Patrick %A Bedathur, Srikanta J. %A Kondreddi, Sarath Kumar %A Berberich, Klaus %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-01BD-F %R 10.1145/2872518.2890528 %D 2016 %B 25th International Conference on World Wide Web %Z date of event: 2016-05-11 - 2016-05-15 %C Montréal, Canada %B WWW'16 Companion %P 251 - 254 %I ACM %@ 978-1-4503-4144-8
[196]
S. Seufert, K. Berberich, S. J. Bedathur, S. K. Kondreddi, P. Ernst, and G. Weikum, “ESPRESSO: Explaining Relationships between Entity Sets,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.
Export
BibTeX
@inproceedings{DBLP:conf/cikm/SeufertBBKEW16, TITLE = {{ESPRESSO}: {E}xplaining Relationships between Entity Sets}, AUTHOR = {Seufert, Stephan and Berberich, Klaus and Bedathur, Srikanta J. and Kondreddi, Sarath Kumar and Ernst, Patrick and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-4073-1}, DOI = {10.1145/2983323.2983778}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management}, PAGES = {1311--1320}, ADDRESS = {Indianapolis, IN, USA}, }
Endnote
%0 Conference Proceedings %A Seufert, Stephan %A Berberich, Klaus %A Bedathur, Srikanta J. %A Kondreddi, Sarath Kumar %A Ernst, Patrick %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T ESPRESSO: Explaining Relationships between Entity Sets : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-20D3-8 %R 10.1145/2983323.2983778 %D 2016 %B 25th ACM Conference on Information and Knowledge Management %Z date of event: 2016-10-24 - 2016-10-28 %C Indianapolis, IN, USA %B CIKM'16 %P 1311 - 1320 %I ACM %@ 978-1-4503-4073-1
[197]
D. Seyler, M. Yahya, K. Berberich, and O. Alonso, “Automated Question Generation for Quality Control in Human Computation Tasks,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.
Export
BibTeX
@inproceedings{SeylerWebSci2016, TITLE = {Automated Question Generation for Quality Control in Human Computation Tasks}, AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus and Alonso, Omar}, LANGUAGE = {eng}, ISBN = {978-1-4503-4208-7}, DOI = {10.1145/2908131.2908210}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WebSci'16, ACM Web Science Conference}, PAGES = {360--362}, ADDRESS = {Hannover, Germany}, }
Endnote
%0 Conference Proceedings %A Seyler, Dominic %A Yahya, Mohamed %A Berberich, Klaus %A Alonso, Omar %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Automated Question Generation for Quality Control in Human Computation Tasks : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-08DF-7 %R 10.1145/2908131.2908210 %D 2016 %B ACM Web Science Conference %Z date of event: 2016-05-22 - 2016-05-25 %C Hannover, Germany %B WebSci'16 %P 360 - 362 %I ACM %@ 978-1-4503-4208-7
[198]
D. Seyler, M. Yahya, and K. Berberich, “Knowledge Questions from Knowledge Graphs,” 2016. [Online]. Available: http://arxiv.org/abs/1610.09935. (arXiv: 1610.09935)
Abstract
We address the novel problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose an end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach.
Export
BibTeX
@online{Seyler1610.09935, TITLE = {Knowledge Questions from Knowledge Graphs}, AUTHOR = {Seyler, Dominic and Yahya, Mohamed and Berberich, Klaus}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1610.09935}, EPRINT = {1610.09935}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {We address the novel problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose an end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach.}, }
Endnote
%0 Report %A Seyler, Dominic %A Yahya, Mohamed %A Berberich, Klaus %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Knowledge Questions from Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1CB5-F %U http://arxiv.org/abs/1610.09935 %D 2016 %X We address the novel problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose an end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach. %K Computer Science, Computation and Language, cs.CL
[199]
A. Shah, “Recognizing Visual Activities,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{ShahMSc2016, TITLE = {Recognizing Visual Activities}, AUTHOR = {Shah, Ali}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Shah, Ali %Y Weikum, Gerhard %A referee: Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Recognizing Visual Activities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-439D-1 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P 56 p. %V master %9 master
[200]
J. Singh, J. Hoffart, and A. Anand, “Discovering Entities with Just a Little Help from You,” in CIKM’16, 25th ACM Conference on Information and Knowledge Management, Indianapolis, IN, USA, 2016.
Export
BibTeX
@inproceedings{Singh:2016:DEJ:2983323.2983798, TITLE = {Discovering Entities with Just a Little Help from You}, AUTHOR = {Singh, Jaspreet and Hoffart, Johannes and Anand, Avishek}, LANGUAGE = {eng}, ISBN = {978-1-4503-4073-1}, DOI = {10.1145/2983323.2983798}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {CIKM'16, 25th ACM Conference on Information and Knowledge Management}, PAGES = {1331--1340}, ADDRESS = {Indianapolis, IN, USA}, }
Endnote
%0 Conference Proceedings %A Singh, Jaspreet %A Hoffart, Johannes %A Anand, Avishek %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Discovering Entities with Just a Little Help from You : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1CC2-2 %R 10.1145/2983323.2983798 %D 2016 %B 25th ACM Conference on Information and Knowledge Management %Z date of event: 2016-10-24 - 2016-10-28 %C Indianapolis, IN, USA %B CIKM'16 %P 1331 - 1340 %I ACM %@ 978-1-4503-4073-1
[201]
A. Siu, P. Ernst, and G. Weikum, “Disambiguation of Entities in MEDLINE Abstracts by Combining MeSH Terms with Knowledge,” in Proceedings of the 15th Workshop on Biomedical Natural Language Processing (BioNLP 2016), Berlin, Germany, 2016.
Export
BibTeX
@inproceedings{Siu16, TITLE = {Disambiguation of entities in {MEDLINE} abstracts by combining {MeSH} terms with knowledge}, AUTHOR = {Siu, Amy and Ernst, Patrick and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-945626-12-8}, PUBLISHER = {ACL}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the 15th Workshop on Biomedical Natural Language Processing (BioNLP 2016)}, PAGES = {72--76}, ADDRESS = {Berlin, Germany}, }
Endnote
%0 Conference Proceedings %A Siu, Amy %A Ernst, Patrick %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Disambiguation of Entities in MEDLINE Abstracts by Combining MeSH Terms with Knowledge : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2040-3 %D 2016 %B 15th Workshop on Biomedical Natural Language Processing %Z date of event: 2016-08-12 - 2016-08-12 %C Berlin, Germany %B Proceedings of the 15th Workshop on Biomedical Natural Language Processing %P 72 - 76 %I ACL %@ 978-1-945626-12-8 %U http://aclweb.org/anthology/W/W16/W16-2909.pdf
[202]
D. Spanier, “An Incremental Approach to Distilling Named Events from News Streams,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{SpanierMSc2016, TITLE = {An Incremental Approach to Distilling Named Events from News Streams}, AUTHOR = {Spanier, Daniel}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Spanier, Daniel %Y Weikum, Gerhard %A referee: Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T An Incremental Approach to Distilling Named Events from News Streams : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4913-0 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P XI, 58 p. %V master %9 master
[203]
J. Strötgen and M. Gertz, Domain-Sensitive Temporal Tagging. San Rafael, CA: Morgan & Claypool Publishers, 2016.
Export
BibTeX
@book{StroetgenBook2016, TITLE = {Domain-Sensitive Temporal Tagging}, AUTHOR = {Str{\"o}tgen, Jannik and Gertz, Michael}, LANGUAGE = {eng}, ISSN = {1947-4040}, ISBN = {9781627054591; 9781627054997}, DOI = {10.2200/S00721ED1V01Y201606HLT036}, PUBLISHER = {Morgan \& Claypool Publishers}, ADDRESS = {San Rafael, CA}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, PAGES = {151 p.}, SERIES = {Synthesis Lectures on Human Language Technologies}, }
Endnote
%0 Book %A Strötgen, Jannik %A Gertz, Michael %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Domain-Sensitive Temporal Tagging : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-1777-9 %@ 9781627054591 %@ 9781627054997 %R 10.2200/S00721ED1V01Y201606HLT036 %I Morgan & Claypool Publishers %C San Rafael, CA %D 2016 %P 151 p. %B Synthesis Lectures on Human Language Technologies %@ false
[204]
J. Strötgen, “Domänen-sensitives Temporal Tagging für Event-zentriertes Information Retrieval,” in Ausgezeichnete Informatikdissertationen 2015, Bonn: GI, 2016.
Export
BibTeX
@incollection{StrotgenLNI_Diss16, TITLE = {{Dom{\"a}nen-sensitives Temporal Tagging f{\"u}r Event-zentriertes Information Retrieval}}, AUTHOR = {Str{\"o}tgen, Jannik}, LANGUAGE = {deu}, ISBN = {978-3-88579-975-7}, PUBLISHER = {GI}, ADDRESS = {Bonn}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Ausgezeichnete Informatikdissertationen 2015}, EDITOR = {H{\"o}lldobler, Steffen}, PAGES = {279--288}, SERIES = {Lecture Notes in Informatics -- Dissertations}, VOLUME = {16}, }
Endnote
%0 Book Section %A Strötgen, Jannik %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Domänen-sensitives Temporal Tagging für Event-zentriertes Information Retrieval : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002B-B26A-F %D 2016 %B Ausgezeichnete Informatikdissertationen 2015 %E Hölldobler, Steffen %P 279 - 288 %I GI %C Bonn %@ 978-3-88579-975-7 %S Lecture Notes in Informatics - Dissertations %N 16
[205]
A. Talaika, J. Biega, A. Amarilli, and F. M. Suchanek, “IBEX: Harvesting Entities from the Web Using Unique Identifiers,” in Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015), Melbourne, Australia, 2016.
Export
BibTeX
@inproceedings{Talaika2016, TITLE = {{IBEX}: {H}arvesting Entities from the {Web} Using Unique Identifiers}, AUTHOR = {Talaika, Aliaksandr and Biega, Joanna and Amarilli, Antoine and Suchanek, Fabian M.}, LANGUAGE = {eng}, ISBN = {978-1-4503-3627-7}, DOI = {10.1145/2767109.2767116}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015)}, EDITOR = {Stoyanovich, Julia and Suchanek, Fabian M.}, PAGES = {13--19}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Talaika, Aliaksandr %A Biega, Joanna %A Amarilli, Antoine %A Suchanek, Fabian M. %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Télécom ParisTech Télécom ParisTech %T IBEX: Harvesting Entities from the Web Using Unique Identifiers : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-AF0D-5 %R 10.1145/2767109.2767116 %D 2016 %B 18th International Workshop on the Web and Databases %Z date of event: 2015-05-31 - 2015-05-31 %C Melbourne, Australia %B Proceedings of the 18th International Workshop on Web and Databases %E Stoyanovich, Julia; Suchanek, Fabian M. %P 13 - 19 %I ACM %@ 978-1-4503-3627-7
[206]
N. Tandon, “Commonsense Knowledge Acquisition and Applications,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@phdthesis{TandonPhD2016, TITLE = {Commonsense Knowledge Acquisition and Applications}, AUTHOR = {Tandon, Niket}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-66291}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Tandon, Niket %Y Weikum, Gerhard %A referee: Lieberman, Henry %A referee: Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Commonsense Knowledge Acquisition and Applications : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-78F6-A %U urn:nbn:de:bsz:291-scidok-66291 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P XIV, 154 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2016/6629/
[207]
N. Tandon, C. D. Hariman, J. Urbani, A. Rohrbach, M. Rohrbach, and G. Weikum, “Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 2016.
Export
BibTeX
@inproceedings{TandonAAAI2016, TITLE = {Commonsense in Parts: Mining Part-Whole Relations from the {Web} and Image Tags}, AUTHOR = {Tandon, Niket and Hariman, Charles Darwis and Urbani, Jacopo and Rohrbach, Anna and Rohrbach, Marcus and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-57735-760-5}, PUBLISHER = {AAAI Press}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence}, PAGES = {243--250}, ADDRESS = {Phoenix, AZ, USA}, }
Endnote
%0 Conference Proceedings %A Tandon, Niket %A Hariman, Charles Darwis %A Urbani, Jacopo %A Rohrbach, Anna %A Rohrbach, Marcus %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-ABFE-1 %D 2016 %B Thirtieth AAAI Conference on Artificial Intelligence %Z date of event: 2016-02-12 - 2016-02-17 %C Phoenix, AZ, USA %B Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence %P 243 - 250 %I AAAI Press %@ 978-1-57735-760-5 %U http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12337/11590
[208]
C. Teflioudi, “Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@phdthesis{Teflioudiphd2016, TITLE = {Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search}, AUTHOR = {Teflioudi, Christina}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291-scidok-64699}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Teflioudi, Christina %Y Gemulla, Rainer %A referee: Weikum, Gerhard %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Algorithms for Shared-Memory Matrix Completion and Maximum Inner Product Search : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-43FA-2 %U urn:nbn:de:bsz:291-scidok-64699 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P xi, 110 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2016/6469/
[209]
Y. S. Uddin, V. Setty, Y. Zhao, R. Vitenberg, and N. Venkatasubramanian, “RichNote: Adaptive Selection and Delivery of Rich Media Notifications to Mobile Users,” in ICDCS 2016, 36th International Conference on Distributed Computing Systems, Nara, Japan, 2016.
Export
BibTeX
@inproceedings{UddinICDCS2016, TITLE = {{RichNote}: {A}daptive Selection and Delivery of Rich Media Notifications to Mobile Users}, AUTHOR = {Uddin, Yusuf Sarwar and Setty, Vinay and Zhao, Ye and Vitenberg, Roman and Venkatasubramanian, Nalini}, LANGUAGE = {eng}, ISSN = {1063-6927}, DOI = {10.1109/ICDCS.2016.107}, PUBLISHER = {IEEE}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {ICDCS 2016, 36th International Conference on Distributed Computing Systems}, PAGES = {159--168}, ADDRESS = {Nara, Japan}, }
Endnote
%0 Conference Proceedings %A Uddin, Yusuf Sarwar %A Setty, Vinay %A Zhao, Ye %A Vitenberg, Roman %A Venkatasubramanian, Nalini %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T RichNote: Adaptive Selection and Delivery of Rich Media Notifications to Mobile Users : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-9AFE-C %R 10.1109/ICDCS.2016.107 %D 2016 %B 36th International Conference on Distributed Computing Systems %Z date of event: 2016-06-27 - 2016-06-30 %C Nara, Japan %B ICDCS 2016 %P 159 - 168 %I IEEE %@ false
[210]
J. Urbani, S. Dutta, S. Gurajada, and G. Weikum, “KOGNAC: Efficient Encoding of Large Knowledge Graphs,” in Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, NY, USA, 2016.
Export
BibTeX
@inproceedings{UrbaniIJCAI2016, TITLE = {{KOGNAC}: {E}fficient Encoding of Large Knowledge Graphs}, AUTHOR = {Urbani, Jacopo and Dutta, Sourav and Gurajada, Sairam and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-57735-771-1}, URL = {http://www.ijcai.org/Proceedings/16/Papers/548.pdf}, PUBLISHER = {AAAI}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016)}, EDITOR = {Kambhampati, Subbarao}, PAGES = {3896--3902}, ADDRESS = {New York, NY, USA}, }
Endnote
%0 Conference Proceedings %A Urbani, Jacopo %A Dutta, Sourav %A Gurajada, Sairam %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T KOGNAC: Efficient Encoding of Large Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A641-2 %U http://www.ijcai.org/Proceedings/16/Papers/548.pdf %D 2016 %B 25th International Joint Conference on Artificial Intelligence %Z date of event: 2016-07-09 - 2016-07-15 %C New York, NY, USA %B Twenty-Fifth International Joint Conference on Artificial Intelligence %E Kambhampati, Subbarao %P 3896 - 3902 %I AAAI %@ 978-1-57735-771-1
[211]
J. Urbani, S. Dutta, S. Gurajada, and G. Weikum, “KOGNAC: Efficient Encoding of Large Knowledge Graphs,” 2016. [Online]. Available: http://arxiv.org/abs/1604.04795. (arXiv: 1604.04795)
Abstract
Many Web applications require efficient querying of large Knowledge Graphs (KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve SPARQL querying with a judicious combination of statistical and semantic techniques. In KOGNAC, frequent terms are detected with a frequency approximation algorithm and encoded to maximise compression. Infrequent terms are semantically grouped into ontological classes and encoded to increase data locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.
Export
BibTeX
@online{Urbani2016, TITLE = {{KOGNAC}: Efficient Encoding of Large Knowledge Graphs}, AUTHOR = {Urbani, Jacopo and Dutta, Sourav and Gurajada, Sairam and Weikum, Gerhard}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1604.04795}, EPRINT = {1604.04795}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Many Web applications require efficient querying of large Knowledge Graphs (KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve SPARQL querying with a judicious combination of statistical and semantic techniques. In KOGNAC, frequent terms are detected with a frequency approximation algorithm and encoded to maximise compression. Infrequent terms are semantically grouped into ontological classes and encoded to increase data locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges.}, }
Endnote
%0 Report %A Urbani, Jacopo %A Dutta, Sourav %A Gurajada, Sairam %A Weikum, Gerhard %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T KOGNAC: Efficient Encoding of Large Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-01C1-3 %U http://arxiv.org/abs/1604.04795 %D 2016 %X Many Web applications require efficient querying of large Knowledge Graphs (KGs). We propose KOGNAC, a dictionary-encoding algorithm designed to improve SPARQL querying with a judicious combination of statistical and semantic techniques. In KOGNAC, frequent terms are detected with a frequency approximation algorithm and encoded to maximise compression. Infrequent terms are semantically grouped into ontological classes and encoded to increase data locality. We evaluated KOGNAC in combination with state-of-the-art RDF engines, and observed that it significantly improves SPARQL querying on KGs with up to 1B edges. %K Computer Science, Artificial Intelligence, cs.AI
[212]
Y. Wang, Z. Ren, M. Theobald, M. Dylla, and G. de Melo, “Summary Generation for Temporal Extractions,” in Database and Expert Systems Applications (DEXA 2016), Porto, Portugal, 2016.
Export
BibTeX
@inproceedings{Wang_DEXA2016, TITLE = {Summary Generation for Temporal Extractions}, AUTHOR = {Wang, Yafang and Ren, Zhaochun and Theobald, Martin and Dylla, Maximilian and de Melo, Gerard}, LANGUAGE = {eng}, ISBN = {978-3-319-44402-4}, DOI = {10.1007/978-3-319-44403-1_23}, PUBLISHER = {Springer}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {Database and Expert Systems Applications (DEXA 2016)}, EDITOR = {Hartmann, Sven and Ma, Hui}, PAGES = {370--386}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9827}, ADDRESS = {Porto, Portugal}, }
Endnote
%0 Conference Proceedings %A Wang, Yafang %A Ren, Zhaochun %A Theobald, Martin %A Dylla, Maximilian %A de Melo, Gerard %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Summary Generation for Temporal Extractions : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-2DE2-F %R 10.1007/978-3-319-44403-1_23 %D 2016 %B 27th International Conference on Database and Expert Systems Application %Z date of event: 2016-09-05 - 2016-09-08 %C Porto, Portugal %B Database and Expert Systems Applications %E Hartmann, Sven; Ma, Hui %P 370 - 386 %I Springer %@ 978-3-319-44402-4 %B Lecture Notes in Computer Science %N 9827
[213]
G. Weikum, J. Hoffart, and F. Suchanek, “Ten Years of Knowledge Harvesting: Lessons and Challenges,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 39, no. 3, 2016.
Export
BibTeX
@article{Weikum_Hoffart_Suchanek2016, TITLE = {Ten Years of Knowledge Harvesting: {L}essons and Challenges}, AUTHOR = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian}, LANGUAGE = {eng}, URL = {http://sites.computer.org/debull/A16sept/p41.pdf}, PUBLISHER = {IEEE Computer Society}, ADDRESS = {Los Alamitos, CA}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Bulletin of the IEEE Computer Society Technical Committee on Data Engineering}, VOLUME = {39}, NUMBER = {3}, PAGES = {41--50}, }
Endnote
%0 Journal Article %A Weikum, Gerhard %A Hoffart, Johannes %A Suchanek, Fabian %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Télécom ParisTech %T Ten Years of Knowledge Harvesting: Lessons and Challenges : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A618-F %U http://sites.computer.org/debull/A16sept/p41.pdf %7 2016 %D 2016 %J Bulletin of the IEEE Computer Society Technical Committee on Data Engineering %V 39 %N 3 %& 41 %P 41 - 50 %I IEEE Computer Society %C Los Alamitos, CA
[214]
G. Weikum, “Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut für Informatik,” Datenbank-Spektrum, vol. 16, no. 1, 2016.
Export
BibTeX
@article{WeikumDBSpektrum2016, TITLE = {{Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut f{\"u}r Informatik}}, AUTHOR = {Weikum, Gerhard}, LANGUAGE = {deu}, DOI = {10.1007/s13222-016-0211-z}, PUBLISHER = {Springer}, ADDRESS = {Berlin}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Datenbank-Spektrum}, VOLUME = {16}, NUMBER = {1}, PAGES = {77--82}, }
Endnote
%0 Journal Article %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Die Abteilung Datenbanken und Informationssysteme am Max-Planck-Institut für Informatik : %G deu %U http://hdl.handle.net/11858/00-001M-0000-002B-0194-B %R 10.1007/s13222-016-0211-z %7 2016 %D 2016 %J Datenbank-Spektrum %V 16 %N 1 %& 77 %P 77 - 82 %I Springer %C Berlin
[215]
B. A. Wójciak, “Spaghetti: Finding Storylines in Large Collections of Documents,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{WojciakMSc2016, TITLE = {Spaghetti: Finding Storylines in Large Collections of Documents}, AUTHOR = {W{\'o}jciak, Beata Anna}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Wójciak, Beata Anna %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Spaghetti: Finding Storylines in Large Collections of Documents : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5F3F-3 %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
[216]
H. Wu, M. Sun, J. Vreeken, N. Tatti, C. North, and N. Ramakrishnan, “Interactive and Iterative Discovery of Entity Network Subgraphs,” 2016. [Online]. Available: http://arxiv.org/abs/1608.03889. (arXiv: 1608.03889)
Abstract
Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedback in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions.
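To give a self-contained feel for the kind of scoring such a model enables, here is a toy Python sketch (an editorial illustration, not the paper's model): a candidate subgraph is rated by how unlikely its edges are under a degree-constrained null model, using the Chung-Lu-style edge probabilities p(u,v) ≈ deg(u)·deg(v)/(2m) that arise from a maximum-entropy distribution with fixed expected degrees.

import itertools
import math
from collections import defaultdict

def surprise(sub_nodes, edges):
    """Negative log-probability of the induced edges among sub_nodes
    under a null model with independent edges and
    p(u,v) ~ deg(u)*deg(v)/(2m); higher scores = more surprising."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    present = {frozenset(e) for e in edges}
    score = 0.0
    for u, v in itertools.combinations(sub_nodes, 2):
        if frozenset((u, v)) in present:
            p = min(1.0, deg[u] * deg[v] / (2.0 * m))
            score += -math.log(p)  # rare edges contribute more surprise
    return score

# Toy graph: a triangle among degree-2 nodes is more surprising than a
# triangle among hubs, even though both consist of three edges.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("h1", "h2"), ("h2", "h3"), ("h1", "h3"),
         ("h1", "x1"), ("h1", "x2"), ("h2", "y1"),
         ("h2", "y2"), ("h3", "z1"), ("h3", "z2")]
print(surprise(["a", "b", "c"], edges))     # ~5.4: surprising
print(surprise(["h1", "h2", "h3"], edges))  # ~1.2: expected among hubs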
Export
BibTeX
@online{Wu1608.03889, TITLE = {Interactive and Iterative Discovery of Entity Network Subgraphs}, AUTHOR = {Wu, Hao and Sun, Maoyuan and Vreeken, Jilles and Tatti, Nikolaj and North, Chris and Ramakrishnan, Naren}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1608.03889}, EPRINT = {1608.03889}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedbacks in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions.}, }
Endnote
%0 Report %A Wu, Hao %A Sun, Maoyuan %A Vreeken, Jilles %A Tatti, Nikolaj %A North, Chris %A Ramakrishnan, Naren %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Interactive and Iterative Discovery of Entity Network Subgraphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A939-F %U http://arxiv.org/abs/1608.03889 %D 2016 %X Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mine graphs are not interactive and cannot incorporate user feedbacks in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions. %K cs.SI,Computer Science, Databases, cs.DB
[217]
H. Wu, Y. Ning, P. Chakraborty, J. Vreeken, N. Tatti, and N. Ramakrishnan, “Generating Realistic Synthetic Population Datasets,” 2016. [Online]. Available: http://arxiv.org/abs/1602.06844. (arXiv: 1602.06844)
Abstract
Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and on US census datasets, and demonstrate its feasibility using an epidemic simulation application.
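As a pocket-sized illustration of the maximum-entropy principle in this setting (an editorial sketch, not the paper's inference algorithm), the Python snippet below uses iterative proportional fitting, which from a uniform seed converges to the maximum-entropy joint table over two categorical variables consistent with given marginal counts; individuals of a synthetic population can then be sampled from the fitted table. The variable names and toy marginals are assumptions for illustration.

import numpy as np

def ipf(row_marg, col_marg, iters=200):
    """Iterative proportional fitting from a uniform seed: converges to
    the maximum-entropy joint table whose row and column sums match the
    given marginals (marginal totals must agree)."""
    t = np.ones((len(row_marg), len(col_marg)))
    for _ in range(iters):
        t *= (np.asarray(row_marg) / t.sum(axis=1))[:, None]
        t *= (np.asarray(col_marg) / t.sum(axis=0))[None, :]
    return t

# Toy prior information: counts per age group and per household size.
age = [300, 500, 200]        # young / middle / senior
household = [400, 350, 250]  # 1 / 2 / 3+ persons
table = ipf(age, household)
print(table.round(1))        # rows sum to `age`, columns to `household`

# Draw a synthetic population of 1000 individuals from the joint table.
rng = np.random.default_rng(0)
p = (table / table.sum()).ravel()
cells = rng.choice(p.size, size=1000, p=p)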
Export
BibTeX
@online{Wu_arXiv2016, TITLE = {Generating Realistic Synthetic Population Datasets}, AUTHOR = {Wu, Hao and Ning, Yue and Chakraborty, Prithwish and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1602.06844}, EPRINT = {1602.06844}, EPRINTTYPE = {arXiv}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, ABSTRACT = {Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and on US census datasets, and demonstrate its feasibility using an epidemic simulation application.}, }
Endnote
%0 Report %A Wu, Hao %A Ning, Yue %A Chakraborty, Prithwish %A Vreeken, Jilles %A Tatti, Nikolaj %A Ramakrishnan, Naren %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Generating Realistic Synthetic Population Datasets : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-08F9-B %U http://arxiv.org/abs/1602.06844 %D 2016 %X Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and on US census datasets, and demonstrate its feasibility using an epidemic simulation application. %K Computer Science, Databases, cs.DB
[218]
M. Yahya, D. Barbosa, K. Berberich, Q. Wang, and G. Weikum, “Relationship Queries on Extended Knowledge Graphs,” in WSDM’16, 9th ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 2016.
Export
BibTeX
@inproceedings{YahyaWSDM2016, TITLE = {Relationship Queries on Extended Knowledge Graphs}, AUTHOR = {Yahya, Mohamed and Barbosa, Denilson and Berberich, Klaus and Wang, Qiuyue and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-3716-8}, DOI = {10.1145/2835776.2835795}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WSDM'16, 9th ACM International Conference on Web Search and Data Mining}, PAGES = {605--614}, ADDRESS = {San Francisco, CA, USA}, }
Endnote
%0 Conference Proceedings %A Yahya, Mohamed %A Barbosa, Denilson %A Berberich, Klaus %A Wang, Qiuyue %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Relationship Queries on Extended Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-ABAA-0 %R 10.1145/2835776.2835795 %D 2016 %B 9th ACM International Conference on Web Search and Data Mining %Z date of event: 2016-02-22 - 2016-02-25 %C San Francisco, CA, USA %B WSDM'16 %P 605 - 614 %I ACM %@ 978-1-4503-3716-8
[219]
M. Yahya, “Question Answering and Query Processing for Extended Knowledge Graphs,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@phdthesis{yahyaphd2016, TITLE = {Question Answering and Query Processing for Extended Knowledge Graphs}, AUTHOR = {Yahya, Mohamed}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Yahya, Mohamed %Y Weikum, Gerhard %A referee: Schütze, Hinrich %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Question Answering and Query Processing for Extended Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-48C2-7 %I Universität des Saarlandes %C Saarbrücken %D 2016 %P x, 160 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2016/6476/
[220]
M. Yahya, K. Berberich, M. Ramanath, and G. Weikum, “Exploratory Querying of Extended Knowledge Graphs,” Proceedings of the VLDB Endowment (Proc. VLDB 2016), vol. 9, no. 13, 2016.
Export
BibTeX
@article{YahyaVLDB2016, TITLE = {Exploratory Querying of Extended Knowledge Graphs}, AUTHOR = {Yahya, Mohamed and Berberich, Klaus and Ramanath, Maya and Weikum, Gerhard}, LANGUAGE = {eng}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, JOURNAL = {Proceedings of the VLDB Endowment (Proc. VLDB)}, VOLUME = {9}, NUMBER = {13}, PAGES = {1521--1524}, BOOKTITLE = {Proceedings of the 42nd International Conference on Very Large Data Bases (VLDB 2016)}, EDITOR = {Chaudhuri, Surajit and Haritsa, Jayant}, }
Endnote
%0 Journal Article %A Yahya, Mohamed %A Berberich, Klaus %A Ramanath, Maya %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Exploratory Querying of Extended Knowledge Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A61C-7 %7 2016 %D 2016 %J Proceedings of the VLDB Endowment %O PVLDB %V 9 %N 13 %& 1521 %P 1521 - 1524 %I ACM %C New York, NY %B Proceedings of the 42nd International Conference on Very Large Data Bases %O VLDB 2016 New Delhi, India, September 5 - 9, 2016 %U http://www.vldb.org/pvldb/vol9/p1521-yahya.pdf
[221]
L. Zervakis, C. Tryfonopoulos, V. Setty, S. Seufert, and S. Skiadopoulos, “Towards Publish/Subscribe Functionality on Graphs,” in Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, Bordeaux, France, 2016.
Export
BibTeX
@inproceedings{DBLP:conf/edbt/ZervakisTSSS16, TITLE = {Towards Publish/Subscribe Functionality on Graphs}, AUTHOR = {Zervakis, Lefteris and Tryfonopoulos, Christos and Setty, Vinay and Seufert, Stephan and Skiadopoulos, Spiros}, LANGUAGE = {eng}, ISSN = {1613-0073}, URL = {urn:nbn:de:0074-1558-2}, PUBLISHER = {CEUR-WS.org}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference}, EDITOR = {Palpanas, Themis and Stefanidis, Kostas}, EID = {13}, SERIES = {CEUR Workshop Proceedings}, VOLUME = {1558}, ADDRESS = {Bordeaux, France}, }
Endnote
%0 Conference Proceedings %A Zervakis, Lefteris %A Tryfonopoulos, Christos %A Setty, Vinay %A Seufert, Stephan %A Skiadopoulos, Spiros %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Towards Publish/Subscribe Functionality on Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-1CAE-1 %D 2016 %B 2nd International Workshop on Preservation of Evolving Big Data %Z date of event: 2016-03-15 - 2016-03-15 %C Bordeaux, France %B Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference %E Palpanas, Themis; Stefanidis, Kostas %Z sequence number: 13 %I CEUR-WS.org %B CEUR Workshop Proceedings %N 1558 %@ false
[222]
H. Zhang and V. Setty, “Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News,” in WebSci’16, ACM Web Science Conference, Hannover, Germany, 2016.
Export
BibTeX
@inproceedings{ZhangWebSci2016, TITLE = {Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News}, AUTHOR = {Zhang, Hang and Setty, Vinay}, LANGUAGE = {eng}, ISBN = {978-1-4503-4208-7}, DOI = {10.1145/2908131.2908168}, PUBLISHER = {ACM}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, BOOKTITLE = {WebSci'16, ACM Web Science Conference}, PAGES = {286--290}, ADDRESS = {Hannover, Germany}, }
Endnote
%0 Conference Proceedings %A Zhang, Hang %A Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Finding Diverse Needles in a Haystack of Comments -- Social Media Exploration for News : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-020A-C %R 10.1145/2908131.2908168 %D 2016 %B ACM Web Science Conference %Z date of event: 2016-05-22 - 2016-05-25 %C Hannover, Germany %B WebSci'16 %P 286 - 290 %I ACM %@ 978-1-4503-4208-7
[223]
H. Zhang, “Diversified Social Media Retrieval for News Stories,” Universität des Saarlandes, Saarbrücken, 2016.
Export
BibTeX
@mastersthesis{ZhangMSc2016, TITLE = {Diversified Social Media Retrieval for News Stories}, AUTHOR = {Zhang, Hang}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2016}, MARGINALMARK = {$\bullet$}, DATE = {2016}, }
Endnote
%0 Thesis %A Zhang, Hang %Y Neumann, Günther %A referee: Weikum, Gerhard %A referee: Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Diversified Social Media Retrieval for News Stories : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-48D3-E %I Universität des Saarlandes %C Saarbrücken %D 2016 %V master %9 master
2015
[224]
S. Abiteboul, L. Dong, O. Etzioni, D. Srivastava, G. Weikum, J. Stoyanovich, and F. M. Suchanek, “The Elephant in the Room: Getting Value from Big Data,” in Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015), Melbourne, Australia, 2015.
Export
BibTeX
@inproceedings{AbiteboulWebDB2015, TITLE = {The Elephant in the Room: {G}etting Value from {Big Data}}, AUTHOR = {Abiteboul, Serge and Dong, Luna and Etzioni, Oren and Srivastava, Divesh and Weikum, Gerhard and Stoyanovich, Julia and Suchanek, Fabian M.}, LANGUAGE = {eng}, ISBN = {978-1-4503-3627-7}, DOI = {10.1145/2767109.2770014}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Proceedings of the 18th International Workshop on Web and Databases (WebDB 2015)}, EDITOR = {Stoyanovich, Julia and Suchanek, Fabian M.}, PAGES = {1--5}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Abiteboul, Serge %A Dong, Luna %A Etzioni, Oren %A Srivastava, Divesh %A Weikum, Gerhard %A Stoyanovich, Julia %A Suchanek, Fabian M. %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Télécom ParisTech %T The Elephant in the Room: Getting Value from Big Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0027-D3F2-F %R 10.1145/2767109.2770014 %D 2015 %B 18th International Workshop on the Web and Databases %Z date of event: 2015-05-31 - 2015-05-31 %C Melbourne, Australia %B Proceedings of the 18th International Workshop on Web and Databases %E Stoyanovich, Julia; Suchanek, Fabian M. %P 1 - 5 %I ACM %@ 978-1-4503-3627-7
[225]
A. Abujabal and K. Berberich, “Important Events in the Past, Present, and Future,” in WWW’15 Companion, Florence, Italy, 2015.
Export
BibTeX
@inproceedings{AbjuabalWWW2015, TITLE = {Important Events in the Past, Present, and Future}, AUTHOR = {Abujabal, Abdalghani and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-1-4503-3473-0}, DOI = {10.1145/2740908.2741692}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {WWW'15 Companion}, PAGES = {1315--1320}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Abujabal, Abdalghani %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Important Events in the Past, Present, and Future : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-E33A-8 %R 10.1145/2740908.2741692 %D 2015 %B 24th International Conference on World Wide Web %Z date of event: 2015-05-18 - 2015-05-22 %C Florence, Italy %B WWW'15 Companion %P 1315 - 1320 %I ACM %@ 978-1-4503-3473-0
[226]
A. Abujabal, “Mining Past, Present, and Future,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{AbujabalMaster2015, TITLE = {Mining Past, Present, and Future}, AUTHOR = {Abujabal, Abdalghani}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Abujabal, Abdalghani %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mining Past, Present, and Future : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0025-A974-2 %I Universität des Saarlandes %C Saarbrücken %D 2015 %P XII, 86 p. %V master %9 master
[227]
A. Anagnostopoulos, L. Becchetti, I. Bordino, S. Leonardi, I. Mele, and P. Sankowski, “Stochastic Query Covering for Fast Approximate Document Retrieval,” ACM Transactions on Information Systems, vol. 33, no. 3, 2015.
Export
BibTeX
@article{Anagnostopoulos:TOIS, TITLE = {Stochastic Query Covering for Fast Approximate Document Retrieval}, AUTHOR = {Anagnostopoulos, Aris and Becchetti, Luca and Bordino, Ilaria and Leonardi, Stefano and Mele, Ida and Sankowski, Piotr}, LANGUAGE = {eng}, ISSN = {1046-8188}, DOI = {10.1145/2699671}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, JOURNAL = {ACM Transactions on Information Systems}, VOLUME = {33}, NUMBER = {3}, PAGES = {1--35}, EID = {11}, }
Endnote
%0 Journal Article %A Anagnostopoulos, Aris %A Becchetti, Luca %A Bordino, Ilaria %A Leonardi, Stefano %A Mele, Ida %A Sankowski, Piotr %+ External Organizations External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Stochastic Query Covering for Fast Approximate Document Retrieval : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-B6C7-2 %R 10.1145/2699671 %7 2015 %D 2015 %J ACM Transactions on Information Systems %O TOIS %V 33 %N 3 %& 1 %P 1 - 35 %Z sequence number: 11 %I ACM %C New York, NY %@ false
[228]
A. Anagnostopoulos, L. Becchetti, A. Fazzone, I. Mele, and M. Riondato, “The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing,” in SIGMOD’15, ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 2015.
Export
BibTeX
@inproceedings{Anagnostopoulos:SIGMOD2015, TITLE = {The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing}, AUTHOR = {Anagnostopoulos, Aris and Becchetti, Luca and Fazzone, Adriano and Mele, Ida and Riondato, Matteo}, LANGUAGE = {eng}, ISBN = {978-1-4503-2758-9}, DOI = {10.1145/2723372.2723722}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {SIGMOD'15, ACM SIGMOD International Conference on Management of Data}, PAGES = {983--998}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Anagnostopoulos, Aris %A Becchetti, Luca %A Fazzone, Adriano %A Mele, Ida %A Riondato, Matteo %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-B6BE-7 %R 10.1145/2723372.2723722 %D 2015 %B ACM SIGMOD International Conference on Management of Data %Z date of event: 2015-05-31 - 2015-06-04 %C Melbourne, Australia %B SIGMOD'15 %P 983 - 998 %I ACM %@ 978-1-4503-2758-9
[229]
K. Balog, J. Dalton, A. Doucet, and Y. Ibrahim, Eds., ESAIR’15. ACM, 2015.
Export
BibTeX
@proceedings{Balog:2015:2810133, TITLE = {ESAIR'15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval}, EDITOR = {Balog, Krisztian and Dalton, Jeffrey and Doucet, Antoine and Ibrahim, Yusra}, LANGUAGE = {eng}, ISBN = {978-1-4503-3790-8}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %E Balog, Krisztian %E Dalton, Jeffrey %E Doucet, Antoine %E Ibrahim, Yusra %+ External Organizations External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T ESAIR'15 : Proceedings of the 2015 Workshop on Exploiting Semantic Annotations in Information Retrieval %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-4869-B %@ 978-1-4503-3790-8 %I ACM %D 2015 %B Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval %Z date of event: 2015-10-23 - 2015-10-23 %D 2015 %C Melbourne, Australia
[230]
H. R. Bazoobandi, S. de Rooij, J. Urbani, A. ten Teije, F. van Harmelen, and H. Bal, “A Compact In-Memory Dictionary for RDF Data,” in The Semantic Web (ESWC 2015), Portorož, Slovenia, 2015.
Export
BibTeX
@inproceedings{Urbanilncs15, TITLE = {A Compact In-Memory Dictionary for {RDF} Data}, AUTHOR = {Bazoobandi, Hamid R. and de Rooij, Steve and Urbani, Jacopo and ten Teije, Annette and van Harmelen, Frank and Bal, Henri}, LANGUAGE = {eng}, ISBN = {978-3-319-18817-1}, DOI = {10.1007/978-3-319-18818-8_13}, PUBLISHER = {Springer}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {The Semantic Web (ESWC 2015)}, EDITOR = {Gandon, Fabien and Sabou, Marta and Sack, Harald and d'Amato, Claudia and Cudr{\'e}-Mauroux, Philippe and Zimmermann, Antoine}, PAGES = {205--220}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9088}, ADDRESS = {Portoro{\v z}, Slovenia}, }
Endnote
%0 Conference Proceedings %A Bazoobandi, Hamid R. %A de Rooij, Steve %A Urbani, Jacopo %A ten Teije, Annette %A van Harmelen, Frank %A Bal, Henri %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T A Compact In-Memory Dictionary for RDF Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0028-F1A6-9 %R 10.1007/978-3-319-18818-8_13 %D 2015 %B 12th European Semantic Web Conference %Z date of event: 2015-05-31 - 2015-06-04 %C Portorož, Slovenia %B The Semantic Web %E Gandon, Fabien; Sabou, Marta; Sack, Harald; d'Amato, Claudia; Cudré-Mauroux, Philippe; Zimmermann, Antoine %P 205 - 220 %I Springer %@ 978-3-319-18817-1 %B Lecture Notes in Computer Science %N 9088
[231]
K. Beedkar, K. Berberich, R. Gemulla, and I. Miliaraki, “Closing the Gap: Sequence Mining at Scale,” ACM Transactions on Database Systems, vol. 40, no. 2, 2015.
Export
BibTeX
@article{DBLP:journals/tods/BeedkarBGM15, TITLE = {Closing the Gap: {S}equence Mining at Scale}, AUTHOR = {Beedkar, Kaustubh and Berberich, Klaus and Gemulla, Rainer and Miliaraki, Iris}, LANGUAGE = {eng}, ISSN = {0362-5915}, DOI = {10.1145/2757217}, PUBLISHER = {ACM}, ADDRESS = {New York, NY}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, JOURNAL = {ACM Transactions on Database Systems}, VOLUME = {40}, NUMBER = {2}, PAGES = {1--44}, EID = {8}, }
Endnote
%0 Journal Article %A Beedkar, Kaustubh %A Berberich, Klaus %A Gemulla, Rainer %A Miliaraki, Iris %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Closing the Gap: Sequence Mining at Scale : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-5712-1 %R 10.1145/2757217 %7 2015 %D 2015 %J ACM Transactions on Database Systems %V 40 %N 2 %& 1 %P 1 - 44 %Z sequence number: 8 %I ACM %C New York, NY %@ false
[232]
A. Biswas, “Retrieving Web User’s Personal Information,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{BiswasMSc2015, TITLE = {Retrieving Web User's Personal Information}, AUTHOR = {Biswas, Angeeka}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Biswas, Angeeka %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T Retrieving Web User's Personal Information : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-48D1-1 %I Universität des Saarlandes %C Saarbrücken %D 2015 %P VIII, 30 p. %V master %9 master
[233]
K. Budhathoki and J. Vreeken, “The Difference and the Norm - Characterising Similarities and Differences Between Databases,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Porto, Portugal, 2015.
Export
BibTeX
@inproceedings{BudhathokiECML2015, TITLE = {The Difference and the Norm -- Characterising Similarities and Differences Between Databases}, AUTHOR = {Budhathoki, Kailash and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-3-319-23524-0}, DOI = {10.1007/978-3-319-23525-7_13}, PUBLISHER = {Springer}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015)}, EDITOR = {Appice, Annalisa and Pereira Rodrigues, Pedro and Gama, Jo{\~a}o and Jorge, Al{\'i}pio and Soares, Carlos}, PAGES = {206--223}, SERIES = {Lecture Notes in Artificial Intelligence}, VOLUME = {9285}, ADDRESS = {Porto, Portugal}, }
Endnote
%0 Conference Proceedings %A Budhathoki, Kailash %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T The Difference and the Norm - Characterising Similarities and Differences Between Databases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-2271-F %R 10.1007/978-3-319-23525-7_13 %D 2015 %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases %Z date of event: 2015-09-07 - 2015-09-11 %C Porto, Portugal %B Machine Learning and Knowledge Discovery in Databases %E Appice, Annalisa; Pereira Rodrigues, Pedro; Gama, João; Jorge, Alípio; Soares, Carlos %P 206 - 223 %I Springer %@ 978-3-319-23524-0 %B Lecture Notes in Artificial Intelligence %N 9285
[234]
K. Budhathoki, “Correlation by Compression,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{BudhathokiMaster2015, TITLE = {Correlation by Compression}, AUTHOR = {Budhathoki, Kailash}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Budhathoki, Kailash %Y Vreeken, Jilles %A referee: Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Correlation by Compression : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-0753-D %I Universität des Saarlandes %C Saarbrücken %D 2015 %P X, 56 p. %V master %9 master
[235]
K. Chakrabarti, “K-Shortest Paths with Overlap Constraints,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{ChakrabartiMSc2015, TITLE = {K-Shortest Paths with Overlap Constraints}, AUTHOR = {Chakrabarti, Kaustuv}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Chakrabarti, Kaustuv %Y Weikum, Gerhard %A referee: Setty, Vinay %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T K-Shortest Paths with Overlap Constraints : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-43A5-D %I Universität des Saarlandes %C Saarbrücken %D 2015 %P 54 p. %V master %9 master
[236]
P. Chau, J. Vreeken, M. van Leeuwen, and C. Faloutsos, Eds., Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics. 2015.
Export
BibTeX
@proceedings{chau:15:idea, TITLE = {Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics (IDEA 2015)}, EDITOR = {Chau, Polo and Vreeken, Jilles and van Leeuwen, Matthijs and Faloutsos, Christos}, LANGUAGE = {eng}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, PAGES = {72 p.}, ADDRESS = {Sydney, Australia}, }
Endnote
%0 Conference Proceedings %E Chau, Polo %E Vreeken, Jilles %E van Leeuwen, Matthijs %E Faloutsos, Christos %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Proceedings of the ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-578A-0 %D 2015 %B ACM SIGKDD 2015 Full-day Workshop on Interactive Data Exploration and Analytics %Z date of event: 2015-08-10 - 2015-08-10 %D 2015 %C Sydney, Australia %P 72 p. %U http://poloclub.gatech.edu/idea2015/papers/idea15-proceedings.pdf
[237]
D. Dedik, “Robust Type Classification of Out of Knowledge Base Entities,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{DedikMaster2015, TITLE = {Robust Type Classification of Out of Knowledge Base Entities}, AUTHOR = {Dedik, Darya}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Dedik, Darya %Y Weikum, Gerhard %A referee: Spaniol, Marc %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Robust Type Classification of Out of Knowledge Base Entities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0026-C0EC-F %I Universität des Saarlandes %C Saarbrücken %D 2015 %P 65 p. %V master %9 master
[238]
L. Del Corro, A. Abujabal, R. Gemulla, and G. Weikum, “FINET: Context-Aware Fine-Grained Named Entity Typing,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.
Export
BibTeX
@inproceedings{delcorro-EtAl:2015:EMNLP, TITLE = {{FINET}: {C}ontext-Aware Fine-Grained Named Entity Typing}, AUTHOR = {Del Corro, Luciano and Abujabal, Abdalghani and Gemulla, Rainer and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-941643-32-7}, URL = {https://aclweb.org/anthology/D/D15/D15-1103}, PUBLISHER = {ACL}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)}, PAGES = {868--878}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Del Corro, Luciano %A Abujabal, Abdalghani %A Gemulla, Rainer %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T FINET: Context-Aware Fine-Grained Named Entity Typing : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-49C3-C %U https://aclweb.org/anthology/D/D15/D15-1103 %D 2015 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2015-09-17 - 2015-09-21 %C Lisbon, Portugal %B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing %P 868 - 878 %I ACL %@ 978-1-941643-32-7 %U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP103.pdf
[239]
S. Dutta, S. Bhattacherjee, and A. Narang, “Mining Wireless Intelligence using Unsupervised Edge and Core Analytics,” in 2nd Workshop on Smarter Planet and Big Data Analytics, Goa, India. (Accepted/in press)
Export
BibTeX
@inproceedings{SouSPBDA2015, TITLE = {Mining Wireless Intelligence using Unsupervised Edge and Core Analytics}, AUTHOR = {Dutta, Sourav and Bhattacherjee, Souvik and Narang, Ankur}, LANGUAGE = {eng}, YEAR = {2015}, PUBLREMARK = {Accepted}, BOOKTITLE = {2nd Workshop on Smarter Planet and Big Data Analytics}, ADDRESS = {Goa, India}, }
Endnote
%0 Conference Proceedings %A Dutta, Sourav %A Bhattacherjee, Souvik %A Narang, Ankur %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Mining Wireless Intelligence using Unsupervised Edge and Core Analytics : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-54B5-0 %D 2015 %B 2nd Workshop on Smarter Planet and Big Data Analytics %Z date of event: 2015-01-04 - 2015-01-07 %C Goa, India %B 2nd Workshop on Smarter Planet and Big Data Analytics
[240]
S. Dutta, “MIST: Top-k Approximate Sub-String Mining using Triplet Statistical Significance,” in Advances in Information Retrieval (ECIR 2015), Vienna, Austria, 2015.
Export
BibTeX
@inproceedings{SouECIR2015, TITLE = {{MIST}: Top-k Approximate Sub-String Mining using Triplet Statistical Significance}, AUTHOR = {Dutta, Sourav}, LANGUAGE = {eng}, ISBN = {978-3-319-16353-6}, DOI = {10.1007/978-3-319-16354-3_31}, PUBLISHER = {Springer}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Advances in Information Retrieval (ECIR 2015)}, EDITOR = {Hanbury, Allan and Kazai, Gabriella and Rauber, Andreas and Fuhr, Norbert}, PAGES = {284--290}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9022}, ADDRESS = {Vienna, Austria}, }
Endnote
%0 Conference Proceedings %A Dutta, Sourav %+ Databases and Information Systems, MPI for Informatics, Max Planck Society %T MIST: Top-k Approximate Sub-String Mining using Triplet Statistical Significance : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-54B2-5 %R 10.1007/978-3-319-16354-3_31 %D 2015 %B 37th European Conference on Information Retrieval %Z date of event: 2015-03-29 - 2015-04-02 %C Vienna, Austria %B Advances in Information Retrieval %E Hanbury, Allan; Kazai, Gabriella; Rauber, Andreas; Fuhr, Norbert %P 284 - 290 %I Springer %@ 978-3-319-16353-6 %B Lecture Notes in Computer Science %N 9022
[241]
S. Dutta and G. Weikum, “Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment,” Transactions of the Association for Computational Linguistics, vol. 3, 2015.
Export
BibTeX
@article{SouTACL2015, TITLE = {Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment}, AUTHOR = {Dutta, Sourav and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {2307-387X}, PUBLISHER = {ACL}, ADDRESS = {Stroudsburg, PA}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, JOURNAL = {Transactions of the Association for Computational Linguistics}, VOLUME = {3}, PAGES = {15--28}, }
Endnote
%0 Journal Article %A Dutta, Sourav %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Cross-document Co-reference Resolution using Sample-based Clustering with Knowledge Enrichment : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-54B7-C %7 2015 %D 2015 %J Transactions of the Association for Computational Linguistics %O TACL %V 3 %& 15 %P 15 - 28 %I ACL %C Stroudsburg, PA %@ false
[242]
S. Dutta, A. Narang, and S. Bhattacherjee, “Predictive Caching Framework for Mobile Wireless Networks,” in MDM 2015, 16th International Conference on Mobile Data Management, Pittsburgh, PA, USA, 2015.
Export
BibTeX
@inproceedings{DuttaMDM2015, TITLE = {Predictive Caching Framework for Mobile Wireless Networks}, AUTHOR = {Dutta, Sourav and Narang, Ankur and Bhattacherjee, Souvik}, LANGUAGE = {eng}, ISBN = {978-1-4799-9972-9}, DOI = {10.1109/MDM.2015.14}, PUBLISHER = {IEEE}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {MDM 2015, 16th International Conference on Mobile Data Management}, PAGES = {179--184}, ADDRESS = {Pittsburgh, PA, USA}, }
Endnote
%0 Conference Proceedings %A Dutta, Sourav %A Narang, Ankur %A Bhattacherjee, Souvik %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Predictive Caching Framework for Mobile Wireless Networks : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002B-A5AF-3 %R 10.1109/MDM.2015.14 %D 2015 %B 16th International Conference on Mobile Data Management %Z date of event: 2015-06-15 - 2015-06-18 %C Pittsburgh, PA, USA %B MDM 2015 %P 179 - 184 %I IEEE %@ 978-1-4799-9972-9
[243]
S. Dutta and G. Weikum, “C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.
Export
BibTeX
@inproceedings{dutta-weikum:2015:EMNLP, TITLE = {{C3EL}: {A} Joint Model for Cross-Document Co-Reference Resolution and Entity Linking}, AUTHOR = {Dutta, Sourav and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-941643-32-7}, URL = {https://aclweb.org/anthology/D/D15/D15-1101}, PUBLISHER = {ACL}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)}, PAGES = {846--856}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Dutta, Sourav %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-49C1-0 %U https://aclweb.org/anthology/D/D15/D15-1101 %D 2015 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2015-09-17 - 2015-09-21 %C Lisbon, Portugal %B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing %P 846 - 856 %I ACL %@ 978-1-941643-32-7 %U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP101.pdf
[244]
P. Ernst, A. Siu, and G. Weikum, “KnowLife: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences,” BMC Bioinformatics, vol. 16, no. 1, 2015.
Export
BibTeX
@article{ErnstSiuWeikum2015, TITLE = {{KnowLife}: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences}, AUTHOR = {Ernst, Patrick and Siu, Amy and Weikum, Gerhard}, LANGUAGE = {eng}, ISSN = {1471-2105}, URL = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4448285&tool=pmcentrez&rendertype=abstract}, DOI = {10.1186/s12859-015-0549-5}, PUBLISHER = {BioMed Central}, ADDRESS = {London}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, JOURNAL = {BMC Bioinformatics}, VOLUME = {16}, NUMBER = {1}, EID = {157}, }
Endnote
%0 Journal Article %A Ernst, Patrick %A Siu, Amy %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T KnowLife: A Versatile Approach for Constructing a Large Knowledge Graph for Biomedical Sciences : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0027-7AB7-0 %F OTHER: pmcidPMC4448285 %F OTHER: pmc-uid4448285 %F OTHER: publisher-id549 %R 10.1186/s12859-015-0549-5 %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4448285&tool=pmcentrez&rendertype=abstract %7 2015-05-14 %D 2015 %8 14.05.2015 %K Relation extraction %J BMC Bioinformatics %V 16 %N 1 %Z sequence number: 157 %I BioMed Central %C London %@ false
[245]
M. Gad-Elrab, “AIDArabic+ Named Entity Disambiguation for Arabic Text,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{Gad-ElrabMaster2015, TITLE = {{AIDArabic}+ Named Entity Disambiguation for Arabic Text}, AUTHOR = {Gad-Elrab, Mohamed}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Gad-Elrab, Mohamed %Y Weikum, Gerhard %A referee: Berberich, Klaus %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T AIDArabic+ Named Entity Disambiguation for Arabic Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-0F70-5 %I Universität des Saarlandes %C Saarbrücken %D 2015 %P 56 p. %V master %9 master
[246]
M. H. Gad-Elrab, M. A. Yosef, and G. Weikum, “Named Entity Disambiguation for Resource-poor Languages,” in ESAIR’15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval, Melbourne, Australia, 2015.
Export
BibTeX
@inproceedings{Gad-ElrabESAIR2015, TITLE = {Named Entity Disambiguation for Resource-poor Languages}, AUTHOR = {Gad-Elrab, Mohamed H. and Yosef, Mohamed Amir and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-3790-8}, DOI = {10.1145/2810133.2810138}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {ESAIR'15, Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval}, EDITOR = {Alonso, Omar and Kamps, Jaap and Karlgren, Jussi}, PAGES = {29--34}, ADDRESS = {Melbourne, Australia}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed H. %A Yosef, Mohamed Amir %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Named Entity Disambiguation for Resource-poor Languages : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-077F-B %R 10.1145/2810133.2810138 %D 2015 %B Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval %Z date of event: 2015-10-23 - 2015-10-23 %C Melbourne, Australia %B ESAIR'15 %E Alonso, Omar; Kamps, Jaap; Karlgren, Jussi %P 29 - 34 %I ACM %@ 978-1-4503-3790-8
[247]
M. H. Gad-Elrab, M. A. Yosef, and G. Weikum, “EDRAK: Entity-Centric Data Resource for Arabic Knowledge,” in The Second Workshop on Arabic Natural Language Processing (ANLP 2015), Beijing, China, 2015.
Export
BibTeX
@inproceedings{Gad-ElrabAnLP2015, TITLE = {{EDRAK}: {E}ntity-Centric Data Resource for {Arabic} Knowledge}, AUTHOR = {Gad-Elrab, Mohamed H. and Yosef, Mohamed Amir and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-941643-58-7}, PUBLISHER = {ACL}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, BOOKTITLE = {The Second Workshop on Arabic Natural Language Processing (ANLP 2015)}, PAGES = {191--200}, ADDRESS = {Beijing, China}, }
Endnote
%0 Conference Proceedings %A Gad-Elrab, Mohamed H. %A Yosef, Mohamed Amir %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T EDRAK: Entity-Centric Data Resource for Arabic Knowledge : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002A-0773-3 %D 2015 %B The Second Workshop on Arabic Natural Language Processing %Z date of event: 2015-07-26 - 2015-07-31 %C Beijing, China %B The Second Workshop on Arabic Natural Language Processing %P 191 - 200 %I ACL %@ 978-1-941643-58-7
[248]
L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek, “Fast Rule Mining in Ontological Knowledge Bases with AMIE+,” The VLDB Journal, vol. 24, no. 6, 2015.
Export
BibTeX
@article{Galarrag2015, TITLE = {Fast Rule Mining in Ontological Knowledge Bases with {AMIE}+}, AUTHOR = {Gal{\'a}rraga, Luis and Teflioudi, Christina and Hose, Katja and Suchanek, Fabian M.}, LANGUAGE = {eng}, ISSN = {1066-8888}, DOI = {10.1007/s00778-015-0394-1}, PUBLISHER = {Springer}, ADDRESS = {Berlin}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, JOURNAL = {The VLDB Journal}, VOLUME = {24}, NUMBER = {6}, PAGES = {707--730}, }
Endnote
%0 Journal Article %A Galárraga, Luis %A Teflioudi, Christina %A Hose, Katja %A Suchanek, Fabian M. %+ External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations Télécom ParisTech %T Fast Rule Mining in Ontological Knowledge Bases with AMIE+ : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-3510-3 %R 10.1007/s00778-015-0394-1 %7 2015 %D 2015 %J The VLDB Journal %V 24 %N 6 %& 707 %P 707 - 730 %I Springer %C Berlin %@ false
[249]
J. Geiß, A. Spitz, J. Strötgen, and M. Gertz, “The Wikipedia Location Network - Overcoming Borders and Oceans,” in Proceedings of the 9th Workshop on Geographic Information Retrieval (GIR 2015), Paris, France, 2015.
Export
BibTeX
@inproceedings{GIR2015, TITLE = {The {Wikipedia} Location Network -- Overcoming Borders and Oceans}, AUTHOR = {Gei{\ss}, Johanna and Spitz, Andreas and Str{\"o}tgen, Jannik and Gertz, Michael}, LANGUAGE = {eng}, ISBN = {978-1-4503-3937-7}, DOI = {10.1145/2837689.2837694}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Proceedings of the 9th Workshop on Geographic Information Retrieval (GIR 2015)}, EDITOR = {Purves, Ross S. and Jones, Christopher B.}, PAGES = {1--3}, EID = {2}, ADDRESS = {Paris, France}, }
Endnote
%0 Conference Proceedings %A Geiß, Johanna %A Spitz, Andreas %A Strötgen, Jannik %A Gertz, Michael %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T The Wikipedia Location Network - Overcoming Borders and Oceans : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-216D-0 %R 10.1145/2837689.2837694 %D 2015 %B 9th Workshop on Geographic Information Retrieval %Z date of event: 2015-11-26 - 2015-11-27 %C Paris, France %B Proceedings of the 9th Workshop on Geographic Information Retrieval %E Purves, Ross S.; Jones, Christopher B. %P 1 - 3 %Z sequence number: 2 %I ACM %@ 978-1-4503-3937-7
[250]
A. Grycner, G. Weikum, J. Pujara, J. Foulds, and L. Getoor, “RELLY: Inferring Hypernym Relationships Between Relational Phrases,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 2015.
Export
BibTeX
@inproceedings{grycner-EtAl:2015:EMNLP, TITLE = {{RELLY}: {I}nferring Hypernym Relationships Between Relational Phrases}, AUTHOR = {Grycner, Adam and Weikum, Gerhard and Pujara, Jay and Foulds, James and Getoor, Lise}, LANGUAGE = {eng}, ISBN = {978-1-941643-32-7}, URL = {http://aclweb.org/anthology/D15-1113}, PUBLISHER = {ACL}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)}, PAGES = {971--981}, ADDRESS = {Lisbon, Portugal}, }
Endnote
%0 Conference Proceedings %A Grycner, Adam %A Weikum, Gerhard %A Pujara, Jay %A Foulds, James %A Getoor, Lise %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T RELLY: Inferring Hypernym Relationships Between Relational Phrases : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-49B0-5 %U http://aclweb.org/anthology/D15-1113 %D 2015 %B Conference on Empirical Methods in Natural Language Processing %Z date of event: 2015-09-17 - 2015-09-21 %C Lisbon, Portugal %B Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing %P 971 - 981 %I ACL %@ 978-1-941643-32-7 %U https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/EMNLP/pdf/EMNLP113.pdf
[251]
D. Gupta and K. Berberich, “Temporal Query Classification at Different Granularities,” in String Processing and Information Retrieval (SPIRE 2015), London, UK, 2015.
Export
BibTeX
@inproceedings{spire15-gupta, TITLE = {Temporal Query Classification at Different Granularities}, AUTHOR = {Gupta, Dhruv and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-23825-8}, DOI = {10.1007/978-3-319-23826-5_16}, PUBLISHER = {Springer}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {String Processing and Information Retrieval (SPIRE 2015)}, EDITOR = {Iliopoulos, Costas S. and Puglisi, Simon J. and Yilmaz, Emine}, PAGES = {137--148}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9309}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Gupta, Dhruv %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Temporal Query Classification at Different Granularities : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-4249-D %R 10.1007/978-3-319-23826-5_16 %D 2015 %B 22nd International Symposium on String Processing and Information Retrieval %Z date of event: 2015-08-31 - 2015-09-02 %C London, UK %B String Processing and Information Retrieval %E Iliopoulos, Costas S.; Puglisi, Simon J.; Yilmaz, Emine %P 137 - 148 %I Springer %@ 978-3-319-23825-8 %B Lecture Notes in Computer Science %N 9309
[252]
C. D. Hariman, “Part-Whole Commonsense Knowledge Harvesting from the Web,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{HarimanMaster2015, TITLE = {Part-Whole Commonsense Knowledge Harvesting from the Web}, AUTHOR = {Hariman, Charles Darwis}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Hariman, Charles Darwis %Y Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Part-Whole Commonsense Knowledge Harvesting from the Web : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0026-C0E6-C %I Universität des Saarlandes %C Saarbrücken %D 2015 %P 53 p. %V master %9 master
[253]
J. Hoffart, “Discovering and Disambiguating Named Entities in Text,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@phdthesis{Hoffartthesis, TITLE = {Discovering and Disambiguating Named Entities in Text}, AUTHOR = {Hoffart, Johannes}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Hoffart, Johannes %Y Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Discovering and Disambiguating Named Entities in Text : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0025-6C44-0 %I Universität des Saarlandes %C Saarbrücken %D 2015 %P X, 103 p. %V phd %9 phd %U http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de %U http://scidok.sulb.uni-saarland.de/volltexte/2015/6022/
[254]
J. Hoffart, N. Preda, F. M. Suchanek, and G. Weikum, “Knowledge Bases for Web Content Analytics,” in WWW’15 Companion, Florence, Italy, 2015.
Export
BibTeX
@inproceedings{hoffart2015knowledgebases, TITLE = {Knowledge Bases for {Web} Content Analytics}, AUTHOR = {Hoffart, Johannes and Preda, Nicoleta and Suchanek, Fabian M. and Weikum, Gerhard}, LANGUAGE = {eng}, ISBN = {978-1-4503-3473-0}, DOI = {10.1145/2740908.2741984}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {WWW'15 Companion}, PAGES = {1535--1535}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Hoffart, Johannes %A Preda, Nicoleta %A Suchanek, Fabian M. %A Weikum, Gerhard %+ Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society %T Knowledge Bases for Web Content Analytics : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0028-8E68-7 %R 10.1145/2740908.2741984 %D 2015 %B 24th International Conference on World Wide Web %Z date of event: 2015-05-18 - 2015-05-22 %C Florence, Italy %B WWW'15 Companion %P 1535 - 1535 %I ACM %@ 978-1-4503-3473-0
[255]
K. Hui and K. Berberich, “Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation,” in String Processing and Information Retrieval (SPIRE 2015), London, UK, 2015.
Export
BibTeX
@inproceedings{spire15-kaihui, TITLE = {Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation}, AUTHOR = {Hui, Kai and Berberich, Klaus}, LANGUAGE = {eng}, ISBN = {978-3-319-23825-8}, DOI = {10.1007/978-3-319-23826-5_14}, PUBLISHER = {Springer}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {String Processing and Information Retrieval (SPIRE 2015)}, EDITOR = {Iliopoulos, Costas S. and Puglisi, Simon J. and Yilmaz, Emine}, PAGES = {137--148}, SERIES = {Lecture Notes in Computer Science}, VOLUME = {9309}, ADDRESS = {London, UK}, }
Endnote
%0 Conference Proceedings %A Hui, Kai %A Berberich, Klaus %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0028-5DAA-5 %R 10.1007/978-3-319-23826-5_14 %D 2015 %B 22nd International Symposium on String Processing and Information Retrieval %Z date of event: 2015-08-31 - 2015-09-02 %C London, UK %B String Processing and Information Retrieval %E Iliopoulos, Costas S.; Puglisi, Simon J.; Yilmaz, Emine %P 137 - 148 %I Springer %@ 978-3-319-23825-8 %B Lecture Notes in Computer Science %N 9309
[256]
S. Karaev, P. Miettinen, and J. Vreeken, “Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization,” in Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015), Vancouver, Canada, 2015.
Abstract
Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF) have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e., when relatively many 1s are removed from the data: tiling can only model additive noise, while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels on datasets with high levels of destructive noise, and its performance on real-world datasets confirms our hypothesis that real-world data contain large numbers of missing observations.
Export
BibTeX
@inproceedings{karaev15getting, TITLE = {Getting to Know the Unknown Unknowns: {D}estructive-noise Resistant {Boolean} Matrix Factorization}, AUTHOR = {Karaev, Sanjar and Miettinen, Pauli and Vreeken, Jilles}, LANGUAGE = {eng}, ISBN = {978-1-61197-401-0}, DOI = {10.1137/1.9781611974010.37}, PUBLISHER = {SIAM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, ABSTRACT = {Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.}, BOOKTITLE = {Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015)}, EDITOR = {Venkatasubramanian, Suresh and Ye, Jieping}, PAGES = {325--333}, ADDRESS = {Vancouver, Canada}, }
Endnote
%0 Conference Proceedings %A Karaev, Sanjar %A Miettinen, Pauli %A Vreeken, Jilles %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-6C59-C %R 10.1137/1.9781611974010.37 %D 2015 %B 15th SIAM International Conference on Data Mining %Z date of event: 2015-04-30 - 2015-05-02 %C Vancouver, Canada %X Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF), have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e. when relatively many 1s are removed from the data: tiling can only model additive noise while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data. %B Proceedings of the 2015 SIAM International Conference on Data Mining %E Venkatasubramanian, Suresh; Ye, Jieping %P 325 - 333 %I SIAM %@ 978-1-61197-401-0
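The Nassau abstract above names two ingredients that are easy to make tangible: the Boolean matrix product and a two-part MDL score. The Python sketch below is only an illustration under naive assumptions, not the paper's algorithm; in particular, its encoding (factor matrices transmitted verbatim, error cells identified by an index into the set of all cells) merely stands in for the refined encoding the paper optimizes, and all names are hypothetical.

import numpy as np
from math import comb, log2

def boolean_product(B, C):
    # Boolean matrix product: (B o C)[i, j] = OR over k of (B[i, k] AND C[k, j]).
    return ((B @ C) > 0).astype(int)

def description_length(X, B, C):
    # Naive two-part MDL score: bits for the factors plus bits to identify
    # the cells where the Boolean reconstruction disagrees with X.
    n, m = X.shape
    k = B.shape[1]
    errors = int(np.sum(X != boolean_product(B, C)))
    model_bits = n * k + k * m  # factor matrices transmitted bit by bit
    data_bits = log2(comb(n * m, errors)) if errors else 0.0
    return model_bits + data_bits

# Planted rank-2 Boolean data with purely destructive noise: 1s flipped to 0s.
rng = np.random.default_rng(0)
B = rng.integers(0, 2, size=(20, 2))
C = rng.integers(0, 2, size=(2, 30))
X = boolean_product(B, C)
ones = np.argwhere(X == 1)
drop = rng.choice(len(ones), size=len(ones) // 10, replace=False)
X[ones[drop, 0], ones[drop, 1]] = 0

# The planted factors should still compress X far better than the empty
# model, which has to transmit every remaining 1 as an error.
empty_bits = log2(comb(X.size, int(X.sum())))
print(description_length(X, B, C), "vs. empty model:", empty_bits)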
[257]
A. Kopali, “Mitigation of Privacy Risk for Search Queries,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{KopaliMSc2015, TITLE = {Mitigation of Privacy Risk for Search Queries}, AUTHOR = {Kopali, Agim}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Kopali, Agim %Y Weikum, Gerhard %A referee: Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Mitigation of Privacy Risk for Search Queries : %G eng %U http://hdl.handle.net/11858/00-001M-0000-002C-48CC-F %I Universität des Saarlandes %C Saarbrücken %D 2015 %P X, 48 p. %V master %9 master
[258]
D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “Summarizing and Understanding Large Graphs,” Statistical Analysis and Data Mining, vol. 8, no. 3, 2015.
Export
BibTeX
@article{koutra:15:vog, TITLE = {Summarizing and Understanding Large Graphs}, AUTHOR = {Koutra, Danai and Kang, U and Vreeken, Jilles and Faloutsos, Christos}, LANGUAGE = {eng}, ISSN = {1932-1872}, DOI = {10.1002/sam.11267}, PUBLISHER = {Wiley-Blackwell}, ADDRESS = {Chichester}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, JOURNAL = {Statistical Analysis and Data Mining}, VOLUME = {8}, NUMBER = {3}, PAGES = {183--202}, }
Endnote
%0 Journal Article %A Koutra, Danai %A Kang, U %A Vreeken, Jilles %A Faloutsos, Christos %+ External Organizations External Organizations Databases and Information Systems, MPI for Informatics, Max Planck Society External Organizations %T Summarizing and Understanding Large Graphs : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0026-D185-2 %R 10.1002/sam.11267 %7 2015-05-18 %D 2015 %J Statistical Analysis and Data Mining %O The ASA Data Science Journal %V 8 %N 3 %& 183 %P 183 - 202 %I Wiley-Blackwell %C Chichester %@ false
[259]
P. Mandros, “Information Theoretic Supervised Feature Selection for Continuous Data,” Universität des Saarlandes, Saarbrücken, 2015.
Export
BibTeX
@mastersthesis{MandrosMaster2015, TITLE = {Information Theoretic Supervised Feature Selection for Continuous Data}, AUTHOR = {Mandros, Panagiotis}, LANGUAGE = {eng}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, }
Endnote
%0 Thesis %A Mandros, Panagiotis %Y Weikum, Gerhard %A referee: Vreeken, Jilles %+ International Max Planck Research School, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Information Theoretic Supervised Feature Selection for Continuous Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-BAF3-F %I Universität des Saarlandes %C Saarbrücken %D 2015 %P 67 p. %V master %9 master
[260]
S. Metzger, R. Schenkel, and M. Sydow, “Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation,” in The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops (WI-IAT 2014), Warsaw, Poland, 2015.
Export
BibTeX
@inproceedings{MetzgerIAT2014, TITLE = {Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation}, AUTHOR = {Metzger, Steffen and Schenkel, Ralf and Sydow, Marcin}, LANGUAGE = {eng}, ISBN = {978-1-4799-4143-8}, DOI = {10.1109/WI-IAT.2014.17}, PUBLISHER = {IEEE}, YEAR = {2014}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology -- Workshops (WI-IAT 2014)}, EDITOR = {{\'S}l{\c e}zak, Dominik and Nguyen, Hung Son and Reformat, Marek and Santos, Eugene}, PAGES = {60--69}, ADDRESS = {Warsaw, Poland}, }
Endnote
%0 Conference Proceedings %A Metzger, Steffen %A Schenkel, Ralf %A Sydow, Marcin %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Aspect-based Similar Entity Search in Semantic Knowledge Graphs with Diversity-awareness and Relaxation : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0029-424D-5 %R 10.1109/WI-IAT.2014.17 %D 2015 %B IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology %Z date of event: 2014-08-11 - 2014-08-14 %C Warsaw, Poland %B The 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops %E Ślęzak, Dominik; Nguyen, Hung Son; Reformat, Marek; Santos, Eugene %P 60 - 69 %I IEEE %@ 978-1-4799-4143-8
[261]
S. Metzler and P. Miettinen, “On Defining SPARQL with Boolean Tensor Algebra,” 2015. [Online]. Available: http://arxiv.org/abs/1503.00301. (arXiv: 1503.00301)
Abstract
The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries - the standard queries for RDF - can be expressed as elementary operations in Boolean algebra, giving us a complete re-interpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler.
Export
BibTeX
@online{metzler15defining:arxiv, TITLE = {On Defining {SPARQL} with {B}oolean Tensor Algebra}, AUTHOR = {Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, URL = {http://arxiv.org/abs/1503.00301}, EPRINT = {1503.00301}, EPRINTTYPE = {arXiv}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, ABSTRACT = {The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries -- the standard queries for RDF -- can be expressed as elementary operations in Boolean algebra, giving us a complete re-interpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler.}, }
Endnote
%0 Report %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T On Defining SPARQL with Boolean Tensor Algebra : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0025-054A-9 %U http://arxiv.org/abs/1503.00301 %D 2015 %8 03.03.2015 %X The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries - the standard queries for RDF - can be expressed as elementary operations in Boolean algebra, giving us a complete re-interpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler. %K Computer Science, Databases, cs.DB
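To make the Boolean-tensor reading of RDF described above concrete, here is a minimal Python sketch on toy data (all names hypothetical, covering only the simplest case of the query translation the report develops): triples populate a subject x predicate x object Boolean tensor, a triple pattern with a fixed predicate selects one frontal slice, and chaining two patterns through a shared variable becomes a Boolean matrix product of slices.

import numpy as np

entities = ["Alice", "Bob", "Carol"]
predicates = ["knows", "worksWith"]
e = {x: i for i, x in enumerate(entities)}
p = {x: i for i, x in enumerate(predicates)}

# 3-way Boolean tensor T[subject, predicate, object] holding the triples.
T = np.zeros((len(entities), len(predicates), len(entities)), dtype=bool)
for s, pr, o in [("Alice", "knows", "Bob"),
                 ("Bob", "knows", "Carol"),
                 ("Alice", "worksWith", "Carol")]:
    T[e[s], p[pr], e[o]] = True

# The triple pattern (?x, knows, ?y) is the frontal slice for "knows".
knows = T[:, p["knows"], :]

# The join (?x knows ?z . ?z knows ?y) is the Boolean product of that
# slice with itself; a nonzero cell marks a two-hop "knows" path.
two_hop = (knows.astype(int) @ knows.astype(int)) > 0
print([(entities[i], entities[j]) for i, j in zip(*np.nonzero(two_hop))])
# prints [('Alice', 'Carol')]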
[262]
S. Metzler and P. Miettinen, “Join Size Estimation on Boolean Tensors of RDF Data,” in WWW’15 Companion, Florence, Italy, 2015.
Export
BibTeX
@inproceedings{metzler15join, TITLE = {Join Size Estimation on {Boolean} Tensors of {RDF} Data}, AUTHOR = {Metzler, Saskia and Miettinen, Pauli}, LANGUAGE = {eng}, ISBN = {978-1-4503-3473-0}, DOI = {10.1145/2740908.2742738}, PUBLISHER = {ACM}, YEAR = {2015}, MARGINALMARK = {$\bullet$}, DATE = {2015}, BOOKTITLE = {WWW'15 Companion}, PAGES = {77--78}, ADDRESS = {Florence, Italy}, }
Endnote
%0 Conference Proceedings %A Metzler, Saskia %A Miettinen, Pauli %+ Databases and Information Systems, MPI for Informatics, Max Planck Society Databases and Information Systems, MPI for Informatics, Max Planck Society %T Join Size Estimation on Boolean Tensors of RDF Data : %G eng %U http://hdl.handle.net/11858/00-001M-0000-0024-CCED-A %R 10.1145/2740908.2742738 %D 2015 %B 24th International Conference on World Wide Web %Z date of event: 2015-05-18 - 2015-05-22 %C Florence, Italy %B WWW'15 Companion %P 77 - 78 %I ACM %@ 978-1-4503-3473-0
[263]
S. Metzler and P. Miettinen, “Clustering Boolean Tensors,” 2015. [Online]. Available: http://arxiv.org/abs/1501.00696. (arXiv: 1501.00696)
Abstract
Tensor factorizations are computationally hard problems and, in particular, are often significantly harder than their matrix counterparts. In the case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences this partitioning has for the computational complexity of Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithm with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity.