
Master
[1]
M. Abouhamra, “AligNarr: Aligning Narratives of Different Length for Movie Summarization,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Automatic text alignment is an important problem in natural language processing. It can be used to create the data needed to train different language models. Most research on automatic summarization revolves around summarizing news articles or scientific papers, which are relatively small texts with a simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem becomes, since important information is sparser and identifying it is more difficult. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm which can automatically create a dataset for abstractive automatic summarization of larger narrative text bodies such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units and developed an algorithm which uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity of their meanings. Solving this alignment problem can provide us with important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. We developed a method which uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences, achieving an F1 score of 0.39. Analyzing our results showed that the bigger the difference in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and different alignment algorithms based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.
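To make the scoring idea concrete, here is a minimal sketch of how three similarity signals (embedding similarity, word inclusion, entity inclusion) could be combined into a scene-sentence score and turned into a greedy alignment. The weights, the term-frequency cosine used as a stand-in for embedding similarity, the capitalization-based entity proxy, and the greedy assignment are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch: align summary sentences to script scenes by combining
# three similarity signals. The weights, the entity proxy, and the greedy
# assignment are simplifying assumptions, not the thesis's method.
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[A-Za-z']+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def entities(text):
    # crude proxy for named entities: capitalized words followed by lowercase letters
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))

def score(sentence, scene, w=(0.5, 0.3, 0.2)):
    s_tok, c_tok = tokens(sentence), tokens(scene)
    emb = cosine(Counter(s_tok), Counter(c_tok))                      # stand-in for embedding similarity
    incl = sum(t in set(c_tok) for t in s_tok) / max(len(s_tok), 1)   # word inclusion
    s_ent, c_ent = entities(sentence), entities(scene)
    ent = len(s_ent & c_ent) / max(len(s_ent), 1)                     # entity inclusion
    return w[0] * emb + w[1] * incl + w[2] * ent

def align(sentences, scenes):
    # greedily assign each summary sentence to its best-scoring scene
    return {i: max(range(len(scenes)), key=lambda j: score(s, scenes[j]))
            for i, s in enumerate(sentences)}

if __name__ == "__main__":
    scenes = ["INT. LAB - NIGHT. Ripley seals the airlock and arms the shuttle.",
              "EXT. COLONY - DAY. The marines sweep the deserted corridors."]
    summary = ["Ripley escapes in the shuttle."]
    print(align(summary, scenes))   # {0: 0}
```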
PhD
[2]
A. Abujabal and K. Berberich, “Question Answering over Knowledge Bases with Continuous Learning,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Answering complex natural language questions with crisp answers is crucial to satisfying the information needs of advanced users. With the rapid growth of knowledge bases (KBs) such as Yago and Freebase, this goal has become attainable by translating questions into formal queries like SPARQL queries. Such queries can then be evaluated over knowledge bases to retrieve crisp answers. To this end, three research issues arise: (i) how to develop methods that are robust to lexical and syntactic variations in questions and can handle complex questions, (ii) how to design and curate datasets to advance research in question answering, and (iii) how to efficiently identify named entities in questions. In this dissertation, we make the following five contributions in the areas of question answering (QA) and named entity recognition (NER). For issue (i), we make the following contributions: We present QUINT, an approach for answering natural language questions over knowledge bases using automatically learned templates. Templates are an important asset for QA over KBs, simplifying the semantic parsing of input questions and generating formal queries for interpretable answers. QUINT is capable of answering both simple and compositional questions. We introduce NEQA, a framework for continuous learning for QA over KBs. NEQA starts with a small seed of training examples in the form of question-answer pairs, and improves its performance over time. NEQA combines syntax, through template-based answering, with semantics, via a semantic similarity function. Moreover, it adapts to the language used after deployment by periodically retraining its underlying models. For issues (i) and (ii), we present TEQUILA, a framework for answering complex questions with explicit and implicit temporal conditions over KBs. TEQUILA is built on a rule-based framework that detects and decomposes temporal questions into simpler sub-questions that can be answered by standard KB-QA systems. TEQUILA reconciles the results of sub-questions into final answers. TEQUILA is accompanied by a dataset called TempQuestions, which consists of 1,271 temporal questions with gold-standard answers over Freebase. This collection is derived by judiciously selecting time-related questions from existing QA datasets. For issue (ii), we publish ComQA, a large-scale manually curated dataset for QA. ComQA contains questions that represent real information needs and exhibit a wide range of difficulties such as the need for temporal reasoning, comparison, and compositionality. ComQA contains paraphrase clusters of semantically equivalent questions that can be exploited by QA systems. We harness a combination of community question-answering platforms and crowdsourcing to construct the ComQA dataset. For issue (iii), we introduce a neural network model based on subword units for named entity recognition. The model learns word representations using a combination of characters, bytes and phonemes. While achieving performance comparable to word-level models, our model has an order-of-magnitude smaller vocabulary size and lower memory requirements, and it handles out-of-vocabulary words.
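As a rough illustration of the template idea behind systems like QUINT, the sketch below maps a syntactic question pattern to a SPARQL skeleton and fills its entity and relation slots. The pattern, the example.org URIs, and the relation dictionary are invented for this example; they are not QUINT's learned templates or actual KB identifiers.

```python
# Hypothetical illustration of template-based KB-QA: a question pattern is paired
# with a SPARQL skeleton, and detected entities/relations fill its slots.
# Patterns and URIs are made up; they are not the thesis's templates or vocabulary.
import re

TEMPLATES = [
    (re.compile(r"who (?P<rel>directed|wrote) (?P<ent>.+)\?", re.I),
     "SELECT ?x WHERE {{ <{ent}> <{rel}> ?x . }}"),
]

RELATIONS = {"directed": "http://example.org/directedBy",
             "wrote": "http://example.org/writtenBy"}

def to_sparql(question):
    for pattern, skeleton in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            entity = "http://example.org/" + m.group("ent").strip().replace(" ", "_")
            relation = RELATIONS[m.group("rel").lower()]
            return skeleton.format(ent=entity, rel=relation)
    return None  # no template matched; a learned system would back off to other strategies

if __name__ == "__main__":
    print(to_sparql("Who directed Alien?"))
    # SELECT ?x WHERE { <http://example.org/Alien> <http://example.org/directedBy> ?x . }
```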
[3]
J. A. Biega, “Enhancing Privacy and Fairness in Search Systems,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Following a period of expedited progress in the capabilities of digital systems, society is beginning to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing-economy or hiring-support websites, search engines have immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions: (1) We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility. (2) We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly supervised learning-to-rank model that orders the queries by privacy-sensitivity. (3) We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated with a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions. (4) We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles. The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing-economy platforms.
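A toy numeric sketch of the exposure idea in contribution (1): position bias gives each rank a fixed attention weight, and amortized fairness asks that each subject's accumulated exposure be roughly proportional to its accumulated relevance. The 1/log2(rank+1) attention model, the deviation measure, and the alternating-rankings example are common simplifications; the thesis formulates this as a constrained optimization problem instead.

```python
# Toy sketch of amortized exposure vs. relevance. Position bias and the
# proportionality check are standard simplifications, not the thesis's model.
import math

def position_bias(rank):
    # examination probability of a result at 1-based rank (log-discount assumption)
    return 1.0 / math.log2(rank + 1)

def accumulate(rankings, n_subjects):
    """Sum exposure and relevance per subject over a sequence of rankings.
    Each ranking is a list of (subject_id, relevance) ordered by position."""
    exposure = [0.0] * n_subjects
    relevance = [0.0] * n_subjects
    for ranking in rankings:
        for pos, (subject, rel) in enumerate(ranking, start=1):
            exposure[subject] += position_bias(pos)
            relevance[subject] += rel
    return exposure, relevance

def unfairness(exposure, relevance):
    # deviation of exposure shares from relevance shares
    e_total, r_total = sum(exposure), sum(relevance)
    return sum(abs(e / e_total - r / r_total) for e, r in zip(exposure, relevance))

if __name__ == "__main__":
    # two subjects with nearly equal relevance; subject 0 always ranked first...
    static = [[(0, 1.0), (1, 0.99)]] * 10
    # ...versus alternating the top position, which amortizes exposure over time
    swapped = [[(0, 1.0), (1, 0.99)], [(1, 0.99), (0, 1.0)]] * 5
    for name, rankings in [("static", static), ("swapped", swapped)]:
        e, r = accumulate(rankings, 2)
        print(name, round(unfairness(e, r), 3))   # static is clearly less fair
```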
[4]
M. Döring, “Computational Approaches for Improving Treatment and Prevention of Viral Infections,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments. geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and using a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR to human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
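openPrimeR is described above as casting primer selection as a set cover problem. The sketch below is a generic greedy set-cover heuristic over a toy primer-to-template coverage table; it only illustrates the optimization problem. openPrimeR itself is an R package with its own coverage model and solver, and the primer names and template identifiers here are placeholder data.

```python
# Generic greedy set cover over toy data: pick primers until all templates are
# covered. Illustrative only; not openPrimeR's implementation.
def greedy_primer_cover(coverage, templates):
    """coverage: primer -> set of template ids it amplifies (toy data)."""
    uncovered = set(templates)
    chosen = []
    while uncovered:
        # primer that covers the most still-uncovered templates
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            raise ValueError(f"templates not coverable: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen

if __name__ == "__main__":
    coverage = {
        "primer_A": {"IGHV1-2", "IGHV1-3", "IGHV1-8"},
        "primer_B": {"IGHV1-8", "IGHV1-18"},
        "primer_C": {"IGHV1-18", "IGHV1-24"},
        "primer_D": {"IGHV1-24"},
    }
    templates = {"IGHV1-2", "IGHV1-3", "IGHV1-8", "IGHV1-18", "IGHV1-24"}
    print(greedy_primer_cover(coverage, templates))  # ['primer_A', 'primer_C']
```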
[5]
P. Ebert, “What We Leave Behind: Reproducibility in Chromatin Analysis within and across Species,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Epigenetics is the field of biology that investigates heritable factors regulating gene expression without being directly encoded in the genome of an organism. The human genome is densely packed inside a cell's nucleus in the form of chromatin. Certain constituents of chromatin play a vital role as epigenetic factors in the dynamic regulation of gene expression. Epigenetic changes on the chromatin level are thus an integral part of the mechanisms governing the development of the functionally diverse cell types in multicellular species such as human. Studying these mechanisms is not only important to understand the biology of healthy cells, but also necessary to comprehend the epigenetic component in the formation of many complex diseases. Modern wet-lab technology enables scientists to probe the epigenome with high throughput and in extensive detail. The fast generation of epigenetic datasets burdens computational researchers with the challenge of rapidly performing elaborate analyses without compromising the scientific reproducibility of the reported findings. To facilitate reproducible computational research in epigenomics, this thesis proposes a task-oriented metadata model, relying on web technology and supported by database engineering, that aims at consistent and human-readable documentation of standardized computational workflows. The suggested approach features, e.g., computational validation of metadata records, automatic error detection, and progress monitoring of multi-step analyses, and was successfully field-tested as part of a large epigenome research consortium. This work leaves aside theoretical considerations and intentionally emphasizes the realistic need to provide scientists with tools that assist them in performing reproducible research. Irrespective of technological progress, the dynamic and cell-type-specific nature of the epigenome commonly requires restricting the number of analyzed samples due to resource limitations. The second project of this thesis introduces the software tool SCIDDO, which has been developed for the differential chromatin analysis of cellular samples with potentially limited availability. By combining statistics, algorithmics, and best practices for robust software development, SCIDDO can quickly identify biologically meaningful regions of differential chromatin marking between cell types. We demonstrate SCIDDO's usefulness in an exemplary study in which we identify regions that establish a link between chromatin and gene expression changes. SCIDDO's quantitative approach to differential chromatin analysis is user-customizable, providing the necessary flexibility to adapt SCIDDO to specific research tasks. Given the functional diversity of cell types and the dynamics of the epigenome in response to environmental changes, it is hardly realistic to map the complete epigenome even for a single organism such as human or mouse. For non-model organisms, e.g., cow, pig, or dog, epigenome data is particularly scarce. The third project of this thesis investigates to what extent bioinformatics methods can compensate for the comparatively little effort that is invested in charting the epigenome of non-model species. This study implements a large integrative analysis pipeline, including state-of-the-art machine learning, to transfer chromatin data for predictive modeling between 13 species. The evidence presented here indicates that a partial regulatory epigenetic signal is stably retained even over millions of years of evolutionary distance between the considered species. This finding suggests complementary and cost-effective ways for bioinformatics to contribute to comparative epigenome analysis across species boundaries.
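The first project centres on machine-validated documentation of analysis workflows. The sketch below shows one plausible shape of such validation, required keys plus simple per-field checks, under assumed field names and rules; the thesis's actual metadata model is richer and backed by web tooling and a database, and none of the identifiers below come from it.

```python
# Minimal sketch of validating an analysis-step metadata record against a small
# schema. Field names and rules are assumptions made for illustration only.
import re

SCHEMA = {
    "analysis_id": re.compile(r"^[A-Za-z0-9_-]+$").match,              # machine-safe identifier
    "input_files": lambda v: isinstance(v, list) and len(v) > 0,       # at least one input
    "tool_version": re.compile(r"^\d+\.\d+(\.\d+)?$").match,           # semantic-ish version
    "genome_assembly": lambda v: v in {"GRCh37", "GRCh38", "GRCm38"},  # controlled vocabulary
}

def validate(record):
    errors = []
    for field, check in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not check(record[field]):
            errors.append(f"invalid value for {field}: {record[field]!r}")
    return errors

if __name__ == "__main__":
    record = {"analysis_id": "peak_calling_01",
              "input_files": ["sample1.bam"],
              "tool_version": "2.1",
              "genome_assembly": "hg19"}   # not in the allowed set -> reported
    for err in validate(record):
        print(err)
```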
[6]
T. Leimkühler, “Artificial Intelligence for Efficient Image-based View Synthesis,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Synthesizing novel views from image data is a widely investigated topic in both computer graphics and computer vision, with many applications such as stereo and multi-view rendering for virtual reality, light field reconstruction, and image post-processing. While image-based approaches have the advantage of reduced computational load compared to classical model-based rendering, efficiency is still a major concern. This thesis demonstrates how concepts and tools from artificial intelligence can be used to increase the efficiency of image-based view synthesis algorithms. In particular, it is shown how machine learning can help to generate point patterns useful for a variety of computer graphics tasks, how path planning can guide image warping, how sparsity-enforcing optimization can lead to significant speedups in interactive distribution effect rendering, and how probabilistic inference can be used to perform real-time 2D-to-3D conversion.
[7]
E. Levinkov, “Generalizations of the Multicut Problem for Computer Vision,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Graph decomposition has always been a very important concept in machine learning and computer vision. Many tasks, such as image and mesh segmentation, community detection in social networks, object tracking, and human pose estimation, can be formulated as graph decomposition problems. The multicut problem in particular is a popular model for optimizing over decompositions of a given graph. Its main advantage is that no prior knowledge about the number of components or their sizes is required. However, it has several limitations, which we address in this thesis. First, the multicut problem allows specifying a cost or reward only for putting two direct neighbours into distinct components, which limits the expressiveness of the cost function. We introduce special edges into the graph that allow defining a cost or reward for putting any two vertices into distinct components, while preserving the original set of feasible solutions. We show that this considerably improves the quality of image and mesh segmentations. Second, the multicut problem is notoriously NP-hard for general graphs, which limits its application to small superpixel graphs. We define and implement two primal feasible heuristics to solve the problem. They do not provide any guarantees on the runtime or the quality of solutions, but in practice show good convergence behaviour. We perform an extensive comparison on multiple graphs of different sizes and properties. Third, we extend the multicut framework by introducing node labels, so that we can jointly optimize for graph decomposition and node classification by means of exactly the same optimization algorithm, thus eliminating the need to hand-tune optimizers for a particular task. To prove its universality, we applied it to diverse computer vision tasks, including human pose estimation, multiple object tracking, and instance-aware semantic segmentation, and we show that we can improve the results over the prior art using exactly the same data as in the original works. Finally, we employ multicuts in two applications: (1) a client-server tool for interactive video segmentation, where, after pre-processing of the video, a user draws strokes on several frames and a time-coherent segmentation of the entire video is computed on the fly; and (2) a method for simultaneous segmentation and tracking of living cells in microscopy data, a task that is challenging because cells split, which our algorithm accounts for by creating parental hierarchies. We also present results on multiple model fitting: we find models in data heavily corrupted by noise by identifying the components that define these models using higher-order multicuts. We introduce an extension that allows our optimization to pick better hyperparameters for each discovered model. In summary, this thesis extends the multicut problem in different directions, proposes algorithms for its optimization, and applies it to novel data and settings.
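For reference, the standard integer-linear-programming view of the minimum-cost multicut problem that the thesis generalizes can be written as below; x_e = 1 means edge e is cut, and the cycle inequalities ensure that the set of cut edges induces a valid graph decomposition (no cycle has exactly one cut edge). This is the textbook formulation, not the thesis's extended variants with long-range edges or node labels.

```latex
% Minimum-cost multicut ILP over a graph G = (V, E) with edge costs c_e:
% x_e = 1 iff edge e is cut; cycle inequalities forbid cutting exactly one edge of any cycle.
\begin{align*}
  \min_{x \in \{0,1\}^{E}} \; & \sum_{e \in E} c_e \, x_e \\
  \text{subject to} \; & x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'}
      \qquad \text{for every cycle } C \subseteq E \text{ and every } e \in C
\end{align*}
```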
[8]
S. Nikumbh, “Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest for enhancing our understanding of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on the development and application of interpretable machine learning methods for the prediction and analysis of long-range genomic interactions obtained from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at genomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether they interact with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, intuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in 3D genome organization. However, the insights gained from these models are still coarse-grained. To this end, we devised a machine learning method called CoMIK (Conformal Multi-Instance Kernels) that is capable of providing more fine-grained insights. When comparing sequences of variable length in the supervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin in long-range interactions. Although CoMIK primarily uses only genetic sequence information, it can also simultaneously utilize other information modalities, such as functional genomics data, if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline for analyzing HiChIP experimental data to study the 3D architectural changes in Ewing sarcoma (EWS), a rare cancer affecting adolescents. In particular, HiChIP data for two experimental conditions, doxycycline-treated and untreated, and for primary tumor samples are analyzed. We demonstrate that pHDee facilitates the processing of large amounts of 3D genomics data and its easy integration with other data-intensive bioinformatics analyses.
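As a rough illustration of the string-kernel classifiers the first part of this thesis builds on, the sketch below trains a support vector machine with a precomputed k-mer spectrum kernel. The toy sequences, the labels, and the choice of k = 3 are made-up assumptions, and CoMIK's conformal multi-instance kernel is considerably more involved than this.

from collections import Counter
import numpy as np
from sklearn.svm import SVC

def spectrum_features(seq, k=3):
    """Counts of all length-k substrings (k-mers) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seqs_a, seqs_b, k=3):
    """Gram matrix of the spectrum kernel: dot products of k-mer count vectors."""
    feats_a = [spectrum_features(s, k) for s in seqs_a]
    feats_b = [spectrum_features(s, k) for s in seqs_b]
    gram = np.zeros((len(feats_a), len(feats_b)))
    for i, fa in enumerate(feats_a):
        for j, fb in enumerate(feats_b):
            gram[i, j] = sum(fa[kmer] * fb.get(kmer, 0) for kmer in fa)
    return gram

# Toy data: sequences around hypothetical interacting (1) / non-interacting (0) loci.
train_seqs = ["ACGTACGTAC", "ACGTTTGTAC", "GGGCCCGGGC", "GGCCCCGGGA"]
train_y = [1, 1, 0, 0]
test_seqs = ["ACGTACGTTT", "GGGCCCGGCC"]

clf = SVC(kernel="precomputed")
clf.fit(spectrum_kernel(train_seqs, train_seqs), train_y)
print(clf.predict(spectrum_kernel(test_seqs, train_seqs)))  # likely [1 0] on this toy data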
Export
BibTeX
@phdthesis{Nikumbhphd2019, TITLE = {Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in {3D}}, AUTHOR = {Nikumbh, Sarvesh}, LANGUAGE = {eng}, DOI = {10.22028/D291-28153}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on development and application of interpretable machine learning methods for prediction and analysis of long-range genomic interactions output from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at the ge- nomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether it interacts with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, in- tuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in the 3D genome organization. But, the insights gained out of these models are still coarse-grained. To this end, we devised a machine learning method, called CoMIK for Conformal Multi-Instance Kernels, capable of providing more fine-grained insights. When comparing sequences of variable length in the su- pervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin towards long-range interactions. Although CoMIK primarily uses only genetic sequence information, it can also si- multaneously utilize other information modalities such as the numerous functional genomics data if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline for analyzing HiChIP experimen- tal data for studying the 3D architectural changes in Ewing sarcoma (EWS) which is a rare cancer affecting adolescents. In particular, HiChIP data for two experimen- tal conditions, doxycycline-treated and untreated, and for primary tumor samples is analyzed. We demonstrate that pHDee facilitates processing and easy integration of large amounts of 3D genomics data analysis together with other data-intensive bioinformatics analyses.}, }
Endnote
%0 Thesis %A Nikumbh, Sarvesh %Y Pfeifer, Nico %A referee: Marschall, Tobias %A referee: Ebert, Peter %+ Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society %T Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D : %G eng %U http://hdl.handle.net/21.11116/0000-0004-A5CE-A %R 10.22028/D291-28153 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P 150 p. %V phd %9 phd %X With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on development and application of interpretable machine learning methods for prediction and analysis of long-range genomic interactions output from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at the ge- nomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether it interacts with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, in- tuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in the 3D genome organization. But, the insights gained out of these models are still coarse-grained. To this end, we devised a machine learning method, called CoMIK for Conformal Multi-Instance Kernels, capable of providing more fine-grained insights. When comparing sequences of variable length in the su- pervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin towards long-range interactions. Although CoMIK primarily uses only genetic sequence information, it can also si- multaneously utilize other information modalities such as the numerous functional genomics data if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline for analyzing HiChIP experimen- tal data for studying the 3D architectural changes in Ewing sarcoma (EWS) which is a rare cancer affecting adolescents. In particular, HiChIP data for two experimen- tal conditions, doxycycline-treated and untreated, and for primary tumor samples is analyzed. We demonstrate that pHDee facilitates processing and easy integration of large amounts of 3D genomics data analysis together with other data-intensive bioinformatics analyses. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27471
[9]
N. Robertini, “Model-based Human Performance Capture in Outdoor Scenes,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Technologies for motion and performance capture of real actors have enabled the creation of realistic-looking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios. Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and fine-scale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.
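The coarse-to-fine idea referred to above can be illustrated with a much simpler problem: estimating a 2D image shift on an image pyramid, where a coarse estimate is refined at each finer level. The sketch below is only meant to convey that principle; the thesis optimizes articulated 3D skeleton pose and surface deformation against multi-view imagery, not a 2D translation, and the toy images and pyramid depth here are assumptions.

import numpy as np

def downsample(img):
    """Halve the resolution by 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def best_shift(ref, tgt, center, radius):
    """Exhaustive search for the integer shift of tgt that minimizes SSD to ref."""
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(np.roll(tgt, dy, axis=0), dx, axis=1)
            err = np.sum((ref - shifted) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def coarse_to_fine_shift(ref, tgt, levels=3, radius=2):
    pyramid = [(ref, tgt)]
    for _ in range(levels - 1):
        r, t = pyramid[-1]
        pyramid.append((downsample(r), downsample(t)))
    shift = (0, 0)
    for r, t in reversed(pyramid):            # coarse -> fine
        shift = (2 * shift[0], 2 * shift[1])  # upscale the previous estimate
        shift = best_shift(r, t, shift, radius)
    return shift

ref = np.random.default_rng(0).random((64, 64))
tgt = np.roll(np.roll(ref, -4, axis=0), 8, axis=1)  # reference shifted by (-4, 8)
print(coarse_to_fine_shift(ref, tgt))               # prints (4, -8)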
Export
BibTeX
@phdthesis{Robertini_PhD2019, TITLE = {Model-based Human Performance Capture in Outdoor Scenes}, AUTHOR = {Robertini, Nadia}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-285887}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Technologies for motion and performance capture of real actors have enabled the creation of realisticlooking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios. Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and finescale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans.}, }
Endnote
%0 Thesis %A Robertini, Nadia %Y Theobalt, Christian %A referee: Seidel, Hans-Peter %+ Computer Graphics, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society %T Model-based Human Performance Capture in Outdoor Scenes : %G eng %U http://hdl.handle.net/21.11116/0000-0004-9B2E-B %U urn:nbn:de:bsz:291--ds-285887 %F OTHER: hdl:20.500.11880/27667 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P XIX, 136, XI p. %V phd %9 phd %X Technologies for motion and performance capture of real actors have enabled the creation of realisticlooking virtual humans through detail and deformation transfer at the cost of extensive manual work and sophisticated in-studio marker-based systems. This thesis pushes the boundaries of performance capture by proposing automatic algorithms for robust 3D skeleton and detailed surface tracking in less constrained multi-view outdoor scenarios. Contributions include new multi-layered human body representations designed for effective model-based time-consistent reconstruction in complex dynamic environments with varying illumination, from a set of vision cameras. We design dense surface refinement approaches to enable smooth silhouette-free model-to-image alignment, as well as coarse-to-fine tracking techniques to enable joint estimation of skeleton motion and finescale surface deformations in complicated scenarios. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of personalized 3D virtual humans. %U https://scidok.sulb.uni-saarland.de/handle/20.500.11880/27667
[10]
H. Sattar, “Intents and Preferences Prediction Based on Implicit Human Cues,” Universität des Saarlandes, Saarbrücken, 2019.
Abstract
Visual search is an important task and part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods for analysing human search intent and preferences. As the target of the search exists only in the mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose an algorithm for inferring the search target from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state-of-the-art models for categorical recognition generally require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism, incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Fourth, for the first time, we study the effect of body shape on people's outfit preferences. We propose a novel and robust multi-photo approach to estimate each user's body shape and build a conditional model of clothing categories given body shape. We demonstrate that, in real-world data, clothing categories and body shapes are correlated. We show that our approach estimates a realistic-looking body shape that captures a user's weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore might not be consented to by most people. We first studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask whether the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aim to enable vision systems to analyse human search intent and preferences in real-world scenarios. In particular, the thesis proposes several novel ideas and models for visual search target prediction from human fixation data, studies for the first time the correlation between body shape and clothing categories, opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision aimed at preventing automatic 3D shape extraction from images.
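A stripped-down reading of the gaze-as-attention idea is sketched below: a heatmap built from fixation points re-weights a CNN feature map before global pooling. The feature-map shape, the Gaussian bandwidth, and the fixation coordinates are illustrative assumptions; the actual Gaze Pooling Layer in the thesis also models the temporal structure of gaze and is trained end-to-end inside a CNN.

import numpy as np

def gaze_heatmap(fixations, height, width, sigma=2.0):
    """Sum of isotropic Gaussians centred at (row, col) fixations, normalised to sum to 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for r, c in fixations:
        heat += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
    return heat / heat.sum()

def gaze_pool(feature_map, fixations):
    """Spatial attention pooling of CNN features using a gaze heatmap.

    feature_map: array of shape (channels, height, width), e.g. a late conv layer.
    Returns a (channels,) descriptor emphasising the regions the user fixated on.
    """
    c, h, w = feature_map.shape
    attention = gaze_heatmap(fixations, h, w)          # (h, w), sums to 1
    return (feature_map * attention[None, :, :]).reshape(c, -1).sum(axis=1)

# Toy usage: a 64-channel 7x7 feature map and two fixations in the upper-left region.
features = np.random.default_rng(1).random((64, 7, 7))
descriptor = gaze_pool(features, fixations=[(1, 1), (2, 2)])
print(descriptor.shape)  # (64,)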
Export
BibTeX
@phdthesis{Sattar_PhD2019, TITLE = {Intents and Preferences Prediction Based on Implicit Human Cues}, AUTHOR = {Sattar, Hosnieh}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-281920}, DOI = {10.22028/D291-28192}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2019}, MARGINALMARK = {$\bullet$}, DATE = {2019}, ABSTRACT = {Visual search is an important task, and it is part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods aiming at analysing human search intent and preferences. As the target of the search only exists in mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose a search target inference algorithm from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state of the art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism -- incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach, with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Forth, for the first time, we studied the effect of body shape on people preferences of outfits. We propose a novel and robust multi-photo approach to estimate the body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated. We show that our approach estimates a realistic looking body shape that captures a user{\textquoteright}s weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore, might not be consented by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask if the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aims to enable the vision system to analyse human search intent and preferences in real-world scenarios. In particular, the thesis proposes several novel ideas and models in visual search target prediction from human fixation data, for the first time studied the correlation between shape and clothing categories opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision, aimed at preventing automatic 3D shape extraction from images.}, }
Endnote
%0 Thesis %A Sattar, Hosnieh %Y Fritz, Mario %A referee: Schiele, Bernt %A referee: Sugano, Yusuke %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society %T Intents and Preferences Prediction Based on Implicit Human Cues : %G eng %U http://hdl.handle.net/21.11116/0000-0004-8E7F-F %R 10.22028/D291-28192 %U urn:nbn:de:bsz:291--ds-281920 %F OTHER: hdl:20.500.11880/27625 %I Universität des Saarlandes %C Saarbrücken %D 2019 %P X, 136 p. %V phd %9 phd %X Visual search is an important task, and it is part of daily human life. Thus, it has been a long-standing goal in Computer Vision to develop methods aiming at analysing human search intent and preferences. As the target of the search only exists in mind of the person, search intent prediction remains challenging for machine perception. In this thesis, we focus on advancing techniques for search target and preference prediction from implicit human cues. First, we propose a search target inference algorithm from human fixation data recorded during visual search. In contrast to previous work that has focused on individual instances as a search target in a closed world, we propose the first approach to predict the search target in open-world settings by learning the compatibility between observed fixations and potential search targets. Second, we further broaden the scope of search target prediction to categorical classes, such as object categories and attributes. However, state of the art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism – incorporating both spatial and temporal aspects of human gaze behaviour. Third, we go one step further and investigate the feasibility of combining our gaze embedding approach, with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Forth, for the first time, we studied the effect of body shape on people preferences of outfits. We propose a novel and robust multi-photo approach to estimate the body shapes of each user and build a conditional model of clothing categories given body-shape. We demonstrate that in real-world data, clothing categories and body-shapes are correlated. We show that our approach estimates a realistic looking body shape that captures a user’s weight group and body shape type, even from a single image of a clothed person. However, an accurate depiction of the naked body is considered highly private and therefore, might not be consented by most people. First, we studied the perception of such technology via a user study. Then, in the last part of this thesis, we ask if the automatic extraction of such information can be effectively evaded. In summary, this thesis addresses several different tasks that aims to enable the vision system to analyse human search intent and preferences in real-world scenarios. 
In particular, the thesis proposes several novel ideas and models in visual search target prediction from human fixation data, for the first time studied the correlation between shape and clothing categories opening a new direction in clothing recommendation systems, and introduces a new topic in privacy and computer vision, aimed at preventing automatic 3D shape extraction from images. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/27625