Homepage
Martin Theobald
Max-Planck-Institut für Informatik
Department 5: Databases & Information Systems
Campus E1 4, Room 407
66123 Saarbrücken
Email: martin.theobald@mpi-inf.mpg.de
Phone: +49 681 9325 507
Fax: +49 681 9325 599
I'm head of the research group for Ranking and Uncertain Data Management at D5 of MPI-Inf.
My general research interests are in the integration of DB&IR. Specifically:
- Probabilistic databases and management of uncertain data
- Query processing in large RDF collections
- Information extraction
- XPath/XQuery full-text search
See also our URDF project page!
PhD
- Maximilian Dylla (Probabilistic Temporal Databases, Top-k, Sampling)
- Sairam Gurajada (Distributed RDF Indexing & Query Processing)
- Levan Kasradze (Distributed XML Search & Ranking) @ Tbilisi State University, Georgia
Master
- Dat Ba Nguyen (Efficient Entity Disambiguation via Similarity Hashing)
- Arnab Dutta (Distributed SPARQL Processing using Message Passing)
- Manish Kumar (Updates for Top-k Queries over Inverted Indexes)
Bachelor
- Artem Boldyrev (Clustering Information Extraction Output via Matrix Factorization)
- Aliaksandr Talaika (Two-Pass Information Extraction with Patterns and CRFs)
via MPII Publication Server | DBLP | Google Scholar
- Information Retrieval & Data Mining (core lecture, with Pauli Miettinen), Universität des Saarlandes, Winter '11/'12
- Probabilistic Models for Information Extraction (graduate level seminar), Universität des Saarlandes, Summer '11
- Information Extraction & Knowledge Harvesting (graduate level seminar), Universität des Saarlandes, Winter '10/'11
- Semantic Web Technologies (graduate level seminar), Universität des Saarlandes, Summer Semester '10
- Distributed Information Systems (graduate level seminar), Universität des Saarlandes, Summer Semester '10
- Probabilistic Databases (graduate level seminar), Universität des Saarlandes, Summer Semester '09
- CS245 Principles of Database Systems (graduate level course), Stanford University, Summer Quarter '08
- Google Focused Research Award, December 2010
- ACM SIGMOD Dissertation Award Honorable Mention '06, SIGMOD Conference, Beijing, June 2007
- GI DBIS Dissertation Award '06/'07, BTW Conference, Aachen, March 2007
- Otto Hahn Medal of the Max Planck Society '06, Annual Gathering of the Max Planck Society, Kiel, June 2007
- Yago-QA: Answering Questions by Structured Knowledge Queries, ICSC 2011, Stanford, September 2011
- Interactive Reasoning in Large and Uncertain RDF Knowledge Bases, Free University of Bozen-Bolzano, December 2010
- LIVE - A lineage-supported, versioned DBMS, SSDBM 2010, Heidelberg, June 2010
- From Information to Knowledge - Harvesting Entities and Relationships From Web Sources, PODS Tutorial, Indianapolis, June 2010
- TopX 2.0 at the INEX 2009 Ad-hoc and Efficiency Tracks, INEX Workshop, Brisbane, December 2009
- TopX 2.0 at the INEX 2008 Efficiency Track, INEX Workshop, Schloss Dagstuhl, December 2008
- Overview of the INEX 2008 Efficiency Track, INEX Workshop, Schloss Dagstuhl, December 2008
- SpotSigs - Robust and Efficient Near Duplicate Detection in Large Web Collections, SIGIR, Singapore, July 2008
- TopX 2.0, Dagstuhl Seminar on Ranked XML Retrieval, Schloss Dagstuhl, March 2008
- Trio - A System for Integrated Management of Data, Lineage, and Uncertainty, USI Lugano, March 2008
- TopX (basically a compilation of previous talks with lots of animations, given at various occasions)
- TopX - Efficient and Versatile Top-k Query Processing for Text, Structured, and Semistructured Data, BTW, Aachen, March 2007
- An Efficient and Versatile Query Engine for TopX Search, VLDB, Trondheim, September 2005
- Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing, SIGIR, Salvador de Bahia, August 2005
- Efficient Top-k Query Processing for Text, Semistructured, and Structured Data, MPI-IMPRS, May 2005
- Probabilistic Top-k Query Processing [poster], MPI-IMPRS, September 2004
- Top-k Query Processing with Probabilistic Guarantees, VLDB, Toronto, September 2004
- BINGO! and Daffodil: Personalized Exploration of Digital Libraries and Web Sources, RIAO, Avignon, April 2004
- Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data, WebDB 2003, San Diego, June 2003
- Reviews for Journals
- Conferences/Workshops PC
- I have been on the organizing team of the international INEX (Initiative for the Evaluation of XML Retrieval) workshop for a few years, with a focus on the document exploration phase and topic development using TopX (INEX '06, INEX '07, INEX '08/'09/'10) and on the Efficiency Track at INEX '08/'09.
INEX was an activity of the DELOS Network of Excellence for Digital Libraries until 2007.
- TopX 2.0
The current version of TopX with customized index structures and a C++ query processor.
- TopX
Efficient and Versatile Top-k Query Processing for Text, Semistructured, and Structured Data (original Java version)
- Stanford Trio Project
A System for Integrated Management of Data, Uncertainty, and Lineage
- The BINGO! Focused Crawler
Bookmark-Induced Gathering of Information with Adaptive Classification into Personalized Ontologies
- SpotSigs
This is the Java source code accompanying our 2008 SIGIR paper SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections.
SpotSigs needs the Java Colt package for some basic hashing operations.
The package also contains efficient Java implementations of LSH (Locality Sensitive Hashing, incl. Min-Hashing) and the I-Match algorithm for near-duplicate document detection.
Our Gold Set of 2,160 manually selected near-duplicate news articles, used as the reference set in the paper, is also available.
- JNI_SVM-light-6.01
A true native interface for Thorsten Joachims' original SVM-light v6.01 in a compact Java API. Originally written as part of BINGO!, this is probably the fastest currently available Java Native Interface (JNI) for SVM-light.
It comes with two precompiled shared libraries for Windows (svmlight.dll) and Red Hat/Debian/SUSE Linux (svmlight.so) and supports the full functionality of SVM-light, such as classification, regression, and full Java-side parameterization. All sources can easily be recompiled for more eccentric operating systems.
See the JavaDoc and the JNI_SVMLight_Test.java test class for more details.
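For readers unfamiliar with the Min-Hashing technique listed under SpotSigs above, here is a minimal Java sketch of the general idea: each document is reduced to a set of word shingles, and the fraction of matching minimum hash values across many random hash functions estimates the Jaccard similarity of two sets. This is only an illustration under my own assumptions. The class name MinHashSketch, its methods, the word-bigram shingling, and the linear hash family are hypothetical and are not taken from the SpotSigs package.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative sketch only: names and design are hypothetical,
// not the implementation shipped with SpotSigs.
class MinHashSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a; // multipliers of the linear hash functions
    private final long[] b; // offsets of the linear hash functions

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1); // in [1, PRIME - 1]
            b[i] = rnd.nextInt((int) PRIME);         // in [0, PRIME - 1]
        }
    }

    // Splits a text into the set of its word k-grams (shingles).
    static Set<String> shingles(String text, int k) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= tokens.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + k)));
        }
        return result;
    }

    // For each hash function, keeps the minimum hash value over all items.
    long[] signature(Set<String> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String item : items) {
            long x = Math.floorMod((long) item.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // fits in a long: < 2^62 + 2^31
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of positions where two signatures agree; this
    // approximates the Jaccard similarity of the underlying sets.
    static double estimate(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 42L);
        Set<String> d1 = shingles("the committee approved the new budget on monday evening", 2);
        Set<String> d2 = shingles("the committee approved the new budget on tuesday evening", 2);
        double est = estimate(mh.signature(d1), mh.signature(d2));
        System.out.printf("estimated Jaccard similarity: %.2f%n", est);
    }
}
```

More hash functions reduce the variance of the estimate at the cost of longer signatures. In an LSH setting, signatures are typically split into bands so that only documents agreeing on at least one band are compared in full.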