Max-Planck-Institut für Informatik
Department 5: Databases & Information Systems
Campus E1 4, Room 407
66123 Saarbrücken
Email: martin.theobald@mpi-inf.mpg.de
Phone: +49 681 9325 507
Fax: +49 681 9325 599
via MPII Publication Server | DBLP
Probabilistic Databases (graduate-level seminar), Universität des Saarlandes, Summer Semester '09
CS245 Principles of Database Systems (graduate-level course), Stanford University, Summer Quarter '08
ACM-TODS, DAPD, IEEE-IS, IEEE-TC, IEEE-TKDE, IJCNS, IJDL, IJHIS, Information Systems, VLDB-J, WWW-J
ADBIS '07, ADBIS '08, DataX '08, DataX '09, DBRank '08, DBRank '10, DSMM '09, ECIR '07, ECIR '08, ECIR '09, ECIR '10, EDBT '08, EDBT '09, ICDE '09, ICDE '10, MUD '07, MUD '08, PersDB '08, PersDB '09, PersDL '07, VLDB Ph.D. Workshop '09, SIGIR '09, SIGIR '10, SIGMOD '10, WIDM '07, WWW '09, XSym '06
INEX '06, INEX '07, INEX '08/'09
INEX was an activity of the DELOS Network of Excellence for Digital Libraries until '07.
I'm organizing the new Efficiency Track at INEX '08/'09, together with Ralf Schenkel.
Efficient and Versatile Top-k Query Processing for Text, Semistructured, and Structured Data
Check out our online demo!
A System for Integrated Management of Data, Uncertainty, and Lineage
Check out our online demo!
Bookmark-Induced Gathering of Information with Adaptive Classification into Personalized Ontologies
This is the Java source code accompanying our 2008 SIGIR paper "Efficient and Robust Near-Duplicate Detection in Large Web Collections".
SpotSigs needs the Java Colt package for some basic hashing operations.
The package also contains efficient Java implementations of LSH (Locality Sensitive Hashing, incl. Min-Hashing) and the I-Match algorithm for near-duplicate document detection.
Our Gold Set of 2,160 manually selected near-duplicate news articles used as reference set in the paper is also available.
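For readers unfamiliar with Min-Hashing, the idea can be sketched in a few lines of Java. This is a generic textbook illustration, not the code shipped in the SpotSigs package: the class name `MinHashSketch`, its methods, the word-shingle representation, and the hash family `(a*x + b) mod p` are all assumptions made for this example. Each document is reduced to a set of shingles, and the fraction of signature positions on which two documents agree estimates the Jaccard similarity of their shingle sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Illustrative Min-Hashing sketch (hypothetical class, not part of SpotSigs). */
public class MinHashSketch {
    // Mersenne prime 2^31 - 1; all hash values live in [0, PRIME).
    private static final long PRIME = 2147483647L;

    private final long[] a; // multipliers in [1, PRIME - 1]
    private final long[] b; // offsets in [0, PRIME - 1]

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1);
            b[i] = rnd.nextInt((int) PRIME);
        }
    }

    /** Splits a text into lower-cased word shingles of length k (an assumed representation). */
    public static Set<String> wordShingles(String text, int k) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        Set<String> shingles = new HashSet<>();
        for (int i = 0; i + k <= words.length; i++) {
            shingles.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return shingles;
    }

    /** For each hash function, keeps the minimum hash value over all shingles. */
    public long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : shingles) {
            long x = Math.floorMod((long) s.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                // a[i] * x < 2^62, so the product cannot overflow a long.
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Fraction of agreeing positions, an estimate of the Jaccard similarity. */
    public static double estimateJaccard(long[] s1, long[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures must have equal length");
        }
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }
}
```

To compare two documents, build one `MinHashSketch` (for example with 64 hash functions and a fixed seed), compute `signature(wordShingles(text, 3))` for each document with that same instance, and pass both signatures to `estimateJaccard`. More hash functions give a tighter estimate at the cost of longer signatures.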
A true native interface for Thorsten Joachims' genuine SVM-light v6.01 in a compact Java API. Originally written as part of BINGO!, this is probably the fastest currently available Java Native Interface (JNI) for SVM-light. It comes with two precompiled shared libraries for Windows (svmlight.dll) and RedHat/Debian/SUSE Linux (svmlight.so) and supports the full functionality of SVM-light such as classification, regression, and full Java-side parameterization. All sources can easily be recompiled for more eccentric operating systems.
See the JavaDoc and JNI_SVMLight_Test.java test class for more details.