
We work on machine learning and applications in the areas of information retrieval and language processing. Currently, our work focuses on the following research topics.
Funding: IBM, Jazz Faculty Grant
Duration: 01/2008-12/2008
Principal Investigators: Andreas Zeller, Tobias Scheffer
What is it that makes a good development process? We want to develop a plug-in that learns from collaboration and defect data as tracked by Jazz, relates features of the collaborative development process to the defect density of individual components, and thereby automatically predicts code quality. For instance, the plug-in might advise that package P should be reviewed more, because a new dependency on compiler internals has been added shortly before the release date by a developer who is new to the team.
Funding: Strato Rechenzentrum AG
Duration: since 07/2005
Project Members: Michael Brückner, Uwe Dick, Peter Haider
Strato is a European provider of webspace and server hosting services.
We analyze the adversarial classification problem of spam
identification. Spam filtering is a game between two opponents, spam
sender and spam filter, that react to each other's moves. We seek to
identify a winning strategy that cannot easily be dodged by spam
senders.
In cooperation with Strato AG, we have developed a spam filter that now processes roughly 1 percent of all emails sent and received worldwide.
Intrusions are attempted on a daily basis. Usually, attackers seek to
exploit insecure web sites in order to send huge amounts of spam email via Strato's email servers. We are developing an intelligent monitoring system that
tracks HTTP requests and discriminates legitimate use of a web site from attempts to exploit insecure scripts.
Funding: DaimlerChrysler AG
Duration: 08/2005-07/2008
Project Members: Sascha Schulz
We study the problem of discovering trends and new developments in production and warranty databases as well as in workshop reports. We develop technologies that automatically identify such trends and discover their hidden causes. The goal of this project is the constructive analysis of data mining methods that lead to improved service processes by integrating and analyzing textual information and data from multiple, heterogeneous and distributed databases.
Funding: Fresenius-affiliate NephroCare e-Services GmbH
Duration: since 04/2008
Project Members: Jochen Fischer, Manuel Stritt
We investigate model-building and the generation of actionable knowledge from records of dialysis treatments.
Funding: nugg.ad AG
Duration: since 02/2007
Project Members: Christoph Sawade, Arvid Terzibaschian
In this project, we investigate efficient algorithms that predict which ad a user is most likely to click on, based on that user's past clicking behavior and any other available information.
(German project title: Text Mining: Wissensentdeckung in Textsammlungen
und Effizienz von Dokumentenverarbeitungsprozessen)
Funding: German Science Foundation DFG
Duration: June 2003 through June 2008
Project Members: Steffen Bickel, Ulf Brefeld
The number of documents available in archives and on the web is
growing exponentially. This growth creates a demand for methods that
automatically analyze large volumes of documents and discover and utilize
the valuable knowledge they contain. A substantial part of our working
processes consists of processing (i.e., reading, writing, manipulating)
documents. Many tools support the administration of text documents,
such as file systems, databases, or document management systems. Much
greater effort (and expense), however, goes into the actual
document manipulation processes, such as writing documents. Any
support of document manipulation processes requires substantial
knowledge; it is therefore much more difficult to support document
processing than document administration.
The goal of the "Text Mining" project is to develop and study text
mining algorithms that discover knowledge in large document archives
and utilize this knowledge to support future text manipulation processes.