Max-Planck-Institut für Informatik

Software Projects

YAGO

YAGO is a huge semantic knowledge base. It currently covers more than 900,000 entities (such as persons, organizations, and cities) and contains about 6 million facts about these entities. A web interface allows users to pose queries to YAGO; the data is also available for download.

For further information about the project, please visit the project home page.


RDF-3X

RDF-3X is an RDF storage and retrieval system that achieves excellent performance by following a RISC-style design philosophy. The source code is available for non-commercial use.

For further information about the project, please visit the project home page.


TopX

TopX is a search engine for ranked retrieval of XML (and plain-text) data, developed at the Max Planck Institute for Informatics. TopX supports a probabilistic-IR scoring model for full-text content conditions and tag-term combinations, path conditions for all XPath axes as exact or relaxable constraints, and ontology-based relaxation of terms and tag names as similarity conditions for ranked retrieval. To speed up top-k queries, TopX employs several techniques: probabilistic models as efficient score predictors for a variant of the threshold algorithm, judicious scheduling of sequential accesses for scanning index lists and of random accesses for computing full scores, incremental merging of index lists for on-demand, self-tuning query expansion, and a suite of specifically designed, precomputed indexes for evaluating structural path conditions.
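The interplay of sequential and random accesses in a threshold algorithm can be illustrated with a minimal sketch. This is the textbook threshold algorithm with summed scores, not the TopX variant: it has no score predictors, no access scheduling, and no structural conditions. The function name, the in-memory index layout, and the example terms are assumptions made for illustration only.

```python
import heapq


def threshold_topk(index_lists, k):
    """Return the k best (doc_id, score) pairs, where a document's score is
    the sum of its per-term scores.

    index_lists maps each query term to a list of (doc_id, score) pairs
    sorted by descending score (hypothetical in-memory layout).
    """
    if k <= 0:
        return []
    # Random-access structure: term -> {doc_id: score}.
    lookup = {term: dict(postings) for term, postings in index_lists.items()}
    seen = set()
    top = []  # min-heap of (full_score, doc_id) holding the current best k
    max_depth = max((len(p) for p in index_lists.values()), default=0)

    for depth in range(max_depth):
        threshold = 0.0  # upper bound on the score of any unseen document
        for term, postings in index_lists.items():
            if depth >= len(postings):
                continue  # exhausted list: unseen documents get 0 from it
            doc_id, score = postings[depth]  # sequential access
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Random accesses to all lists yield the full score.
            full = sum(lookup[t].get(doc_id, 0.0) for t in index_lists)
            if len(top) < k:
                heapq.heappush(top, (full, doc_id))
            elif full > top[0][0]:
                heapq.heapreplace(top, (full, doc_id))
        # Stop as soon as no unseen document can enter the top k.
        if len(top) == k and top[0][0] >= threshold:
            break

    return [(doc, score) for score, doc in sorted(top, key=lambda e: (-e[0], e[1]))]


example_lists = {
    "xml": [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)],
    "retrieval": [("d2", 0.7), ("d3", 0.6), ("d1", 0.2)],
}
best = threshold_topk(example_lists, k=2)
print([doc for doc, _ in best])
```

In this sketch every newly seen document immediately triggers random accesses to all lists. Deciding when such random accesses are worth their cost is the kind of scheduling question the description above refers to.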

For further information about the project, please visit the project home page.


MINERVA

The peer-to-peer (P2P) approach, which has become popular in the context of file-sharing systems such as Gnutella or KaZaA, allows handling huge amounts of data in a distributed and self-organizing way. In such a system, all peers are equal and the functionality is shared among all peers, so that there is no single point of failure and the load is evenly balanced across a large number of peers. These characteristics offer enormous potential for search capabilities that are powerful in terms of scalability, efficiency, and resilience to failures and dynamics. Additionally, such a search engine can potentially benefit from the intellectual input (e.g., bookmarks and query logs) of a large user community.

For further information about the project, please visit the project home page.


BINGO!

Focused (thematic) crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It involves the automatic classification of visited documents into a user- or community-specific topic hierarchy (ontology). The quality of the training data for the classifier is the most critical issue and a potential bottleneck for the effectiveness and scale of a focused crawler.

The BINGO! implementation presents an approach to focused crawling that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic "archetypes" and uses them for periodically re-training the classifier; this way the crawler is dynamically adapted based on the most significant documents seen so far.
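The re-training loop described above can be sketched in a few lines. This is an illustrative sketch under loud assumptions, not the BINGO! implementation: a bag-of-words nearest-centroid classifier stands in for the real classifier, and archetypes are simply the positively classified documents with the highest similarity score. All function names, the threshold, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (stand-in for a real feature extractor)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train_centroid(docs):
    """'Train' the stand-in classifier: sum the term counts of the topic's documents."""
    centroid = Counter()
    for doc in docs:
        centroid.update(vectorize(doc))
    return centroid


def retrain_with_archetypes(training_docs, crawled_docs, threshold=0.3, n_archetypes=2):
    """Classify crawled documents, pick archetypes, and re-train.

    Assumption: a document is 'positively classified' if its similarity to
    the topic centroid reaches `threshold`, and the archetypes are the
    n_archetypes positives with the highest similarity.
    """
    centroid = train_centroid(training_docs)
    scored = [(cosine(centroid, vectorize(doc)), doc) for doc in crawled_docs]
    positives = sorted(
        (sd for sd in scored if sd[0] >= threshold),
        key=lambda sd: sd[0],
        reverse=True,
    )
    archetypes = [doc for _, doc in positives[:n_archetypes]]
    # Re-train on the initial training data plus the archetypes.
    return train_centroid(training_docs + archetypes), archetypes


initial = ["xml retrieval ranking", "xml index search"]
crawled = ["xml search engine ranking", "football match results", "xml index"]
new_centroid, archetypes = retrain_with_archetypes(initial, crawled)
print(archetypes)
```

Running such a step periodically during a crawl lets the classifier absorb vocabulary that the initial training data lacked (here, the term "engine").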

Preliminary experiments indicate that the dynamic enhancement of training data based on archetypes improves the overall precision of a focused crawler by a substantial margin.

For further information about the project, please visit the project home page.


MENTOR-lite

High availability of distributed data is an important prerequisite for the efficiency of enterprise-wide business processes, so-called workflows. These workflows consist of several work steps that access different, autonomously managed databases and other information services. The group is developing infrastructure software, so-called middleware, with the goal of coordinated and reliable execution of workflows in highly heterogeneous, distributed information systems. Workflows are specified, executed, and monitored by means of state and activity charts. In addition, administrative tools are being developed on top of the workflow kernel, resulting in a flexible workflow management system (WFMS) architecture. The distributed runtime environment under development aims to provide fault tolerance based on transaction-oriented services as well as efficient access to the history and the current context of workflows.

For further information about the project, please visit the project home page.


INEX Wikipedia XML Collection

The INEX initiative for the evaluation of XML retrieval uses a collection of XMLified Wikipedia articles contributed by MPI-INF. Besides the usual article-style markup, the collection provides semantic markup of articles and outgoing links based on the semantic knowledge base YAGO, explicitly labeling more than 5,800 classes of entities such as persons, movies, and cities.

For further information, please visit the home page.
