Department D5 of the Max Planck Institute for Informatics
We are witnessing an explosion of digital information. The Internet provides a seemingly endless and constantly growing amount of data. From a technical viewpoint, this poses tremendous challenges regarding the intelligent organization, semantic search, and deep analysis of the data. This concerns not just the data and knowledge on the Web, but also that in databases, social media, digital libraries, and scientific data repositories. The group's long-term objective is to develop methodologies for knowledge discovery: extracting, organizing, searching, exploring, and ranking facts from structured, semistructured, textual, and multimodal information sources. Our approach towards this ambitious goal combines concepts, models, and algorithms from several fields, including database systems, information retrieval, natural language processing, statistical learning, and data mining. Highlights from our ongoing research include the YAGO2 knowledge base, the AIDA tool for named entity disambiguation, the XML search engine TopX, the RDF search engine RDF-3X, and work on robust and scalable fact discovery from web sources, carried out in the context of a Google Focused Research Award. The group is also coordinating the EU project LAWA on longitudinal analytics of web archive data.