
With the proliferation of photo and video footage on the Web, a knowledge base would not be complete without multimodal data on individual entities (people, places, etc.) and important events (concerts, award ceremonies, soccer matches, etc.). While photos of celebrities are abundant on the Internet, photos of less popular entities, such as notable computer scientists or regionally interesting churches, are much harder to retrieve. Querying image search engines with the entity name yields large candidate lists, but these often have low precision and unsatisfactory recall. Moreover, even for more prominent targets, it is desirable to have a diverse collection of photos (e.g., from different time periods), some of which may be rare and difficult to locate with search engines. In some cases, the ambiguity of the entity name dilutes the search engine results. An example is the Berkeley professor and former ACM president David Patterson: none of the top-20 Google or Bing image search results (as of August 2009) show him; most show the governor of New York (whose name is actually spelled David Paterson). Our work, "Gathering and Ranking Photos of Named Entities with High Precision, High Recall, and Diversity", presents an approach to overcome these problems. It is based on knowledge-driven query expansions and weighted ensemble voting on the results.
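To make the two ingredients more concrete, the following is a minimal Python sketch, not the method from the paper. It assumes that query expansion simply appends a knowledge-base fact to the entity name, and that voting sums a per-query weight for every image a query returns. The function names, the weights, and the image file names are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def expand_queries(name, facts):
    """Return the plain name query plus one expanded query per fact.

    Assumption: an expansion is just the name followed by a keyword
    taken from a knowledge base (e.g. an affiliation).
    """
    return [name] + [f"{name} {fact}" for fact in facts]


def weighted_vote(result_lists, weights):
    """Rank images by the summed weight of the queries that returned them.

    result_lists maps a query string to the list of images it returned;
    weights maps a query string to its (hypothetical) vote weight.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    scores = defaultdict(float)
    for query, images in result_lists.items():
        for image in set(images):  # each query votes once per image
            scores[image] += weights.get(query, 1.0)
    return sorted(scores, key=lambda img: (-scores[img], img))


# Made-up search results for illustration only.
queries = expand_queries("David Patterson", ["Berkeley", "ACM president"])
results = {
    queries[0]: ["governor.jpg", "prof.jpg"],
    queries[1]: ["prof.jpg", "campus.jpg"],
    queries[2]: ["prof.jpg", "acm.jpg"],
}
# Assumed weights: the ambiguous plain-name query counts for less.
weights = {queries[0]: 0.5, queries[1]: 1.0, queries[2]: 1.0}
ranking = weighted_vote(results, weights)
```

In this toy setting, the image returned by all three queries rises to the top, while the image found only through the ambiguous plain-name query falls to the bottom.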
This work is part of the YAGO-NAGA project at the Max Planck Institute for Informatics in Saarbrücken, Germany.