Know2Look: Commonsense Knowledge for Visual Search

Overview

With the rise in popularity of social media, images accompanied by contextual text form a large part of the web. However, image search and retrieval still depend largely on textual cues alone. Although visual cues have started to receive attention, imperfect object and scene detection has not led to significantly better results. We hypothesize that background commonsense knowledge about query terms can significantly aid the retrieval of documents with associated images. To this end, we combine three modalities as a recipe for efficient search and retrieval: text, visual cues, and commonsense knowledge pertaining to the query. Know2Look is an image retrieval framework that demonstrates how an ensemble of these three noisy components improves image search over conventional text-based approaches.
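
To make the idea of an ensemble of noisy components concrete, the sketch below assumes that each modality supplies its own estimate of P(word | document) and that the three estimates are combined by linear interpolation before a query is scored by log-likelihood. The mixture form, the weights, the toy probabilities, and the helper names (`table_model`, `mixture_score`) are illustrative assumptions, not the formulation used by Know2Look.

```python
import math
from typing import Callable, Dict, List

# P(word | document) estimated from one modality (text, visual, or commonsense).
TermModel = Callable[[str, str], float]


def table_model(table: Dict[str, Dict[str, float]], floor: float = 1e-3) -> TermModel:
    """Wrap a {doc_id: {word: prob}} table; unseen words get a small floor probability."""
    return lambda word, doc_id: table.get(doc_id, {}).get(word, floor)


def mixture_score(
    query: List[str],
    doc_id: str,
    models: Dict[str, TermModel],
    weights: Dict[str, float],
) -> float:
    """Log query likelihood under a linear mixture of the per-modality models."""
    score = 0.0
    for word in query:
        p = sum(weights[name] * model(word, doc_id) for name, model in models.items())
        score += math.log(p)
    return score


# Made-up probabilities for two documents; not data from the project.
models = {
    "text": table_model({"d1": {"beach": 0.20}, "d2": {"sunset": 0.10}}),
    "visual": table_model({"d1": {"beach": 0.10}, "d2": {"sky": 0.20}}),
    "commonsense": table_model({"d1": {"sunset": 0.05}, "d2": {"beach": 0.02}}),
}
weights = {"text": 0.5, "visual": 0.3, "commonsense": 0.2}  # assumed to sum to 1
query = ["beach", "sunset"]
ranking = sorted(["d1", "d2"], key=lambda d: mixture_score(query, d, models, weights), reverse=True)
print(ranking)  # ['d1', 'd2']
```

A word missing from one modality can still be supported by another: "sunset" is absent from d1's text and visual tables but is backed by its commonsense table, which, together with its strong match on "beach", puts d1 first.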

Approach

Our method is based on statistical language models over unigram and bigram textual features. Visual features take the form of object classes (and their WordNet hypernyms) detected by the LSDA object detection algorithm. Commonsense knowledge features are OpenIE (subject, predicate, object) triples acquired from Wikipedia documents.
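
The text and visual features can be illustrated with a small query-likelihood sketch. It assumes Jelinek-Mercer smoothing against collection statistics, hard-coded detections in place of LSDA output, and a hard-coded hypernym table (`HYPERNYMS`) standing in for WordNet. The smoothing method, the parameter `lam`, and all function names are hypothetical; Know2Look's own model may weight or smooth the features differently.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical hypernym table standing in for real WordNet lookups.
HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["canine", "animal"],
    "boat": ["vessel", "vehicle"],
}


def text_features(caption: str) -> List[str]:
    """Unigram and bigram tokens of a caption; bigrams are joined with '_'."""
    words = caption.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def visual_features(detected_classes: Sequence[str]) -> List[str]:
    """Detected object class labels, each followed by its hypernyms."""
    tokens: List[str] = []
    for label in detected_classes:
        tokens.append(label)
        tokens.extend(HYPERNYMS.get(label, []))
    return tokens


def query_log_likelihood(
    query_tokens: Sequence[str],
    doc_tokens: Sequence[str],
    collection: Counter,
    lam: float = 0.5,
) -> float:
    """Jelinek-Mercer smoothed log P(query | document)."""
    doc = Counter(doc_tokens)
    doc_len = sum(doc.values())
    coll_len = sum(collection.values())
    score = 0.0
    for token in query_tokens:
        if collection[token] == 0:
            continue  # absent from the whole collection: probability 0 for every document
        p_doc = doc[token] / doc_len if doc_len else 0.0
        p_coll = collection[token] / coll_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score


# Toy example with made-up captions and detections.
docs = {
    "img1": text_features("a dog runs on the beach") + visual_features(["dog"]),
    "img2": text_features("a boat on the lake") + visual_features(["boat"]),
}
collection = Counter(t for tokens in docs.values() for t in tokens)
query = text_features("animal beach")
scores = {d: query_log_likelihood(query, tokens, collection) for d, tokens in docs.items()}
best = max(scores, key=scores.get)
print(best)  # img1: "animal" matches only via the hypernym of the detected "dog"
```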

Preliminary evaluation on a small benchmark of 20 queries shows promising performance of Know2Look compared with the baseline (conventional Google search).

Appendix

A detailed mathematical formulation of the model can be found here.

A comparison with baselines can be found here.

Datasets

The images and corresponding captions from the "tourism" domain used to evaluate Know2Look were collected from four datasets: Flickr 30K, the Pascal Sentence Dataset, the SBU Captioned Photo Dataset, and MSCOCO.

OpenIE commonsense knowledge triples used for query expansion were extracted from Wikipedia documents.
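
A simple way to use such triples for query expansion is to find triples whose subject or object mentions a query term and borrow words from the other argument. The sketch below assumes that heuristic, with frequency-based selection of the top `k` candidate terms; the stopword list, `expand_query`, and the toy triples are hypothetical and not taken from the Know2Look pipeline.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Tiny hypothetical stopword list so articles are not proposed as expansion terms.
STOPWORDS = {"a", "an", "the", "of", "in", "on"}


def content_words(phrase: str) -> List[str]:
    """Lower-cased words of a triple argument, minus the stopwords."""
    return [w for w in phrase.lower().split() if w not in STOPWORDS]


def expand_query(query: List[str], triples: Iterable[Triple], k: int = 3) -> List[str]:
    """Append up to k terms from the opposite argument of triples that mention a query term."""
    query_set = set(query)
    candidates: Counter = Counter()
    for subj, _pred, obj in triples:
        subj_words, obj_words = content_words(subj), content_words(obj)
        if query_set.intersection(subj_words):
            candidates.update(w for w in obj_words if w not in query_set)
        if query_set.intersection(obj_words):
            candidates.update(w for w in subj_words if w not in query_set)
    # Ties keep first-seen order, so the result is deterministic.
    return query + [term for term, _ in candidates.most_common(k)]


# Made-up triples for illustration only.
triples = [
    ("the beach", "has", "sand"),
    ("tourists", "visit", "the beach"),
    ("beach", "is near", "sea"),
    ("museum", "exhibits", "paintings"),
]
print(expand_query(["beach"], triples))  # ['beach', 'sand', 'tourists', 'sea']
```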

The query benchmark for evaluation was constructed from co-occurring Flickr tags.
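
One plausible reading of building queries from co-occurring tags is to count how often pairs of tags appear on the same photo and keep the most frequent pairs as two-word queries. The sketch below assumes exactly that; the function name, the thresholds `n` and `min_count`, and the tag sets are hypothetical, and the actual benchmark construction may have involved further filtering.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def cooccurring_tag_queries(
    photo_tags: Iterable[Set[str]], n: int = 20, min_count: int = 2
) -> List[Tuple[str, str]]:
    """Return up to n tag pairs that co-occur on at least min_count photos, most frequent first."""
    pair_counts: Counter = Counter()
    for tags in photo_tags:
        # Sorting makes each pair canonical, so ("sea", "beach") and ("beach", "sea") are one pair.
        for pair in combinations(sorted(tags), 2):
            pair_counts[pair] += 1
    return [pair for pair, count in pair_counts.most_common() if count >= min_count][:n]


# Made-up tag sets for five photos.
photos = [
    {"beach", "sunset", "sea"},
    {"beach", "sea"},
    {"mountain", "snow"},
    {"beach", "sunset"},
    {"mountain", "snow", "hiking"},
]
print(cooccurring_tag_queries(photos))
# [('beach', 'sea'), ('beach', 'sunset'), ('mountain', 'snow')]
```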

Media

Know2Look: AKBC 2016 slides

Know2Look: AKBC 2016 poster

Publications

Know2Look: Commonsense Knowledge for Visual Search
Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum
AKBC 2016 [PDF]

Commonsense for Making Sense of Data
Sreyasi Nag Chowdhury
PhD Workshop VLDB 2016 [PDF]