Know2Look: Commonsense Knowledge for Visual Search

Overview

With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, image search and retrieval still rely largely on textual cues alone. Although visual cues have started to gain attention, imperfect object and scene detection has not led to significantly better results. We hypothesize that background commonsense knowledge about query terms can significantly aid the retrieval of documents with associated images. To this end, we combine three modalities (text, visual cues, and commonsense knowledge pertaining to the query) as a recipe for effective search and retrieval. Know2Look is an image retrieval framework that demonstrates how an ensemble of these three noisy components improves image search over conventional text-based approaches.

Approach

Our method is based on statistical language models over unigram and bigram textual features. Visual features take the form of object classes (and their WordNet hypernyms) detected by the LSDA object detection algorithm. Commonsense knowledge features are OpenIE (subject, predicate, object) triples acquired from Wikipedia documents.
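The paragraph above names the three feature types but not how they are combined; the detailed formulation lives in the appendix. Purely as an illustration, the sketch below assumes a query-likelihood model with Dirichlet smoothing, a linear mixture of a text model (caption unigrams and bigrams) and a visual model (detected object labels plus hypernyms), and commonsense query expansion that adds the objects of triples whose subject matches a query word. The `HYPERNYMS` table, the weights `w_text` and `mu`, and all function names are hypothetical; in the real system, hypernyms would come from WordNet, object labels from LSDA, and triples from OpenIE over Wikipedia.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for WordNet: detected object class -> hypernyms.
HYPERNYMS = {
    "boat": ["vessel", "craft"],
    "church": ["building", "structure"],
}


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_features(text):
    """Unigram and bigram features of a caption or query."""
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)


def visual_features(objects):
    """Detected object classes followed by their (toy) hypernyms."""
    feats = []
    for obj in objects:
        feats.append(obj)
        feats.extend(HYPERNYMS.get(obj, []))
    return feats


def expand_query(query, triples):
    """Add objects of (subject, predicate, object) triples whose subject is a query word."""
    feats = text_features(query)
    words = set(query.lower().split())
    for subj, _pred, obj in triples:
        if subj.lower() in words:
            feats.extend(obj.lower().split())
    return feats


def build_doc(caption, objects):
    """A document is a bag of text features plus a bag of visual features."""
    return {"text": Counter(text_features(caption)),
            "visual": Counter(visual_features(objects))}


def collection_model(docs, field):
    """Collection-wide counts and total length for one modality."""
    total = Counter()
    for d in docs:
        total.update(d[field])
    return total, sum(total.values())


def dirichlet_prob(term, counts, coll, coll_len, mu):
    """Dirichlet-smoothed P(term | document) for one modality."""
    if coll_len == 0:
        return 0.0
    doc_len = sum(counts.values())
    return (counts[term] + mu * coll[term] / coll_len) / (doc_len + mu)


def score(query_feats, doc, docs, w_text=0.7, mu=10.0):
    """Log query likelihood under a linear mixture of text and visual models."""
    text_coll, text_len = collection_model(docs, "text")
    vis_coll, vis_len = collection_model(docs, "visual")
    total = 0.0
    for t in query_feats:
        p = (w_text * dirichlet_prob(t, doc["text"], text_coll, text_len, mu)
             + (1 - w_text) * dirichlet_prob(t, doc["visual"], vis_coll, vis_len, mu))
        # p == 0 only when t is absent from both collections; that affects
        # every document equally, so such terms are skipped.
        if p > 0:
            total += math.log(p)
    return total


def rank(query, triples, docs):
    """Document indices sorted from best to worst match."""
    q = expand_query(query, triples)
    return sorted(range(len(docs)), key=lambda i: score(q, docs[i], docs), reverse=True)


docs = [build_doc("a boat on the river", ["boat"]),
        build_doc("an old church in the town", ["church"])]
triples = [("boat", "used on", "water"), ("church", "has", "steeple")]
print(rank("boat ride", triples, docs))  # [0, 1]
```

Keeping text and visual evidence in separately smoothed models means a noisy or missing object detection only dilutes, rather than zeroes out, a document's score; the mixture weight decides how far each noisy signal is trusted.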

Preliminary evaluation on a small benchmark of 20 queries shows promising performance of Know2Look over the baseline (conventional Google search).

Appendix

A detailed mathematical formulation of the model can be found here.

A comparison with the baselines can be found here.

Datasets

  • Images and corresponding captions from the domain "tourism", used to evaluate Know2Look, were collected from four datasets: Flickr 30K, the Pascal Sentence Dataset, the SBU Captioned Photo Dataset, and MSCOCO.
  • The query benchmark for evaluation was constructed from co-occurring Flickr tags.
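The page does not say how the Flickr co-occurrence tags were turned into queries. One plausible reading, sketched here purely as an assumption, is to count how often pairs of tags appear on the same photo and keep the most frequent pairs as two-word queries. The function names, the `min_count` threshold, and the toy tag lists are hypothetical; the default `k=20` only mirrors the 20-query benchmark size mentioned in the Approach section.

```python
from collections import Counter
from itertools import combinations


def cooccurring_tag_pairs(photo_tags, min_count=2):
    """Tag pairs that appear together on at least `min_count` photos, most frequent first."""
    pair_counts = Counter()
    for tags in photo_tags:
        # Normalise case and deduplicate so each pair is counted once per photo.
        unique = sorted({t.lower() for t in tags})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return [pair for pair, count in pair_counts.most_common() if count >= min_count]


def build_queries(photo_tags, k=20, min_count=2):
    """Turn the top-k co-occurring tag pairs into space-separated queries."""
    return [" ".join(pair) for pair in cooccurring_tag_pairs(photo_tags, min_count)[:k]]


photos = [["beach", "sunset"], ["Beach", "sunset", "sea"],
          ["church", "tower"], ["beach", "sea"]]
print(build_queries(photos))
```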

Publications

  • Know2Look: Commonsense Knowledge for Visual Search
    Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum
    AKBC 2016 [PDF]
  • Commonsense for Making Sense of Data
    Sreyasi Nag Chowdhury
    PhD Workshop VLDB 2016 [PDF]