Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections

Yunchao Gong1, Liwei Wang2, Micah Hodosh2, Julia Hockenmaier2, and Svetlana Lazebnik2

1University of North Carolina at Chapel Hill, USA
yunchao@cs.unc.edu

2University of Illinois at Urbana-Champaign, USA
lwang97@illinois.edu
mhodosh2@illinois.edu
juliahmr@illinois.edu
slazebni@illinois.edu

Abstract. This paper studies the problem of associating images with descriptive sentences by embedding them in a common latent space. We are interested in learning such embeddings from hundreds of thousands or millions of examples. Unfortunately, it is prohibitively expensive to fully annotate this many training images with ground-truth sentences. Instead, we ask whether we can learn better image-sentence embeddings by augmenting small fully annotated training sets with millions of images that have weak and noisy annotations (titles, tags, or descriptions). After investigating several state-of-the-art scalable embedding methods, we introduce a new algorithm called Stacked Auxiliary Embedding that can successfully transfer knowledge from millions of weakly annotated images to improve the accuracy of retrieval-based image description.
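
The abstract describes embedding images and sentences into a common latent space and producing descriptions by retrieval in that space. As a rough illustration of that general setup only (not the paper's Stacked Auxiliary Embedding, whose details are not given on this page), the following PyTorch sketch trains two linear projections with a bidirectional margin ranking loss on random placeholder features; all dimensions, hyperparameters, and variable names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    d_img, d_sent, d_latent = 4096, 300, 128      # assumed feature dimensions
    n = 256                                        # toy number of image-sentence pairs

    img_feats = torch.randn(n, d_img)              # stand-in for CNN image features
    sent_feats = torch.randn(n, d_sent)            # stand-in for sentence features

    proj_img = torch.nn.Linear(d_img, d_latent, bias=False)
    proj_sent = torch.nn.Linear(d_sent, d_latent, bias=False)
    opt = torch.optim.SGD(list(proj_img.parameters()) + list(proj_sent.parameters()), lr=0.01)

    margin = 0.2
    mask = 1.0 - torch.eye(n)                      # zero out the matching (diagonal) pairs
    for epoch in range(10):
        u = F.normalize(proj_img(img_feats), dim=1)    # image embeddings on the unit sphere
        v = F.normalize(proj_sent(sent_feats), dim=1)  # sentence embeddings on the unit sphere
        scores = u @ v.t()                             # cosine similarities for all pairs
        pos = scores.diag()                            # scores of the true image-sentence pairs
        # hinge violations in both retrieval directions (wrong sentences / wrong images)
        cost_s = F.relu(margin + scores - pos.unsqueeze(1)) * mask
        cost_i = F.relu(margin + scores - pos.unsqueeze(0)) * mask
        loss = (cost_s.sum() + cost_i.sum()) / n
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Retrieval-based description: embed a new image and rank all training sentences
    # by similarity in the shared space; the nearest sentences serve as its description.
    with torch.no_grad():
        query = F.normalize(proj_img(torch.randn(1, d_img)), dim=1)
        ranking = (query @ v.t()).squeeze(0).argsort(descending=True)
        print("indices of the five closest sentences:", ranking[:5].tolist())

In the setting the abstract describes, the large weakly annotated photo collections would contribute additional, noisier training pairs on top of the small fully annotated set; how that transfer is carried out is the subject of the proposed Stacked Auxiliary Embedding algorithm.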

LNCS 8692, p. 529 ff.


