ECCV 2014 - LNCS 8689-8695

Discovering Video Clusters from Visual Features and Noisy Tags

Arash Vahdat, Guang-Tong Zhou, and Greg Mori

School of Computing Science, Simon Fraser University, Canada
avahdat@cs.sfu.ca
gza11@cs.sfu.ca
mori@cs.sfu.ca

Abstract. We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace; however, it is not trivial to discover video clusters therein. Direct methods that operate on visual features ignore the regularly available, valuable source of tag information. Clustering videos solely on these tags is error-prone, since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags, and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. Finally, videos are clustered using the refined tags as well as the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm can produce more accurate clustering results than baseline methods based on tags, visual features, or both. Further, qualitative results verify that the clustering can discover sub-categories and more specific instances of a given video category.

Electronic Supplementary Material (PDF 234 KB)

LNCS 8694, p. 526 ff.
