Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics

Xiaojiang Peng1,4,3, Limin Wang2,3, Yu Qiao3, and Qiang Peng1

1Southwest Jiaotong University, Chengdu, China

2Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China

3Shenzhen Key Lab of CVPR, Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, China

4Hengyang Normal University, Hengyang, China

Abstract. Recent studies show that aggregating local descriptors into a super vector yields an effective representation for retrieval and classification tasks. A popular method along this line is the vector of locally aggregated descriptors (VLAD), which aggregates the residuals between descriptors and visual words. However, the original VLAD ignores high-order statistics of local descriptors, and its dictionary may not be optimal for classification tasks. In this paper, we address these problems by utilizing high-order statistics of local descriptors and performing supervised dictionary learning. The main contributions are twofold. Firstly, we propose a high-order VLAD (H-VLAD) for visual recognition, which leverages two kinds of high-order statistics in the VLAD-like framework, namely diagonal covariance and skewness. These high-order statistics provide complementary information for VLAD and allow for efficient computation. Secondly, to further boost the performance of H-VLAD, we design a supervised dictionary learning algorithm to discriminatively refine the dictionary, which can also be extended to other super vector based encoding methods. We examine the effectiveness of our methods in image-based object categorization and video-based action recognition. Extensive experiments on the PASCAL VOC 2007, HMDB51, and UCF101 datasets show that our method achieves state-of-the-art performance on both tasks.
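
As a rough illustration of the encoding idea described in the abstract, the following Python sketch assigns each local descriptor to its nearest visual word and then concatenates three per-dimension statistics for every word: the summed residuals (plain VLAD), the variance (a diagonal covariance term), and the skewness. The abstract does not give the formulas, so the exact definitions used here, the hard assignment, the absence of normalization, and the names `nearest_word` and `encode` are all assumptions made for illustration, not the authors' H-VLAD formulation.

```python
def nearest_word(x, words):
    """Index of the visual word closest to descriptor x (squared Euclidean)."""
    return min(range(len(words)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, words[k])))


def encode(descriptors, words):
    """Concatenate, per visual word and per dimension, three statistics of the
    descriptors assigned to that word: summed residuals, variance, skewness.

    Assumed definitions (not taken from the paper): population variance and
    standardized third central moment of the assigned descriptors.
    """
    dim = len(words[0])
    groups = [[] for _ in words]
    for x in descriptors:
        groups[nearest_word(x, words)].append(x)

    vlad, var, skew = [], [], []
    for word, group in zip(words, groups):
        for d in range(dim):
            vals = [x[d] for x in group]
            n = len(vals)
            # First order: sum of residuals to the visual word (plain VLAD).
            vlad.append(sum(v - word[d] for v in vals))
            if n == 0:
                # Empty cluster: no second- or third-order information.
                var.append(0.0)
                skew.append(0.0)
                continue
            mean = sum(vals) / n
            m2 = sum((v - mean) ** 2 for v in vals) / n  # second central moment
            m3 = sum((v - mean) ** 3 for v in vals) / n  # third central moment
            var.append(m2)
            skew.append(m3 / m2 ** 1.5 if m2 > 0 else 0.0)
    return vlad + var + skew


# Toy example: two 2-D visual words and four descriptors (made-up values).
words = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [10.0, 11.0]]
code = encode(descriptors, words)
```

With K visual words of dimension D, the resulting vector has 3·K·D entries, so the two extra statistics triple the length of a plain VLAD code while reusing the same nearest-word assignment.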

LNCS 8691, p. 660 ff.

© Springer International Publishing Switzerland 2014