Action Recognition with Stacked Fisher Vectors

Xiaojiang Peng (1, 3, 2), Changqing Zou (3, 2), Yu Qiao (2, 4), and Qiang Peng (1)

(1) Southwest Jiaotong University, Chengdu, China

(2) Shenzhen Key Lab of CVPR, Shenzhen Institutes of Advanced Technology, CAS, China

(3) Department of Computer Science, Hengyang Normal University, Hengyang, China

(4) The Chinese University of Hong Kong, China

Abstract. Video representation is a vital problem in action recognition. This paper proposes Stacked Fisher Vectors (SFV), a new representation with multi-layer nested Fisher vector encoding, for action recognition. In the first layer, we densely sample large subvolumes from input videos, extract local features, and encode them using Fisher vectors (FVs). The second layer compresses the subvolume FVs obtained in the first layer and then encodes them again with Fisher vectors. Compared with the standard FV, SFV refines the representation and abstracts semantic information in a hierarchical way. Compared with recent mid-level-based action representations, SFV does not need to mine discriminative action parts, yet it preserves mid-level information through Fisher vector encoding in the higher layer. We evaluate the proposed method on three challenging datasets, namely YouTube, J-HMDB, and HMDB51. Experimental results demonstrate the effectiveness of SFV, and the combination of the traditional FV and SFV outperforms state-of-the-art methods on these datasets by a large margin.

Keywords: Action recognition, Fisher vectors, stacked Fisher vectors, max-margin dimensionality reduction
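
The two-layer encoding described in the abstract can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation. The helper names (`fisher_vector`, `toy_gmm`, `pca_project`, `stacked_fisher_vector`) are hypothetical, and the local descriptors are random toy data. The GMM is a crude stand-in for EM training, PCA replaces the max-margin dimensionality reduction named in the keywords, and both are fitted on a single toy video rather than on a training set.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Fisher vector (mean and std gradients) of X (N, D) under a diagonal GMM."""
    N = X.shape[0]
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]      # (N, K, D)
    # Log posteriors up to a constant that cancels in the normalization below.
    log_p = -0.5 * np.sum(diff ** 2, axis=2) - np.sum(np.log(sigma), axis=1) + np.log(w)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # (N, K)
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(w)[:, None])
    g_sigma = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (N * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                         # L2 normalization

def toy_gmm(X, K, rng):
    """Assumption: crude stand-in for EM (random samples as means, shared global std)."""
    mu = X[rng.choice(len(X), size=K, replace=False)]
    sigma = np.tile(X.std(axis=0) + 1e-6, (K, 1))
    w = np.full(K, 1.0 / K)
    return w, mu, sigma

def pca_project(F, d):
    """Assumption: PCA used here in place of max-margin dimensionality reduction."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:d].T

def stacked_fisher_vector(subvolumes, K1, K2, d, rng):
    # Layer 1: one Fisher vector per subvolume of local descriptors.
    gmm1 = toy_gmm(np.vstack(subvolumes), K1, rng)
    layer1 = np.stack([fisher_vector(S, *gmm1) for S in subvolumes])  # (M, 2*K1*D)
    # Layer 2: compress the subvolume FVs, then encode them again with an FV.
    compressed = pca_project(layer1, d)                               # (M, d)
    gmm2 = toy_gmm(compressed, K2, rng)
    return fisher_vector(compressed, *gmm2)                           # (2*K2*d,)

rng = np.random.default_rng(0)
# Toy video: 20 subvolumes, each with 50 local descriptors of dimension 8.
subvolumes = [rng.normal(size=(50, 8)) for _ in range(20)]
sfv = stacked_fisher_vector(subvolumes, K1=4, K2=3, d=5, rng=rng)
print(sfv.shape)  # (30,)
```

The output dimension is 2 x K2 x d, because the second-layer Fisher vector stacks mean and standard-deviation gradients for each of the K2 components over the d compressed dimensions.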

LNCS 8693, p. 581 ff.

© Springer International Publishing Switzerland 2014