Action Recognition with Stacked Fisher Vectors

Xiaojiang Peng (1, 3, 2), Changqing Zou (3, 2), Yu Qiao (2, 4), and Qiang Peng (1)

(1) Southwest Jiaotong University, Chengdu, China
(2) Shenzhen Key Lab of CVPR, Shenzhen Institutes of Advanced Technology, CAS, China
(3) Department of Computer Science, Hengyang Normal University, Hengyang, China
(4) The Chinese University of Hong Kong, China

Abstract. Representation of video is a vital problem in action recognition. This paper proposes Stacked Fisher Vectors (SFV), a new representation with multi-layer nested Fisher vector encoding, for action recognition. In the first layer, we densely sample large subvolumes from input videos, extract local features, and encode them using Fisher vectors (FVs). The second layer compresses the FVs of the subvolumes obtained in the previous layer, and then encodes them again with Fisher vectors. Compared with the standard FV, SFV allows refining the representation and abstracting semantic information in a hierarchical way. Compared with recent mid-level action representations, SFV does not need to mine discriminative action parts, yet it can preserve mid-level information through Fisher vector encoding in the higher layer. We evaluate the proposed method on three challenging datasets, namely Youtube, J-HMDB, and HMDB51. Experimental results demonstrate the effectiveness of SFV, and the combination of the traditional FV and SFV outperforms state-of-the-art methods on these datasets by a large margin.

Keywords: Action recognition, Fisher vectors, stacked Fisher vectors, max-margin dimensionality reduction

LNCS 8693, p. 581 ff.
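
The two-layer encoding described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only to keep the example short: `toy_gmm` is a hypothetical stand-in for a GMM properly trained with EM, plain PCA (`pca_project`) replaces the max-margin dimensionality reduction named in the keywords, and the GMMs and projection are fitted on a single synthetic video rather than on a training set. The function names, the codebook sizes `k1` and `k2`, and `reduced_dim` are illustrative choices, not values from the paper.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Fisher vector of descriptors X (N x D) under a diagonal GMM:
    gradients w.r.t. means and variances, power- and L2-normalized."""
    X = np.atleast_2d(X)
    n = X.shape[0]
    # Whitened differences to every component: shape (N, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]
    # Log of weighted Gaussian densities, shape (N, K).
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)      # soft assignments
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (n * np.sqrt(weights)[:, None])
    g_var = (np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1)
             / (n * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

def toy_gmm(X, k, rng):
    """ASSUMPTION: crude stand-in for EM training. Random data points
    serve as means, with a shared per-dimension variance and uniform weights."""
    means = X[rng.choice(len(X), size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    return weights, means, variances

def pca_project(F, dim):
    """ASSUMPTION: PCA swapped in for max-margin dimensionality reduction."""
    centered = F - F.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def stacked_fisher_vector(subvolume_features, k1=4, k2=2, reduced_dim=8, seed=0):
    """subvolume_features: list of (n_i x D) arrays, one per subvolume."""
    rng = np.random.default_rng(seed)
    gmm1 = toy_gmm(np.vstack(subvolume_features), k1, rng)
    # Layer 1: one Fisher vector per subvolume.
    layer1 = np.stack([fisher_vector(f, *gmm1) for f in subvolume_features])
    # Compress the subvolume-level Fisher vectors.
    compressed = pca_project(layer1, reduced_dim)
    gmm2 = toy_gmm(compressed, k2, rng)
    # Layer 2: encode the compressed vectors again into one video-level vector.
    return fisher_vector(compressed, *gmm2)

# Synthetic "video": 12 subvolumes, each with 30 local descriptors of dimension 6.
rng = np.random.default_rng(1)
video = [rng.normal(size=(30, 6)) for _ in range(12)]
sfv = stacked_fisher_vector(video)
print(sfv.shape)  # length is 2 * k2 * reduced_dim
```

In a realistic setting the first-layer GMM, the projection, and the second-layer GMM would each be learned once on training videos and then reused for every video, so that all resulting vectors live in the same space and can be fed to a classifier.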