
Modeling Video Dynamics with Deep Dynencoder

Xing Yan, Hong Chang, Shiguang Shan, and Xilin Chen

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
{xing.yan, hong.chang, shiguang.shan, xilin.chen}@vipl.ict.ac.cn

Abstract. Videos always exhibit various motion patterns, which can be modeled according to the dynamics between adjacent frames. Previous methods based on linear dynamic systems can model dynamic textures but have limited capacity for representing sophisticated nonlinear dynamics. Inspired by the nonlinear expressive power of deep autoencoders, we propose a novel model named dynencoder, which has an autoencoder at the bottom and a variant of it (named dynpredictor) at the top. It generates hidden states from raw pixel inputs via the autoencoder and then encodes the dynamics of state transitions over time via the dynpredictor. A deep dynencoder can be constructed by a proper stacking strategy and trained by layer-wise pre-training and joint fine-tuning. Experiments verify that our model can describe sophisticated video dynamics and synthesize endless video texture sequences with high visual quality. We also design classification and clustering methods based on our model and demonstrate their efficacy on traffic scene classification and motion segmentation.

Keywords: Video Dynamics, Deep Model, Autoencoder, Time Series, Dynamic Textures
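To make the architecture described in the abstract concrete, the following PyTorch sketch pairs a one-layer autoencoder (frames to hidden states and back) with a dynpredictor that predicts the next hidden state from the current one, and rolls the predictor forward to synthesize a sequence. Layer sizes, activations, and the joint training objective are illustrative assumptions, not the authors' exact formulation; the deep variant would stack such layers with layer-wise pre-training before joint fine-tuning.

# Minimal sketch of a dynencoder-style model; all hyperparameters are assumptions.
import torch
import torch.nn as nn

class Dynencoder(nn.Module):
    def __init__(self, frame_dim, state_dim):
        super().__init__()
        # Bottom autoencoder: maps raw pixel vectors to hidden states and back.
        self.encoder = nn.Sequential(nn.Linear(frame_dim, state_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(state_dim, frame_dim), nn.Sigmoid())
        # Top dynpredictor: encodes state-transition dynamics over time.
        self.dynpredictor = nn.Sequential(nn.Linear(state_dim, state_dim), nn.Sigmoid())

    def forward(self, frames):
        # frames: (T, frame_dim) vectorized video frames scaled to [0, 1]
        states = self.encoder(frames)
        recon = self.decoder(states)
        next_states = self.dynpredictor(states[:-1])
        return recon, states, next_states

def fine_tune(model, frames, epochs=100, lr=1e-3):
    # Joint fine-tuning stage (assumed objective): frame reconstruction loss
    # plus a state-transition prediction loss.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        recon, states, next_states = model(frames)
        loss = mse(recon, frames) + mse(next_states, states[1:].detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def synthesize(model, first_frame, length):
    # Roll the dynpredictor forward in state space and decode each state,
    # yielding an arbitrarily long texture sequence.
    with torch.no_grad():
        state = model.encoder(first_frame)
        frames = []
        for _ in range(length):
            state = model.dynpredictor(state)
            frames.append(model.decoder(state))
        return torch.stack(frames)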

LNCS 8692, p. 215 ff.



© Springer International Publishing Switzerland 2014