
Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment

Jie Zhang1,2, Shiguang Shan1, Meina Kan1, and Xilin Chen1

1Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
jie.zhang@vipl.ict.ac.cn
shiguang.shan@vipl.ict.ac.cn
meina.kan@vipl.ict.ac.cn
xilin.chen@vipl.ict.ac.cn

2University of Chinese Academy of Sciences, Beijing 100049, China

Abstract. Accurate face alignment is a vital prerequisite step for most face perception tasks such as face recognition, facial expression analysis and non-realistic face re-rendering. It can be formulated as the nonlinear inference of the facial landmarks from the detected face region. A deep network seems a good choice for modeling this nonlinearity, but applying one directly is nontrivial. In this paper, instead of applying a deep network straightforwardly, we propose the Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs). Specifically, the first SAN takes a low-resolution version of the detected face as a holistic input and quickly predicts the landmarks, accurately enough to serve as a preliminary estimate. The following SANs then progressively refine the landmarks by taking as input the local features extracted around the current landmarks (the output of the previous SAN) at increasingly higher resolution. Extensive experiments conducted on three challenging datasets demonstrate that our CFAN outperforms the state-of-the-art methods and runs in real time (40+ fps on a desktop, excluding face detection).

Keywords: Face Alignment, Nonlinear, Deep Learning, Stacked Auto-encoder, Coarse-to-Fine, Real-time
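
The cascade described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the networks below are small, randomly initialised multilayer perceptrons standing in for trained SANs, raw pixel patches stand in for the local features, and the choice to have each later stage predict an additive correction to the current landmarks is an assumption of this sketch. All names (`make_mlp`, `downsample`, `local_patches`, `cfan_predict`), layer sizes, resolutions and the patch size are hypothetical.

```python
import numpy as np

def make_mlp(sizes, rng):
    """Random, untrained weights standing in for a trained stacked auto-encoder."""
    return [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(layers, x):
    """Sigmoid hidden layers followed by a linear output layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = 1.0 / (1.0 + np.exp(-x))
    return x

def downsample(img, size):
    """Nearest-neighbour resize of a grayscale image to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def local_patches(img, landmarks, half):
    """Concatenate raw pixel patches around each landmark.

    Landmarks are (x, y) pairs normalised to [0, 1]; raw pixels are an
    assumed stand-in for whatever local features a real system would use.
    """
    h, w = img.shape
    padded = np.pad(img, half, mode="edge")
    feats = []
    for x, y in landmarks:
        cx = int(np.clip(round(x * (w - 1)), 0, w - 1)) + half
        cy = int(np.clip(round(y * (h - 1)), 0, h - 1)) + half
        feats.append(padded[cy - half:cy + half + 1,
                            cx - half:cx + half + 1].ravel())
    return np.concatenate(feats)

def cfan_predict(face, global_san, local_sans, resolutions, n_landmarks, half=2):
    """Coarse-to-fine cascade: holistic first stage, then local refinements."""
    # Stage 1: whole low-resolution face -> preliminary landmark estimate.
    coarse = downsample(face, resolutions[0])
    shape = mlp_forward(global_san, coarse.ravel()).reshape(n_landmarks, 2)
    # Later stages: local features at increasing resolution -> correction.
    for san, res in zip(local_sans, resolutions[1:]):
        img = downsample(face, res)
        feats = local_patches(img, shape, half)
        shape = shape + mlp_forward(san, feats).reshape(n_landmarks, 2)
    return shape

rng = np.random.default_rng(0)
n_landmarks = 5                      # hypothetical landmark count
resolutions = [16, 32, 64]           # hypothetical coarse-to-fine schedule
patch_dim = n_landmarks * (2 * 2 + 1) ** 2
global_san = make_mlp([resolutions[0] ** 2, 64, 2 * n_landmarks], rng)
local_sans = [make_mlp([patch_dim, 64, 2 * n_landmarks], rng)
              for _ in resolutions[1:]]

face = rng.random((128, 128))        # placeholder for a detected face crop
pred = cfan_predict(face, global_san, local_sans, resolutions, n_landmarks)
print(pred.shape)
```

With untrained weights the predicted coordinates are meaningless; the sketch only shows the data flow, in which the first stage sees the whole face at low resolution and each later stage sees only small neighbourhoods around the current estimate at a finer scale.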

LNCS 8690, p. 1 ff.

© Springer International Publishing Switzerland 2014