2009 IEEE International Conference on Systems, Man, and Cybernetics
Abstract
Automatic speech recognition (ASR) by machine has been a goal and an attractive research area for the past several decades. In recent years, many automatic speech-reading systems that combine audio and visual speech features have been proposed. In the proposed approach, we extract visual features of the lips using an automatic extraction method. These features are important to an audio-visual speech recognition system, especially under noisy conditions. The extraction algorithm segments the lip region by making use of both color and edge information. We then establish a set of visual speech parameters and incorporate them into the recognizer. The WD-KNN classifier is used as the recognition engine in this paper. We study recognition performance using various visual features, including the geometric features and the motion of the lips, to explore their impact on recognition accuracy. Experimental results on a Mandarin database demonstrate that the visual information is highly effective for improving recognition performance.
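The abstract names WD-KNN as the recognition engine but does not specify its weighting scheme. The following is a minimal sketch of a distance-weighted k-nearest-neighbor classifier under a common assumption (closer neighbors receive inverse-distance weights); the function name, toy data, and class labels are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def wd_knn_classify(train_X, train_y, x, k=5):
    """Distance-weighted k-NN vote (hypothetical sketch).

    The exact WD-KNN weighting is not given in the abstract;
    here each of the k nearest neighbors votes for its class
    with weight 1/distance, so closer neighbors count more.
    """
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance to each training sample
    idx = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    weights = 1.0 / (dists[idx] + 1e-9)           # inverse-distance weights (avoid div by zero)
    scores = {}
    for i, w in zip(idx, weights):
        scores[train_y[i]] = scores.get(train_y[i], 0.0) + w
    return max(scores, key=scores.get)            # class with the largest weighted vote

# Toy usage: two classes in a 2-D visual-feature space (illustrative only)
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = ["silence", "silence", "speech", "speech"]
label = wd_knn_classify(X, y, np.array([0.95, 1.0]), k=3)
```

In an audio-visual setup like the one described, the feature vectors would combine acoustic features with the extracted lip parameters (geometry and motion), and the weighted vote would decide the recognized class.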