2009 IEEE International Conference on Systems, Man, and Cybernetics
Abstract 
In this article, three visual feature extraction methods are studied: the discrete cosine transform, the Gabor transform, and the discrete wavelet transform. Each method is used separately to extract low-level visual feature vectors from the images in a given database; these feature vectors are then mapped to high-level semantic words, so that each image is annotated with labels from a given semantic label set. Because image annotation can be posed as a classification problem, and Gaussian mixture models have proven useful for describing the distribution of natural data, our goal is to find out which feature extraction method fits the Gaussian mixture model better in image annotation. In the experiments, a hierarchical extension of the expectation-maximization method is used to speed up the annotation process. The performance of the three feature extraction methods is analyzed, and we find that the discrete cosine transform is more suitable than the other two methods for the Gaussian mixture model in image annotation.
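To make the pipeline summarized above concrete, the following minimal sketch shows one way block-level discrete cosine transform features could be scored against per-label Gaussian mixture models, with the annotation chosen as the label of highest likelihood. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names, the 4 x 4 block size, the choice of keeping the lowest 2 x 2 coefficients, the diagonal covariances, and the labels "sky" and "grass" with their parameters are all hypothetical. The mixture parameters are given by hand here, whereas in the article they are estimated with a hierarchical extension of expectation-maximization.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[v][u] holds coefficient (u, v); transpose back to row-major order.
    return [list(r) for r in zip(*cols)]

def dct_features(block, k=2):
    """Keep the k x k lowest-frequency coefficients as the feature vector
    (assumption: the article does not state which coefficients are kept)."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

def gmm_log_likelihood(x, gmm):
    """Log-density of x under a diagonal-covariance GMM.
    gmm is a list of (weight, means, variances) tuples."""
    logs = []
    for weight, means, variances in gmm:
        lp = math.log(weight)
        for xi, mi, vi in zip(x, means, variances):
            lp -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(lp)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def annotate(features, class_gmms):
    """Return the label whose GMM gives the highest total log-likelihood
    over all block feature vectors of one image."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in features)
        for label, gmm in class_gmms.items()
    }
    return max(scores, key=scores.get)

# Hypothetical usage: one flat, bright 4 x 4 block and two hand-set label models.
block = [[100.0] * 4 for _ in range(4)]
features = [dct_features(block, k=2)]
class_gmms = {
    "sky": [(1.0, [400.0, 0.0, 0.0, 0.0], [100.0, 1.0, 1.0, 1.0])],
    "grass": [(1.0, [100.0, 0.0, 0.0, 0.0], [100.0, 1.0, 1.0, 1.0])],
}
label = annotate(features, class_gmms)
```

The same scoring step would apply unchanged to Gabor or wavelet feature vectors, which is what allows the three extraction methods to be compared under a common Gaussian mixture classifier.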