2009 IEEE International Conference on Systems, Man, and Cybernetics
Abstract
Supervised learning is well known and widely applied in many domains, including bioinformatics, cheminformatics, and financial forecasting. However, interference from irrelevant features can degrade classifier accuracy. As a popular feature selection model, GA-SVM is desirable in many of these cases for filtering out irrelevant features and thereby improving learning performance. However, its high computational cost strongly discourages the application of GA-SVM to large-scale datasets. In this paper, an HPC-enabled GA-SVM (HGA-SVM) is proposed that integrates data parallelization, multithreading, and heuristic techniques with the ultimate goals of robustness and low computational cost. The proposed model comprises four improvement strategies: 1) GA Parallelization, 2) SVM Parallelization, 3) Neighbor Search, and 4) Evaluation Caching. Each strategy improves a different aspect of the feature selection model, and together they contribute to higher computational throughput.
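For concreteness, the following is a minimal, single-machine sketch of the kind of GA-SVM wrapper the abstract refers to, with a simple evaluation cache. The dataset, GA operators, parameters, and use of scikit-learn are illustrative assumptions for this sketch only; they do not reflect the paper's parallel HPC implementation of HGA-SVM.

```python
# Sketch: GA-based feature selection wrapping an SVM, with an evaluation cache
# that avoids re-scoring identical chromosomes (a serial stand-in for the
# paper's Evaluation Caching strategy). All settings here are assumptions.
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # placeholder dataset
n_features = X.shape[1]
cache = {}                                   # feature mask -> cached fitness

def fitness(mask):
    """Fitness = cross-validated SVM accuracy on the selected feature subset."""
    key = tuple(mask)
    if key in cache:                         # reuse previously computed score
        return cache[key]
    idx = [i for i, bit in enumerate(mask) if bit]
    score = 0.0 if not idx else cross_val_score(
        SVC(kernel="rbf"), X[:, idx], y, cv=3).mean()
    cache[key] = score
    return score

def evolve(pop_size=20, generations=10, p_mut=0.05):
    """Simple GA: truncation selection, one-point crossover, bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_features)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("selected features:", sum(best), "fitness:", round(fitness(best), 4))
```

In this sketch, the GA population loop is the part that the paper's GA Parallelization would distribute across nodes, the `fitness` call is where SVM Parallelization and multithreading would apply, and the `cache` dictionary mirrors the role of Evaluation Caching in avoiding redundant SVM training.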