2009 IEEE International Conference on Systems, Man, and Cybernetics
Abstract
The standard Q-learning algorithm suffers from slow convergence and a tendency to become trapped in local optima. Truncated TD estimation estimates returns efficiently, and the simulated annealing algorithm increases the chance of exploration. To accelerate convergence and to avoid local optima, this paper combines the Q-learning algorithm with truncated TD estimation and the simulated annealing algorithm. We apply the improved Q-learning algorithm to an imperfect-information game (the SiGuo military chess game) and realize a self-learning imperfect-information game system. Experimental results show that the system can dynamically adjust each weight that describes the game state according to the results. Furthermore, it speeds up the learning process, effectively simulates human intelligence, makes reasonable moves, and significantly improves system performance.
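As an illustration only, the three ingredients named in the abstract can be sketched in a few lines of Python. The abstract does not give the paper's actual formulation, so everything here is an assumption: a tabular Q function (the paper instead adjusts weights that describe the game state), a Metropolis-style acceptance rule for the simulated annealing exploration step, and a recursive truncated TD(λ) return over a window of m steps. The names `sa_select_action`, `truncated_return`, and `q_update` are hypothetical.

```python
import math
import random


def sa_select_action(q_row, temperature, rng):
    """Pick an action with a Metropolis-style annealing rule (assumed form).

    A random candidate replaces the greedy action with probability
    exp((Q(s, candidate) - Q(s, greedy)) / temperature), so exploration
    shrinks as the temperature is lowered. temperature must be > 0.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # The exponent is never positive, so the probability lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def truncated_return(rewards, next_values, gamma, lam):
    """Truncated TD(lambda) return over a window of m steps (assumed form).

    rewards[k] is r_{t+k} and next_values[k] is max_a Q(s_{t+k+1}, a)
    for k = 0 .. m-1. The recursion runs backwards from the end of the
    window; with lam = 1 it reduces to the ordinary m-step return.
    """
    z = next_values[-1]  # bootstrap from the last state in the window
    for r, v in zip(reversed(rewards), reversed(next_values)):
        z = r + gamma * (lam * z + (1.0 - lam) * v)
    return z


def q_update(q_table, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target return."""
    q_table[state][action] += alpha * (target - q_table[state][action])
```

In such a sketch, a learner would call `sa_select_action` at every move while lowering the temperature between episodes, buffer the last m rewards and successor values, and pass the result of `truncated_return` to `q_update` as the target for the oldest buffered state-action pair.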