This paper presents an enhanced least-squares approach for solving reinforcement learning control problems. The model-free least-squares policy iteration (LSPI) method has been used successfully in this learning domain. Although LSPI is a promising algorithm that uses a linear approximator architecture to achieve policy optimization in the spirit of Q-learning, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression (OLSR) method for selecting the centers of an RBF neural network, we propose a new hybrid learning method. The suggested approach combines the LSPI algorithm with the OLSR strategy and uses simulation as a tool to guide the "feature processing" procedure. Results on the learning control of a cart-pole system illustrate the effectiveness of the presented scheme.
H. Li and C. H. Dagli, "An Enhanced Least-squares Approach for Reinforcement Learning," Proceedings of the International Joint Conference on Neural Networks, 2003, Institute of Electrical and Electronics Engineers (IEEE), Jan 2003.
The definitive version is available at http://dx.doi.org/10.1109/IJCNN.2003.1224032
International Joint Conference on Neural Networks, 2003
Engineering Management and Systems Engineering
Keywords and Phrases
RBF Neural Network; Adaptive Control; Cart-Pole System; Enhanced Least-Squares Approach; Hybrid Learning Method; Learning (Artificial Intelligence); Learning Systems; Least Squares Approximations; Linear Approximator Architecture; Model-Free Least-Squares Policy Iteration Method; Policy Optimization; Reinforcement Learning Control Problems
Article - Conference proceedings
© 2003 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.