This paper presents an enhanced least-squares approach for solving reinforcement learning control problems. The model-free least-squares policy iteration (LSPI) method has been used successfully in this learning domain. Although LSPI is a promising algorithm that uses a linear approximator architecture to achieve policy optimization in the spirit of Q-learning, its performance depends heavily on the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression (OLSR) method for selecting the centers of an RBF neural network, we propose a new hybrid learning method. The suggested approach combines the LSPI algorithm with the OLSR strategy and uses simulation as a tool to guide the "feature processing" procedure. Results on the learning control of a cart-pole system illustrate the effectiveness of the presented scheme.
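The abstract does not spell out the selection procedure, so the following is only a minimal sketch of generic orthogonal least-squares forward selection of RBF centers, the kind of technique the abstract says inspired the method. It is not the authors' algorithm. The function names (`rbf`, `dot`, `ols_select`), the scalar inputs, the Gaussian basis with a shared width, and the error-reduction-ratio criterion are all assumptions made for illustration.

```python
import math


def rbf(x, c, width):
    """Gaussian radial basis function of a scalar input x centered at c."""
    return math.exp(-((x - c) ** 2) / (2.0 * width ** 2))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ols_select(candidates, xs, ys, width, n_select):
    """Greedily pick up to n_select centers by error reduction ratio.

    Hypothetical helper: candidates are possible RBF centers, xs are
    sample inputs, and ys are the regression targets (assumed nonzero).
    """
    # Each candidate center yields one regressor column over the samples.
    columns = [[rbf(x, c, width) for x in xs] for c in candidates]
    yy = dot(ys, ys)
    selected, basis = [], []
    for _ in range(n_select):
        best = None
        for j, col in enumerate(columns):
            if j in selected:
                continue
            # Gram-Schmidt step: remove the components of this column
            # that lie along the already selected orthogonal directions.
            w = list(col)
            for q in basis:
                coef = dot(q, col) / dot(q, q)
                w = [wi - coef * qi for wi, qi in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:
                continue  # numerically dependent on the chosen set
            # Fraction of target energy explained by this new direction.
            err = dot(w, ys) ** 2 / (ww * yy)
            if best is None or err > best[0]:
                best = (err, j, w)
        if best is None:
            break
        selected.append(best[1])
        basis.append(best[2])
    return [candidates[j] for j in selected]
```

In an LSPI setting, the targets `ys` would presumably come from simulated samples and the selected centers would define the basis functions of the linear Q-function approximator; how the paper couples the two steps is not stated in the abstract.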

Meeting Name

International Joint Conference on Neural Networks, 2003


Engineering Management and Systems Engineering

Keywords and Phrases

RBF Neural Network; Adaptive Control; Cart-Pole System; Enhanced Least-Squares Approach; Hybrid Learning Method; Learning (Artificial Intelligence); Learning Systems; Least Squares Approximations; Linear Approximator Architecture; Model-Free Least-Squares Policy Iteration Method; Policy Optimization; Reinforcement Learning Control Problems

Document Type

Article - Conference proceedings

Document Version

Final Version

© 2003 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
