Hybrid Least-Squares Methods for Reinforcement Learning
Abstract
The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to optimize policies in the spirit of Q-learning. However, it faces challenging issues in selecting the basis functions and the training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, this paper proposes a new hybrid learning method for LSPI. The proposed method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of a Cart-Pole system illustrate the effectiveness of the method.
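The standard LSPI loop that the abstract builds on can be sketched as follows. An LSTDQ evaluation step solves A w = b, with A = Σ φ(s,a)(φ(s,a) − γ φ(s', π(s')))ᵀ and b = Σ φ(s,a) r summed over a fixed batch of (s, a, r, s') samples. A greedy improvement step then picks argmax_a wᵀφ(s,a). This is a minimal pure-Python sketch and not the paper's hybrid method. It assumes a scalar state, hand-placed Gaussian RBF centers (it does not implement orthogonal least-squares center selection), a tiny ridge term for numerical safety, and a hypothetical 5-state chain task. All function names (`make_rbf_features`, `solve`, `lstdq`, `lspi`, `chain_step`, and so on) are illustrative.

```python
# Sketch only: standard LSPI with fixed, hand-picked RBF centers.
# This is NOT the paper's hybrid OLS-guided method; all names are hypothetical.
import math


def make_rbf_features(centers, n_actions, width):
    """Return (phi, k): Gaussian RBF features of a scalar state, one block per action."""
    n = len(centers)

    def phi(state, action):
        vec = [0.0] * (n * n_actions)  # entries outside this action's block stay 0
        for i, c in enumerate(centers):
            vec[action * n + i] = math.exp(-((state - c) ** 2) / (2.0 * width ** 2))
        return vec

    return phi, n * n_actions


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def greedy_policy(w, phi, n_actions):
    """Policy improvement: argmax_a of Q(s, a) = w . phi(s, a); ties go to the lowest action."""
    return lambda s: max(range(n_actions), key=lambda a: dot(w, phi(s, a)))


def lstdq(samples, phi, k, policy, gamma, reg=1e-8):
    """Evaluate `policy` from (s, a, r, s') samples; returns w solving A w = b."""
    # A = sum phi(s,a) (phi(s,a) - gamma * phi(s', policy(s')))^T, plus a tiny ridge term
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k  # b = sum phi(s,a) * r
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        diff = [f[j] - gamma * f_next[j] for j in range(k)]
        for i in range(k):
            if f[i] != 0.0:  # skip the all-zero blocks belonging to other actions
                for j in range(k):
                    A[i][j] += f[i] * diff[j]
                b[i] += f[i] * r
    return solve(A, b)


def lspi(samples, phi, k, n_actions, gamma=0.9, max_iters=20, tol=1e-6):
    """Alternate LSTDQ evaluation and greedy improvement on one fixed sample set."""
    w = [0.0] * k
    for _ in range(max_iters):
        w_new = lstdq(samples, phi, k, greedy_policy(w, phi, n_actions), gamma)
        if max(abs(x - y) for x, y in zip(w, w_new)) < tol:
            return w_new
        w = w_new
    return w


# Hypothetical toy task (not from the paper): 5-state chain, action 0 = left,
# action 1 = right, reward 1 whenever the agent lands on state 4.
def chain_step(s, a):
    s_next = max(0, min(4, s + (1 if a == 1 else -1)))
    return (1.0 if s_next == 4 else 0.0), s_next


samples = []
for s in range(5):
    for a in range(2):
        r, s_next = chain_step(s, a)
        samples.append((s, a, r, s_next))

phi, k = make_rbf_features(centers=[0, 1, 2, 3, 4], n_actions=2, width=0.3)
w = lspi(samples, phi, k, n_actions=2)
policy = greedy_policy(w, phi, n_actions=2)
print([policy(s) for s in range(5)])  # expected: [1, 1, 1, 1, 1] (always move right)
```

With one center per state the features are nearly tabular, so the learned greedy policy should move right in every state. Placing the centers by hand, as done here, is the basis-function selection problem that the abstract identifies as a weakness of LSPI.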
Recommended Citation
H. Li and C. H. Dagli, "Hybrid Least-Squares Methods for Reinforcement Learning," Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), vol. 2718, pp. 471-480, Springer, Jan 2003.
The definitive version is available at https://doi.org/10.1007/3-540-45034-3_47
Department(s)
Engineering Management and Systems Engineering
International Standard Book Number (ISBN)
978-3-540-40455-2
International Standard Serial Number (ISSN)
0302-9743
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Springer, All rights reserved.
Publication Date
01 Jan 2003