Hybrid Least-Squares Methods for Reinforcement Learning

Abstract

The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenges in selecting basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of a radial basis function (RBF) neural network, this paper proposes a new hybrid learning method for LSPI. The proposed method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of a Cart-Pole system illustrate the effectiveness of the method.
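
For orientation, the following is a minimal sketch of the standard LSPI loop: LSTD-Q policy evaluation followed by greedy policy improvement, using a linear architecture built from Gaussian RBF features. It is not the hybrid method proposed in this paper. The toy chain problem, the RBF centers and width (one RBF per state, sigma = 1), the discount factor, and every function name here are hypothetical choices made for illustration; the fixed basis stands in for the basis-function selection that the paper addresses.

```python
import math

# Hypothetical toy problem (not from the paper): a deterministic chain of
# N_STATES states. Action 0 moves left, action 1 moves right, and a reward of
# 1 is paid whenever the next state is the rightmost one.
N_STATES = 6
ACTIONS = (0, 1)
GAMMA = 0.9
SIGMA = 1.0                                     # assumed RBF width
CENTERS = [float(c) for c in range(N_STATES)]   # assumed: one RBF per state


def step(s, a):
    """Deterministic chain transition; returns (reward, next_state)."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next


def features(s, a):
    """Gaussian RBFs over the state, placed in the block belonging to action a."""
    rbf = [math.exp(-((s - c) ** 2) / (2 * SIGMA ** 2)) for c in CENTERS]
    phi = [0.0] * (len(CENTERS) * len(ACTIONS))
    phi[a * len(CENTERS):(a + 1) * len(CENTERS)] = rbf
    return phi


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def q_value(w, s, a):
    """Linear Q-function approximation: Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def lstdq(samples, policy):
    """LSTD-Q: A = sum phi (phi - gamma * phi')^T, b = sum phi * r, solve A w = b."""
    k = len(CENTERS) * len(ACTIONS)
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, policy[s_next])  # next action from current policy
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - GAMMA * phi_next[j])
    return solve(A, b)


def lspi(samples, policy, max_iters=20, tol=1e-6):
    """Alternate LSTD-Q evaluation and greedy improvement until the policy is stable."""
    for _ in range(max_iters):
        w = lstdq(samples, policy)
        new_policy = []
        for s in range(N_STATES):
            best = max(ACTIONS, key=lambda a: q_value(w, s, a))
            # Keep the current action unless the greedy one is clearly better,
            # so floating-point noise on tied Q-values cannot flip the policy.
            if q_value(w, s, best) > q_value(w, s, policy[s]) + tol:
                new_policy.append(best)
            else:
                new_policy.append(policy[s])
        if new_policy == policy:
            break
        policy = new_policy
    return policy, w


# One simulated transition per (state, action) pair, as (s, a, r, s').
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in ACTIONS]
policy, w = lspi(samples, policy=[0] * N_STATES)
print(policy)  # [1, 1, 1, 1, 1, 1]
```

Here every (state, action) pair is sampled and the per-action RBF blocks form an invertible basis, so LSTD-Q recovers Q exactly and LSPI converges to the always-move-right policy. With fewer or poorly placed centers, the least-squares projection can distort Q, which is the kind of basis-selection difficulty the abstract describes.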

Department(s)

Engineering Management and Systems Engineering

International Standard Book Number (ISBN)

978-354040455-2

International Standard Serial Number (ISSN)

0302-9743

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Springer. All rights reserved.

Publication Date

01 Jan 2003
