Abstract
Semi-Markov decision processes (SMDPs) can be solved via reinforcement learning without generating their transition models. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem for discounted and average reward under the infinite horizon; API techniques have attracted significant interest in the recent literature. We first present and analyze an extension of an existing API algorithm for discounted reward that can handle continuous reward rates. We then consider its average-reward counterpart, which requires an update based on the stochastic shortest path (SSP). Finally, we study the convergence properties of an algorithm that does not require the SSP update. © 2011 Published by Elsevier Ltd.
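To illustrate the flavor of model-free API for a discounted-reward SMDP, the following is a minimal sketch, not the paper's algorithm. All names (simulate_transition, GAMMA, the toy state and action sets) are hypothetical; the sketch assumes a lump-sum reward plus a continuous reward rate accrued over the sojourn time, discounted in continuous time at rate GAMMA, with greedy policy improvement alternating with simulation-based evaluation.

```python
import math
import random

GAMMA = 0.1          # continuous-time discount rate (assumption)
STATES = [0, 1]
ACTIONS = [0, 1]

def simulate_transition(s, a):
    """Toy SMDP step (hypothetical): returns next state, sojourn time,
    lump-sum reward, and the continuous reward rate over the sojourn."""
    tau = random.expovariate(1.0 + a)          # random sojourn time
    s_next = random.choice(STATES)
    lump = 1.0 if (s, a) == (0, 1) else 0.5
    rate = 0.2 * (1 + s)                       # reward earned per unit time
    return s_next, tau, lump, rate

def evaluate(policy, steps=5000, alpha=0.01):
    """Model-free evaluation of Q under a fixed policy. With discount
    rate GAMMA, a sojourn of length tau contributes the integrated
    reward rate*(1 - exp(-GAMMA*tau))/GAMMA plus the lump-sum reward,
    and the continuation value is discounted by exp(-GAMMA*tau)."""
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = random.choice(STATES)
    for _ in range(steps):
        a = random.choice(ACTIONS)             # explore the first action
        s_next, tau, lump, rate = simulate_transition(s, a)
        r = lump + rate * (1.0 - math.exp(-GAMMA * tau)) / GAMMA
        target = r + math.exp(-GAMMA * tau) * Q[(s_next, policy[s_next])]
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next
    return Q

def improve(Q):
    """Greedy policy improvement step of API."""
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}

policy = {s: 0 for s in STATES}
for _ in range(10):                            # API: evaluate, then improve
    policy = improve(evaluate(policy))
print(policy)
```

Note that the update estimates Q for the fixed policy (the continuation value uses the policy's action at the next state), which is what distinguishes API-style evaluation from value iteration; the SSP-based machinery needed for the average-reward case is not sketched here.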
Recommended Citation
A. Gosavi, "Approximate Policy Iteration for Semi-Markov Control Revisited," Procedia Computer Science, vol. 6, pp. 249–255, Elsevier, Jan 2011.
The definitive version is available at https://doi.org/10.1016/j.procs.2011.08.046
Department(s)
Engineering Management and Systems Engineering
Publication Status
Open Access
Keywords and Phrases
Approximate policy iteration; Average reward; Reinforcement learning; Semi-Markov
International Standard Serial Number (ISSN)
1877-0509
Document Type
Article - Conference proceedings
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2024 Elsevier, All rights reserved.
Creative Commons Licensing
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Publication Date
01 Jan 2011