Abstract
This paper studies the semi-Markov decision process (SMDP) under the long-run average reward criterion in a simulation-based context. Within dynamic programming, a straightforward approach to solving this problem is policy iteration; value iteration, by contrast, requires a transformation that induces an additional computational burden. In the simulation-based context, however, where one seeks to avoid the transition probabilities needed in dynamic programming, value iteration forms the more convenient route to a solution. Hence, in this paper we present (to the best of our knowledge, for the first time) a relative value iteration algorithm for solving average reward SMDPs via simulation. The algorithm is a semi-Markov extension of an algorithm in the literature for the Markov decision process. Our numerical results with the new algorithm are very encouraging. © 2013 IEEE.
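The abstract does not state the update rule itself, so the following is only a minimal sketch of how a simulation-based relative value iteration (RVI Q-learning) update might be extended from MDPs to SMDPs, assuming the average-reward term is anchored by a reference state-action pair and charged against the observed sojourn time. The toy two-state simulator, the step-size schedule, and all names below are hypothetical illustrations, not the paper's algorithm.

import random

# Sketch: simulation-based relative value iteration for an SMDP.
# Assumed update (illustrative only):
#   Q(i,a) <- Q(i,a) + alpha * [ r - Q_ref * t + max_b Q(j,b) - Q(i,a) ]
# where t is the sojourn time and Q_ref, the Q-value of a fixed reference
# pair, plays the role of the average-reward estimate.

N_STATES, N_ACTIONS = 2, 2
REF = (0, 0)  # reference state-action pair that anchors the relative values

def simulate(state, action):
    """Hypothetical two-state SMDP: returns (next_state, reward, sojourn_time)."""
    weights = [0.7, 0.3] if action == 0 else [0.4, 0.6]
    next_state = random.choices([0, 1], weights=weights)[0]
    reward = 5.0 if (state, action) == (0, 0) else random.uniform(0.0, 4.0)
    sojourn = random.uniform(0.5, 2.0)  # random transition time of the SMDP
    return next_state, reward, sojourn

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
state = 0
for k in range(1, 200_000):
    alpha = 100.0 / (1000.0 + k)  # diminishing step size
    # Epsilon-greedy exploration over the simulated trajectory.
    if random.random() < 0.1:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    nxt, r, t = simulate(state, action)
    # Relative-value update: subtract the reference Q-value scaled by the
    # sojourn time, so the values stay bounded without knowing transition
    # probabilities.
    target = r - Q[REF[0]][REF[1]] * t + max(Q[nxt])
    Q[state][action] += alpha * (target - Q[state][action])
    state = nxt

policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print("Greedy policy:", policy)
print("Average-reward estimate (reference Q-value):", Q[REF[0]][REF[1]])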
Recommended Citation
A. Gosavi, "Relative Value Iteration for Average Reward Semi-Markov Control Via Simulation," Proceedings of the 2013 Winter Simulation Conference - Simulation: Making Decisions in a Complex World, WSC 2013, pp. 623-630, article no. 6721456, Institute of Electrical and Electronics Engineers, Dec 2013.
The definitive version is available at https://doi.org/10.1109/WSC.2013.6721456
Department(s)
Engineering Management and Systems Engineering
International Standard Book Number (ISBN)
978-1-4799-3950-3
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Institute of Electrical and Electronics Engineers. All rights reserved.
Publication Date
01 Dec 2013