Abstract
Reinforcement learning (RL) is a simulation-based technique for solving Markov decision processes (MDPs). It is especially useful when the transition probabilities of the MDP are hard to obtain or when the number of states is very large. In this paper, we present a new model-based RL algorithm that builds the transition probability model without generating the transition probabilities explicitly; the existing literature on model-based RL attempts to compute the transition probabilities directly. We also present a variance-penalized Bellman equation and an RL algorithm that uses it to solve a variance-penalized MDP. We conclude with some numerical experiments with these algorithms. ©2009 IEEE.
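As a rough illustration of the variance-penalized idea (a sketch, not the paper's algorithm), the following Python snippet runs value iteration on a small hypothetical MDP in which each one-step reward r is replaced by the penalized reward r - θ(r - ρ)², discouraging rewards that deviate from a reference level ρ. All numbers, the discounted formulation, and the exact penalty form are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers illustrative).
# P[a, s, s'] = transition probability, R[a, s, s'] = one-step reward.
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[[6.0, -5.0], [7.0, 12.0]],
              [[10.0, 17.0], [-14.0, 13.0]]])
gamma = 0.9   # discount factor (assumed discounted setting)
theta = 0.1   # variance-penalty coefficient (assumed penalty form)
rho = 5.0     # reference reward level the penalty is measured against

# Penalized one-step reward: r - theta * (r - rho)^2.
R_pen = R - theta * (R - rho) ** 2

V = np.zeros(2)
for _ in range(500):
    # Q[a, s] = sum over s' of P[a, s, s'] * (R_pen[a, s, s'] + gamma * V[s'])
    Q = (P * (R_pen + gamma * V)).sum(axis=2)
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy action in each state
print(V, policy)
```

Raising `theta` makes the greedy policy favor actions with less spread in their rewards, even at some cost in expected reward; setting `theta = 0` recovers ordinary risk-neutral value iteration.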
Recommended Citation
A. Gosavi, "Reinforcement Learning for Model Building and Variance-penalized Control," Proceedings - Winter Simulation Conference, pp. 373-379, article no. 5429344, Institute of Electrical and Electronics Engineers, Dec 2009.
The definitive version is available at https://doi.org/10.1109/WSC.2009.5429344
Department(s)
Engineering Management and Systems Engineering
International Standard Book Number (ISBN)
978-1-4244-5770-0
International Standard Serial Number (ISSN)
0891-7736
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
01 Dec 2009