Abstract

Reinforcement learning (RL) is an exciting area within the domain of Markov Decision Processes (MDPs) in which the underlying optimization problem is solved either in a simulator of the real-world system or via direct interaction with the real-world system, when the system's transition probabilities are difficult to estimate. Such difficulty is common in large-scale, real-world MDPs with complex transition dynamics. RL is currently being widely researched in medicine/neuroscience, following some spectacular success stories demonstrating super-human behavior in computer games. In this paper, we propose a new actor-critic-based RL algorithm for approximately solving continuous state/action MDPs in which the Q-function is used for the critic, in contrast to the usual value function of dynamic programming, and a new model-adaptive random search (MARS) method is employed for the actor. The algorithm is formulated using function approximation and is referred to as the MARS actor critic. Further, a discretized version of the same algorithm is also proposed; it uses exemplars, or representative state-action pairs, is suitable for a tabular setting, and is referred to as the Tabular Exemplar Approximation (TEA) version. The convergence properties of the MARS version are analyzed mathematically using a two-timescale approach. Both versions are tested numerically: the MARS version on a classical inventory-control problem, and the TEA version on a real-world case study from the domain of remanufacturing.

Department(s)

Engineering Management and Systems Engineering

Comments

National Science Foundation, Grant CMMI-2027452

Keywords and Phrases

Actor critics; Markov decision processes; Reinforcement learning; Remanufacturing

International Standard Serial Number (ISSN)

1572-9338; 0254-5330

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Springer, All rights reserved.

Publication Date

01 Jan 2024
