"A Model-adaptive Random Search Actor Critic: Convergence Analysis and " by Yuehan Luo, Jiaqiao Hu et al.
 

Abstract

Reinforcement learning (RL) is an exciting area within the domain of Markov Decision Processes (MDPs) in which the underlying optimization problem is solved either in a simulator of the real-world system or via direct interaction with the real-world system, when its underlying transition probabilities are difficult to estimate. The latter is commonly true of large-scale, real-world MDPs with complex underlying transition dynamics. RL is currently being widely researched in the world of medicine/neuroscience after some spectacular success stories demonstrating super-human behavior in computer games. In this paper, we propose a new actor-critic-based RL algorithm for approximately solving continuous state/action MDPs in which the Q-function is used for the critic, in contrast to the usual value function of dynamic programming, and a new model-adaptive random search (MARS) method is employed for the actor. The algorithm is formulated using function approximation and referred to as the MARS actor critic. Further, a discretized version of the same algorithm using exemplars or representative state-action pairs, which is suitable for a tabular setting and referred to as the Tabular Exemplar Approximation (TEA) version, is also proposed. The MARS version is analyzed mathematically for its convergence properties using a two-timescale approach. Both the MARS and the TEA versions are tested numerically: the MARS version is tested on a classical inventory-control problem, while the TEA version is tested on a real-world case study from the domain of remanufacturing.

Department(s)

Engineering Management and Systems Engineering

Comments

National Science Foundation, Grant CMMI-2027452

Keywords and Phrases

Actor critics; Markov decision processes; Reinforcement learning; Remanufacturing

International Standard Serial Number (ISSN)

1572-9338; 0254-5330

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Springer, All rights reserved.

Publication Date

01 Jan 2024

