Engineering Management and Systems Engineering Faculty Research & Creative Works

A Model-adaptive Random Search Actor Critic: Convergence Analysis and Inventory-control Case Studies

Yuehan Luo
Jiaqiao Hu
Abhijit Gosavi, Missouri University of Science and Technology

Abstract

Reinforcement learning (RL) is an exciting area within the domain of Markov Decision Processes (MDPs) in which the underlying optimization problem is solved either in a simulator of the real-world system or via direct interaction with the real-world system, when its underlying transition probabilities are difficult to estimate. The latter is commonly true of large-scale, real-world MDPs with complex underlying transition dynamics. RL is currently being widely researched in the world of medicine/neuroscience after some spectacular success stories demonstrating super-human behavior in computer games. In this paper, we propose a new actor-critic-based RL algorithm for approximately solving continuous state/action MDPs in which the Q-function is used for the critic, in contrast to the usual value function of dynamic programming, and a new model-adaptive random search (MARS) method is employed for the actor. The algorithm is formulated using function approximation and referred to as the MARS actor critic. Further, a discretized version of the same algorithm using exemplars or representative state-action pairs, which is suitable for a tabular setting and referred to as the Tabular Exemplar Approximation (TEA) version, is also proposed. The MARS version is analyzed mathematically for its convergence properties using a two-timescale approach. Both the MARS and the TEA versions are tested numerically: the MARS version is tested on a classical inventory-control problem, while the TEA version is tested on a real-world case study from the domain of remanufacturing.
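To make the actor-critic structure described in the abstract concrete, the following is a minimal illustrative sketch only. It is not the MARS or TEA algorithm from the paper. It assumes a toy single-item inventory problem with invented costs and demand, a tabular Q-function critic updated at every step, and an actor that improves its policy by a plain uniform random search over order quantities. Updating the actor only every `actor_period` steps is a crude stand-in for the two-timescale idea. All names (`simulate_step`, `train`, `actor_period`, and the cost coefficients) are hypothetical.

```python
import random


def simulate_step(stock, order, rng, capacity=10, max_demand=4):
    """Toy single-item inventory transition (all numbers are assumptions)."""
    order = min(order, capacity - stock)      # cannot exceed storage capacity
    available = stock + order
    demand = rng.randint(0, max_demand)       # uniform integer demand
    sold = min(available, demand)
    next_stock = available - sold
    # ordering cost + holding cost + lost-sales penalty
    cost = 1.0 * order + 0.5 * next_stock + 4.0 * (demand - sold)
    return next_stock, cost


def train(iterations=20000, capacity=10, max_order=5, gamma=0.9,
          alpha=0.1, explore=0.3, actor_period=10, seed=0):
    """Q-function critic plus a random-search actor on the toy problem."""
    rng = random.Random(seed)
    # Critic: one Q-value (expected discounted cost) per (stock, order) pair.
    q = [[0.0] * (max_order + 1) for _ in range(capacity + 1)]
    # Actor: one order quantity per stock level.
    policy = [0] * (capacity + 1)
    stock = 0
    for k in range(1, iterations + 1):
        # Behave according to the actor, with occasional random exploration.
        if rng.random() < explore:
            action = rng.randint(0, max_order)
        else:
            action = policy[stock]
        next_stock, cost = simulate_step(stock, action, rng, capacity)

        # Critic update (every step): move Q toward the one-step target
        # evaluated under the actor's current policy.
        target = cost + gamma * q[next_stock][policy[next_stock]]
        q[stock][action] += alpha * (target - q[stock][action])

        # Actor update (less often): draw a random candidate action and
        # keep it if the critic currently rates it as cheaper.
        if k % actor_period == 0:
            candidate = rng.randint(0, max_order)
            if q[stock][candidate] < q[stock][policy[stock]]:
                policy[stock] = candidate

        stock = next_stock
    return policy, q


if __name__ == "__main__":
    learned_policy, _ = train()
    for level, order in enumerate(learned_policy):
        print(f"stock={level:2d} -> order {order}")
```

In this sketch the critic estimates costs rather than rewards, so the actor accepts a candidate when its Q-value is lower. A continuous state/action version with function approximation, as the abstract describes, would replace the table `q` with a fitted approximator and the uniform candidate draw with the paper's model-adaptive sampling scheme.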

Recommended Citation

Y. Luo et al., "A Model-adaptive Random Search Actor Critic: Convergence Analysis and Inventory-control Case Studies," Annals of Operations Research, Springer, Jan 2024.

The definitive version is available at https://doi.org/10.1007/s10479-024-06284-y

Department(s)

Engineering Management and Systems Engineering

Comments

National Science Foundation, Grant CMMI-2027452

Keywords and Phrases

Actor critics; Markov decision processes; Reinforcement learning; Remanufacturing

International Standard Serial Number (ISSN)

1572-9338; 0254-5330

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Jan 2024

Included in

Operations Research, Systems Engineering and Industrial Engineering Commons
