Engineering Management and Systems Engineering Faculty Research & Creative Works

Deep Reinforcement Learning for Approximate Policy Iteration: Convergence Analysis and a Post-Earthquake Disaster Response Case Study

Abhijit Gosavi, Missouri University of Science and Technology
L. (Lesley) H. Sneed, Missouri University of Science and Technology
L. A. Spearing

Abstract

Approximate policy iteration (API) is a class of reinforcement learning (RL) algorithms that seek to solve the long-run discounted reward Markov decision process (MDP) via the policy iteration paradigm, without learning the transition model in the underlying Bellman equation. Unfortunately, these algorithms suffer from a defect known as chattering, in which the solution (policy) delivered in each iteration of the algorithm oscillates between improved and worsened policies, leading to sub-optimal behavior. Two causes of chattering have been traced to the crucial policy improvement step: (i) inaccuracies in the policy improvement function and (ii) the exploration/exploitation tradeoff integral to this step, which generates variability in performance. Both defects are amplified by simulation noise. Deep RL belongs to a newer class of algorithms in which the resolution of the learning process is refined via mechanisms such as experience replay and/or deep neural networks for improved performance. In this paper, a new deep learning approach is developed for API that employs a more accurate policy improvement function, via an enhanced-resolution Bellman equation, thereby reducing chattering and eliminating the need for exploration in the policy improvement step. Versions of the new algorithm are presented for both the long-run discounted MDP and the semi-MDP. The convergence properties of the new algorithm are studied mathematically, and a post-earthquake disaster response case study is used to demonstrate the algorithm's efficacy numerically.
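For orientation only, the following is a minimal sketch of a *classical* model-free approximate policy iteration loop. It is not the authors' deep RL algorithm, which the abstract does not specify in enough detail to reproduce. Everything in it is a hypothetical assumption for illustration: the two-state, two-action MDP in `MODEL`, the discount factor `GAMMA`, the rollout counts, and the function names. The learner never reads the transition probabilities. Policy evaluation estimates Q-values from simulated rollouts, and policy improvement acts greedily on those noisy estimates. Because the estimates are noisy, the greedy step can switch back and forth between policies (the chattering described above), so the loop is capped at `max_iters`. A value-iteration reference that does use the model is included only to check the result.

```python
import random

# Hypothetical toy MDP (not from the paper): states 0 and 1, actions 0 and 1.
# MODEL[s][a] = (immediate reward, probability that the next state is 0).
MODEL = {
    0: {0: (10.0, 0.9), 1: (1.0, 0.1)},
    1: {0: (0.0, 0.9), 1: (2.0, 0.1)},
}
STATES = (0, 1)
ACTIONS = (0, 1)
GAMMA = 0.8  # assumed discount factor


def simulate_step(s, a, rng):
    """Black-box simulator: the API learner only sees sampled (reward, next state)."""
    reward, p_to_0 = MODEL[s][a]
    next_state = 0 if rng.random() < p_to_0 else 1
    return reward, next_state


def rollout_q(policy, s, a, rng, horizon=60):
    """One truncated discounted return: take action a in state s, then follow policy."""
    total, discount = 0.0, 1.0
    state, action = s, a
    for _ in range(horizon):
        reward, state = simulate_step(state, action, rng)
        total += discount * reward
        discount *= GAMMA
        action = policy[state]
    return total


def approximate_policy_iteration(n_rollouts=500, max_iters=20, seed=0):
    rng = random.Random(seed)
    policy = {s: 1 for s in STATES}  # arbitrary initial policy
    q = {}
    for _ in range(max_iters):
        # Policy evaluation: Monte Carlo Q-estimates from simulation only.
        q = {
            (s, a): sum(rollout_q(policy, s, a, rng) for _ in range(n_rollouts)) / n_rollouts
            for s in STATES
            for a in ACTIONS
        }
        # Policy improvement: greedy with respect to the noisy Q-estimates.
        new_policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
        if new_policy == policy:  # policy is stable; stop
            break
        policy = new_policy
    return policy, q


def expected_q(v, s, a):
    """One-step lookahead using the model (used only by the reference solver)."""
    reward, p_to_0 = MODEL[s][a]
    return reward + GAMMA * (p_to_0 * v[0] + (1.0 - p_to_0) * v[1])


def exact_optimal_policy(tol=1e-10):
    """Reference answer via value iteration on the known model."""
    v = {s: 0.0 for s in STATES}
    while True:
        q = {(s, a): expected_q(v, s, a) for s in STATES for a in ACTIONS}
        new_v = {s: max(q[(s, a)] for a in ACTIONS) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
        v = new_v
```

Calling `approximate_policy_iteration()` returns the final policy and its last Q-estimates. In this toy problem the gaps between Q-values are large relative to the Monte Carlo noise at 500 rollouts per state-action pair, so the loop should settle on the same policy that `exact_optimal_policy()` finds. Shrinking `n_rollouts` makes wrong greedy choices, and hence chattering, more likely.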

Recommended Citation

A. Gosavi et al., "Deep Reinforcement Learning for Approximate Policy Iteration: Convergence Analysis and a Post-Earthquake Disaster Response Case Study," Optimization Letters, Springer, Jan 2023.

The definitive version is available at https://doi.org/10.1007/s11590-023-02062-0

Department(s)

Engineering Management and Systems Engineering

Second Department

Civil, Architectural and Environmental Engineering

Keywords and Phrases

Approximate policy iteration; Deep reinforcement learning; Disaster response; Model building; Noise reduction

International Standard Serial Number (ISSN)

1862-4480; 1862-4472

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Jan 2023

Included in

Operations Research, Systems Engineering and Industrial Engineering Commons, Structural Engineering Commons, Structural Materials Commons