Electrical and Computer Engineering Faculty Research & Creative Works

Online Model-Free N-Step HDP with Stability Analysis

Seaar Al-Dabooni
Donald C. Wunsch, Missouri University of Science and Technology

Abstract

Motivated by the power of the temporal-difference (TD) learning method with λ [TD(λ)], this paper presents a novel n-step adaptive dynamic programming (ADP) architecture that combines TD(λ) with regular TD learning to solve optimal control problems in fewer iterations. In contrast with backward-view TD(λ) learning, which requires an extra parameter, the eligibility trace, and updates at the end of each episode (offline training), the new design uses forward-view learning, which updates at each time step (online training) without eligibility traces and works in various applications without mathematical models. Therefore, the new design is called the online model-free n-step action-dependent (AD) heuristic dynamic programming [NSHDP(λ)]. NSHDP(λ) has three neural networks: the critic network (CN) with regular one-step TD [TD(0)], the CN with n-step TD learning [or TD(λ)], and the actor network (AN). Because forward-view learning does not require extra eligibility traces associated with each state, the NSHDP(λ) architecture has low computational cost and is memory efficient. Furthermore, the stability of NSHDP(λ) is proven under certain conditions by using Lyapunov analysis to obtain the uniformly ultimately bounded (UUB) property. We compare the results with the performance of HDP and traditional action-dependent HDP(λ) [ADHDP(λ)] for different λ values. A complex nonlinear system and a 2-D maze problem serve as two simulation benchmarks in this paper; a third, an inverted pendulum, is presented in the supplemental material. NSHDP(λ) performance is examined and compared with other ADP methods.
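For readers unfamiliar with the forward view mentioned in the abstract, the following is a minimal sketch of the standard textbook n-step return and truncated λ-return. It is not the NSHDP(λ) algorithm from the paper: the function names, the list-based inputs, and the truncation at the available horizon are all illustrative assumptions, and no neural networks are involved.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the given rewards plus the discounted bootstrap value.

    Hypothetical helper: `rewards` holds the n rewards observed after time t,
    and `bootstrap_value` is a value estimate for the state reached after
    those n steps.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def truncated_lambda_return(rewards, values, gamma, lam):
    """Forward-view lambda-return truncated at len(rewards) steps.

    Assumption: values[k] estimates the state reached after step k + 1.
    Each n-step return gets weight (1 - lam) * lam**(n - 1); the longest
    return takes the remaining weight so that the weights sum to one.
    """
    n_max = len(rewards)
    returns = [
        n_step_return(rewards[:n], values[n - 1], gamma)
        for n in range(1, n_max + 1)
    ]
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, n_max)]
    weights.append(lam ** (n_max - 1))
    return sum(w * g for w, g in zip(weights, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0]
    # lam = 0 reduces to the one-step TD(0) target; lam = 1 uses the longest return.
    for lam in (0.0, 0.5, 1.0):
        print(lam, truncated_lambda_return(rewards, values, gamma=0.5, lam=lam))
```

No per-state eligibility trace is stored here; the target is built only from the next few rewards and value estimates, which is the property the abstract attributes to forward-view learning.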

Recommended Citation

S. Al-Dabooni and D. C. Wunsch, "Online Model-Free N-Step HDP with Stability Analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 4, pp. 1255 - 1269, Institute of Electrical and Electronics Engineers (IEEE), Apr 2020.

The definitive version is available at https://doi.org/10.1109/TNNLS.2019.2919614

Department(s)

Electrical and Computer Engineering

Research Center/Lab(s)

Intelligent Systems Center

Second Research Center/Lab

Center for High Performance Computing Research

Comments

This work was supported in part by the Missouri University of Science and Technology Intelligent Systems Center, in part by the Mary K. Finley Missouri Endowment, in part by the National Science Foundation, in part by the Lifelong Learning Machines program from DARPA/Microsystems Technology Office, and in part by the Army Research Laboratory (ARL) under Contract W911NF-18-2-0260.

Keywords and Phrases

λ-Return; Action-Dependent (AD) Heuristic Dynamic Programming (ADHDP); Adaptive Dynamic Programming (ADP); Lyapunov Stability; Uniformly Ultimately Bounded (UUB)

International Standard Serial Number (ISSN)

2162-237X

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

01 Apr 2020
