The Boundedness Conditions for Model-Free HDP(λ)

Seaar Al-Dabooni
Donald C. Wunsch, Missouri University of Science and Technology

Abstract

This paper provides a stability analysis for a model-free, action-dependent heuristic dynamic programming (HDP) approach with an eligibility-trace long-term prediction parameter (λ). HDP(λ) learns from more than one future reward. Eligibility traces have long been popular in Q-learning; this paper proves and demonstrates that they are also worthwhile to use with HDP. We prove that HDP(λ) is uniformly ultimately bounded (UUB) under certain conditions. Previous works present a UUB proof for traditional HDP [HDP(λ = 0)]; we extend the proof to include the λ parameter. Using Lyapunov stability, we demonstrate the boundedness of the estimated error for the critic and actor neural networks as well as learning rate parameters. Three case studies demonstrate the effectiveness of HDP(λ). The first case considers the trajectories of the internal reinforcement signal nonlinear system, where we compare the results with the performance of HDP and traditional temporal difference [TD(λ)] for different λ values. The second case study is a single-link inverted pendulum, for which we compare HDP(λ) with regular HDP under different levels of noise. The third case study is a 3-D maze navigation benchmark, on which we compare state-action-reward-state-action (SARSA), Q(λ), HDP, and HDP(λ). All of these simulation results illustrate that HDP(λ) has competitive performance; thus, this contribution is not only UUB but also useful in comparison with traditional HDP.

Recommended Citation

S. Al-Dabooni and D. C. Wunsch, "The Boundedness Conditions for Model-Free HDP(λ)," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 7, pp. 1928 - 1942, Institute of Electrical and Electronics Engineers (IEEE), Jul 2019.

The definitive version is available at https://doi.org/10.1109/TNNLS.2018.2875870

Department(s)

Electrical and Computer Engineering

Research Center/Lab(s)

Center for High Performance Computing Research

Keywords and Phrases

Action Dependent (AD); Approximate Dynamic Programing (ADP); Heuristic Dynamic Programing (HDP); Lyapunov Stability; Model Free; Uniformly Ultimately Bounded (UUB); λ-Return

International Standard Serial Number (ISSN)

2162-237X

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

01 Jul 2019

Electrical and Computer Engineering Faculty Research & Creative Works
