Convergence of Recurrent Neuro-Fuzzy Value-Gradient Learning with and Without an Actor
In recent years, a gradient form of $n$-step temporal-difference [TD(λ)] learning has been developed into an advanced adaptive dynamic programming (ADP) algorithm called value-gradient learning [VGL(λ)]. In this paper, we improve on the VGL(λ) architecture with a variant called the 'single adaptive critic network [SNVGL(λ)]' because it has only a single approximator network (a critic) instead of the dual networks (critic and actor) used in VGL(λ). SNVGL(λ) therefore has lower computational requirements than VGL(λ). Moreover, a recurrent hybrid neuro-fuzzy (RNF) network and a first-order Takagi-Sugeno RNF (TSRNF) network are derived and implemented to build the critic and actor networks. Furthermore, we develop novel theoretical convergence proofs for both VGL(λ) and SNVGL(λ) under certain conditions. A mobile-robot simulation model (model based) is used to solve the optimal control problem for affine nonlinear discrete-time systems. The mobile robot is exposed to various noise levels to verify performance and to validate the theoretical analysis.
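For readers unfamiliar with the eligibility-trace parameter λ that VGL(λ) inherits from TD(λ), the following is a minimal tabular TD(λ) sketch on a classical random-walk chain. This is background only, not the paper's neuro-fuzzy value-gradient algorithm; the chain size, step size, and λ value are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal tabular TD(lambda) with accumulating eligibility traces on a
# 5-state random-walk chain (background for the lambda in VGL(lambda);
# NOT the paper's neuro-fuzzy value-gradient algorithm).

rng = np.random.default_rng(0)
n_states = 5                       # interior states 1..5
V = np.zeros(n_states + 2)         # value table; V[0] and V[6] are terminal
alpha, gamma, lam = 0.1, 1.0, 0.8  # illustrative hyperparameters

for episode in range(2000):
    e = np.zeros_like(V)           # eligibility trace vector, reset per episode
    s = n_states // 2 + 1          # start in the middle of the chain
    while 0 < s < n_states + 1:
        s_next = s + rng.choice([-1, 1])          # unbiased random walk
        r = 1.0 if s_next == n_states + 1 else 0.0  # reward only at right end
        delta = r + gamma * V[s_next] - V[s]      # one-step TD error
        e[s] += 1.0                # accumulating trace for the visited state
        V += alpha * delta * e     # every state updated in proportion to trace
        e *= gamma * lam           # traces decay by gamma * lambda
        s = s_next

# True values for this chain are i/6 for interior states i = 1..5.
print(np.round(V[1:-1], 2))
```

The trace decay `gamma * lam` is what interpolates between one-step TD (λ = 0) and Monte Carlo returns (λ = 1); VGL(λ) applies the same interpolation idea to learned value gradients rather than values.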
S. Al-Dabooni and D. C. Wunsch, "Convergence of Recurrent Neuro-Fuzzy Value-Gradient Learning with and Without an Actor," IEEE Transactions on Fuzzy Systems, vol. 28, no. 4, pp. 658-672, Institute of Electrical and Electronics Engineers (IEEE), Apr. 2020.
The definitive version is available at https://doi.org/10.1109/TFUZZ.2019.2912349
Electrical and Computer Engineering
Center for High Performance Computing Research
Keywords and Phrases
Adaptive Dynamic Programming (ADP); Convergence Analysis; Eligibility Traces; Mobile Robot; Recurrent Neuro-Fuzzy (RNF); Takagi-Sugeno (T-S) Neuro-Fuzzy
International Standard Serial Number (ISSN)
Article - Journal
© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.
01 Apr 2020