Online Barrier-Actor-Critic Learning for H∞ Control with Full-State Constraints and Input Saturation
Abstract
This paper develops a novel adaptive optimal control design method for systems with full-state constraints and input saturation in the presence of external disturbances. First, to handle the full-state constraints, a barrier function is introduced for system transformation, and it is shown that, under this barrier-function-based transformation, stabilizing the transformed system is equivalent to solving the original constrained control problem. Second, the disturbance attenuation problem is formulated within the zero-sum differential game framework. To determine the optimal control and the worst-case disturbance, a novel barrier-actor-critic algorithm is presented for adaptive optimal learning that guarantees satisfaction of the full-state constraints and the input saturation limits. It is proven that all closed-loop signals remain bounded during the online learning phase. Finally, simulation studies demonstrate the effectiveness of the presented barrier-actor-critic learning algorithm.
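For readers unfamiliar with the ingredients above, the following display sketches one standard instantiation; the logarithmic barrier, the nonquadratic saturation penalty, and the tanh-form policy are common choices in this literature and are given here as illustrative assumptions rather than the paper's exact constructions. Each constrained state x_i in (-k_i, k_i) is mapped onto the real line by an invertible barrier function, the H∞ problem is posed as a two-player zero-sum game between the control u (saturated at level λ) and the disturbance d, and a critic and actor approximate the game value and the saturated policy:

\[
s_i = b_i(x_i) = \ln\frac{k_i + x_i}{k_i - x_i}, \qquad
x_i = b_i^{-1}(s_i) = k_i \tanh\!\left(\frac{s_i}{2}\right), \qquad |x_i| < k_i,
\]
\[
V(s(0)) = \min_{u}\,\max_{d} \int_{0}^{\infty}\!\left( s^{\top} Q\, s
  + 2\int_{0}^{u} \lambda \tanh^{-1}\!\left(v/\lambda\right)^{\top} R \,\mathrm{d}v
  - \gamma^{2} \lVert d \rVert^{2} \right)\mathrm{d}t,
\]
\[
\hat V(s) = \hat W_c^{\top} \varphi(s), \qquad
\hat u(s) = -\lambda \tanh\!\left(\frac{1}{2\lambda}\, R^{-1} g^{\top}(s)\, \nabla\varphi^{\top}(s)\, \hat W_a\right),
\]

where \(\dot{s} = f(s) + g(s)u + h(s)d\) denotes the transformed dynamics, \(\varphi\) is a critic basis, and \(\hat W_c, \hat W_a\) are critic and actor weights tuned online. Because b_i is a bijection from (-k_i, k_i) onto the real line, keeping s bounded keeps x strictly inside its constraint set, and the tanh-form policy satisfies the saturation bound |û| ≤ λ by construction.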
Recommended Citation
Y. Yang et al., "Online Barrier-Actor-Critic Learning for H∞ Control with Full-State Constraints and Input Saturation," Journal of the Franklin Institute, vol. 357, no. 6, pp. 3316–3344, Elsevier Ltd, Apr. 2020.
The definitive version is available at https://doi.org/10.1016/j.jfranklin.2019.12.017
Department(s)
Electrical and Computer Engineering
Research Center/Lab(s)
Center for High Performance Computing Research
Keywords and Phrases
E-Learning; Actor-Critic Algorithm; Actor-Critic Learning; Adaptive Optimal Control; Closed-Loop Signals; Constrained Control; Disturbance Attenuation; External Disturbances; Zero-Sum Differential Games; Learning Algorithms
International Standard Serial Number (ISSN)
0016-0032
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2020 The Franklin Institute. All rights reserved.
Publication Date
01 Apr 2020
Comments
This work was supported in part by the National Natural Science Foundation of China under Grants No. 61903028, No. 61873028, and No. 61333002; in part by the China Postdoctoral Science Foundation under Grant 2018M641197; in part by the Fundamental Research Funds for the Central Universities under Grants FRF-TP-18-031A1 and FRF-BD-19-002A; in part by the Lifelong Learning Machines program from the DARPA/Microsystems Technology Office; and in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-18-2-0260.