Abstract

This paper addresses the infinite-horizon optimal tracking control problem for partially uncertain control-affine nonlinear discrete-time (DT) systems whose control input dynamics are known. Multi-layer critic and actor neural networks (MNNs) are utilized for online estimation of the infinite-horizon value function and the optimal control input. The NN weights are tuned online using a direct temporal difference error (TDE)-driven learning approach, which modifies the singular values of the gradient with respect to the NN weights to accelerate weight convergence. The critic NN uses a novel experience replay technique to improve sample efficiency without introducing biased TDEs and to guarantee the persistence of excitation (PE) condition. The tracking error and weight estimation errors are shown to be uniformly ultimately bounded (UUB) using Lyapunov analysis. The performance of the optimal tracking control scheme with experience replay is evaluated on a two-link robot manipulator and contrasted with a model predictive control scheme with known dynamics.
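To make the abstract's learning rule concrete, the following is a minimal sketch of a TD-error-driven critic update with experience replay. Everything here is an illustrative assumption rather than the paper's method: the class name, the linear-in-features critic (the paper uses multi-layer NNs), the buffer policy, and the particular singular-value modification (replacing the singular values of the stacked gradient matrix with ones) are all stand-ins chosen to show the general shape of the technique.

```python
import numpy as np

class CriticWithReplay:
    """Hypothetical sketch of a TDE-driven critic update with experience
    replay. Names and structure are assumptions, not the authors' code."""

    def __init__(self, n_features, buffer_size=50, gamma=0.95, lr=0.01):
        self.w = np.zeros(n_features)   # critic weight estimate
        self.buffer = []                # replay buffer of (phi, phi_next, r)
        self.buffer_size = buffer_size
        self.gamma = gamma              # discount factor
        self.lr = lr                    # learning rate

    def td_error(self, phi, phi_next, r):
        # TDE for a value function approximated as V(x) = w^T phi(x):
        # delta = r + gamma * V(x') - V(x)
        return r + self.gamma * self.w @ phi_next - self.w @ phi

    def store(self, phi, phi_next, r):
        # First-in, first-out buffer; replayed samples reinforce the
        # excitation seen along the trajectory (a crude PE surrogate).
        self.buffer.append((phi, phi_next, r))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)

    def update(self, phi, phi_next, r):
        # Current-sample gradient plus replayed gradients. Each replayed
        # TDE is recomputed with the *current* weights, so the replay does
        # not inject stale (biased) TD errors into the update.
        grads = []
        for p, pn, rew in self.buffer + [(phi, phi_next, r)]:
            delta = self.td_error(p, pn, rew)
            grads.append(delta * (self.gamma * pn - p))  # grad of 0.5*delta^2
        G = np.stack(grads)
        # Singular-value modification: keep only the directions of the
        # stacked gradient and set its singular values to one, equalizing
        # the per-direction step size -- one possible reading of the
        # abstract's "modifies the singular values of the gradient".
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        self.w -= self.lr * (U @ Vt).sum(axis=0)
```

Recomputing every replayed TDE with the current weight estimate is one way to realize the abstract's claim of replay "without introducing biased TDEs"; equalizing the singular values flattens the conditioning of the update so that poorly excited directions converge at the same rate as well-excited ones.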

Department(s)

Electrical and Computer Engineering

Second Department

Computer Science

Comments

Office of Naval Research, Grant N00014-21-1-2232

Keywords and Phrases

Experience replay; Optimal control; Reinforcement learning; Robotics

International Standard Serial Number (ISSN)

0743-1619

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jan 2025
