Volume 33, Issue 6 pp. 3807-3825
RESEARCH ARTICLE

Reinforcement learning-based optimal trajectory tracking control of surface vessels under input saturations

Ziping Wei

School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China

Jialu Du

Corresponding Author

School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China

Correspondence: Jialu Du, School of Marine Electrical Engineering, Dalian Maritime University, Dalian, Liaoning 116026, China.

Email: [email protected]

First published: 18 January 2023
Citations: 1

Funding information: Dalian Science and Technology Innovation Fund, Grant/Award Number: 2020JJ26GX020; National Natural Science Foundation of China, Grant/Award Number: 51079013

Abstract

This paper develops a reinforcement learning (RL)-based optimal trajectory tracking control scheme for surface vessels subject to unknown dynamics, unknown disturbances, and input saturations. The control scheme is designed by combining optimal control theory, adaptive neural networks (NNs), and the RL method in a unified actor-critic NN framework. A hyperbolic-type penalty function of the control input is designed to handle the input saturations. An actor-critic NN-based RL mechanism is established to learn the optimal trajectory tracking control law without knowledge of the vessel dynamics or disturbances, with the NN weights tuned online by the devised tuning laws. Theoretical analysis proves, and simulation results confirm, that the proposed scheme ensures that surface vessels track the desired trajectory while guaranteeing the boundedness of all signals in the closed-loop trajectory tracking control system.
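
The abstract does not give the form of the hyperbolic-type penalty function. As a rough illustration only, the sketch below uses a nonquadratic penalty built from the inverse hyperbolic tangent, a form often used in constrained-input optimal control; it is an assumption, not necessarily the authors' function. The names `saturation_penalty`, `saturated_control`, `limit`, `weight`, and `value_gradient_term` are hypothetical, and the scalar case stands in for a single actuator channel.

```python
import math


def saturation_penalty(u, limit, weight=1.0):
    """Assumed penalty: 2 * weight * integral from 0 to u of limit * atanh(v / limit) dv.

    Evaluated in closed form; only defined for |u| < limit.
    """
    if abs(u) >= limit:
        raise ValueError("penalty is only defined for |u| < limit")
    ratio = u / limit
    return 2.0 * weight * (
        limit * u * math.atanh(ratio)
        + 0.5 * limit ** 2 * math.log(1.0 - ratio ** 2)
    )


def saturated_control(value_gradient_term, limit, weight=1.0):
    """Minimizer of saturation_penalty(u) + value_gradient_term * u.

    value_gradient_term is a hypothetical stand-in for the product of the
    input gain and the value-function gradient that a critic would estimate.
    The tanh keeps the result within [-limit, limit] by construction.
    """
    return -limit * math.tanh(value_gradient_term / (2.0 * weight * limit))
```

Under this assumed form the penalty behaves like the quadratic cost `weight * u**2` for small inputs and grows steeply as `|u|` approaches `limit`, so the minimizing control saturates smoothly instead of being clipped after the fact.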

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
