International Journal of Robust and Nonlinear Control

Volume 33, Issue 6 pp. 3807-3825

RESEARCH ARTICLE

Reinforcement learning-based optimal trajectory tracking control of surface vessels under input saturations

Ziping Wei

orcid.org/0000-0002-8133-1076

School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China

Corresponding Author

Jialu Du

orcid.org/0000-0002-9809-0862

School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China

Correspondence Jialu Du, School of Marine Engineering, Dalian Maritime University, Dalian, Liaoning 116026, China.

Email: [email protected]

First published: 18 January 2023

https://doi.org/10.1002/rnc.6597

Funding information: Dalian Science and Technology Innovation Fund, Grant/Award Number: 2020JJ26GX020; National Natural Science Foundation of China, Grant/Award Number: 51079013

Abstract

This paper develops a reinforcement learning (RL)-based optimal trajectory tracking control scheme for surface vessels subject to unknown dynamics, unknown disturbances, and input saturations. The control scheme is designed by combining optimal control theory, adaptive neural networks (NNs), and the RL method in a unified actor-critic NN framework. A hyperbolic-type penalty function of the control input is designed to deal with the input saturations of surface vessels. An actor-critic NN-based RL mechanism is established to learn the optimal trajectory tracking control law without knowledge of the surface vessel dynamics and disturbances, where the NN weights are tuned online according to the devised tuning laws. Theoretical analysis and simulation results show that the proposed RL-based optimal trajectory tracking control scheme ensures that surface vessels track the desired trajectory while guaranteeing the boundedness of all signals in the closed-loop optimal trajectory tracking control system.
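
As a rough illustration of how a hyperbolic-type penalty can encode an input limit, the sketch below uses the inverse-hyperbolic-tangent penalty that is common in the constrained-input optimal control literature. The abstract does not give the paper's exact function, so the form of the penalty, the names `saturation_penalty` and `bounded_control`, the limit `u_max`, and the weight `r` are all assumptions made for this example, and only a single scalar input channel is considered.

```python
import math


def saturation_penalty(u, u_max, r=1.0):
    """Assumed penalty W(u) = 2*r*u_max * integral from 0 to u of atanh(v/u_max) dv.

    The closed form used here is
        W(u) = 2*r*u_max*u*atanh(u/u_max) + r*u_max**2 * ln(1 - (u/u_max)**2),
    which is zero at u = 0 and grows steeply as |u| approaches u_max.
    """
    if abs(u) >= u_max:
        raise ValueError("penalty is defined only for |u| < u_max")
    s = u / u_max
    return 2.0 * r * u_max * u * math.atanh(s) + r * u_max**2 * math.log1p(-s * s)


def bounded_control(v, u_max):
    """Map an unconstrained signal v to a control whose magnitude never exceeds u_max."""
    return u_max * math.tanh(v / u_max)


if __name__ == "__main__":
    u_max = 10.0  # hypothetical actuator limit, not a value from the paper
    for v in (0.5, 5.0, 50.0):
        u = bounded_control(v, u_max)
        print(f"v={v:6.1f}  u={u:8.4f}  |u| <= u_max: {abs(u) <= u_max}")
    for u in (1.0, 5.0, 9.0):
        print(f"u={u:4.1f}  penalty={saturation_penalty(u, u_max):.4f}")
```

With a penalty of this kind, minimizing the cost yields a control law passed through `tanh`, so the commanded input respects the limit by construction rather than through clipping after the fact. Whether the paper uses this exact form is not stated in the abstract.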

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

Open Research

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.


Volume 33, Issue 6, April 2023, Pages 3807-3825
