Reinforcement learning-based optimal trajectory tracking control of surface vessels under input saturations
Ziping Wei
School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China
Corresponding Author
Jialu Du
School of Marine Electrical Engineering, Dalian Maritime University, Dalian, China
Correspondence Jialu Du, School of Marine Engineering, Dalian Maritime University, Dalian, Liaoning 116026, China.
Email: [email protected]
Funding information: Dalian Science and Technology Innovation Fund, Grant/Award Number: 2020JJ26GX020; National Natural Science Foundation of China, Grant/Award Number: 51079013
Abstract
This paper develops a reinforcement learning (RL)-based optimal trajectory tracking control scheme for surface vessels subject to unknown dynamics, unknown disturbances, and input saturations. The scheme combines optimal control theory, adaptive neural networks (NNs), and RL in a unified actor-critic NN framework. A hyperbolic-type penalty function of the control input is designed to handle the input saturations. An actor-critic NN-based RL mechanism is established to learn the optimal trajectory tracking control law without knowledge of the vessel dynamics or disturbances, with the NN weights tuned online by the devised tuning laws. Theoretical analysis and simulation results show that the proposed scheme ensures that surface vessels track the desired trajectory while guaranteeing the boundedness of all signals in the closed-loop trajectory tracking control system.
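The abstract does not give the penalty function itself. As a rough illustration of how a hyperbolic-type penalty can enforce an input bound, the sketch below uses the nonquadratic form that is common in the constrained-input optimal control literature, restricted to a scalar input. The function names, the bound `u_max`, the weight `r`, and the scalar `value_gradient_term` (standing in for the input gain times the value-function gradient) are all assumptions made for this example and may differ from the paper's definitions.

```python
import math


def saturating_penalty(u, u_max, r=1.0):
    """Assumed scalar penalty W(u) = 2*r * integral_0^u u_max*atanh(v/u_max) dv.

    Closed form of the integral:
        2*r*u_max*u*atanh(u/u_max) + r*u_max**2 * log(1 - (u/u_max)**2)
    The penalty behaves like r*u**2 near zero and grows steeply as |u| -> u_max.
    """
    if abs(u) >= u_max:
        raise ValueError("penalty is only defined for |u| < u_max")
    ratio = u / u_max
    return (2.0 * r * u_max * u * math.atanh(ratio)
            + r * u_max ** 2 * math.log(1.0 - ratio ** 2))


def bounded_control(value_gradient_term, u_max, r=1.0):
    """Minimizer of W(u) + value_gradient_term * u over u.

    Setting the derivative 2*r*u_max*atanh(u/u_max) + value_gradient_term
    to zero gives u = -u_max * tanh(value_gradient_term / (2*r*u_max)),
    which never exceeds u_max in magnitude.
    """
    return -u_max * math.tanh(value_gradient_term / (2.0 * r * u_max))


if __name__ == "__main__":
    # Hypothetical gradient terms; the resulting control stays within the bound.
    for s in (0.1, 3.0, 50.0):
        u = bounded_control(s, u_max=2.0)
        print(f"s={s:5.1f}  u={u:+.4f}")
```

Because the minimizing control passes through `tanh`, the bound is built into the control law rather than imposed by clipping afterwards, which is the usual motivation for this kind of penalty.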
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.
Open Research
DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.