Optimal trajectory tracking control for a class of nonlinear nonaffine systems via generalized N-step value gradient learning
Mingming Zhao
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China
Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China
Ding Wang (Corresponding Author)
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China
Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China
Correspondence Ding Wang, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
Email: [email protected]
Junfei Qiao
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China
Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China
Lingzhi Hu
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China
Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China
Funding information: National Key Research and Development Program of China, Grant/Award Number: 2021ZD0112302; Beijing Natural Science Foundation, Grant/Award Number: JQ19013; National Natural Science Foundation of China, Grant/Award Numbers: 62222301; 61890930-5; 62021003
Summary
This paper solves the tracking control problem for unknown nonlinear systems using the generalized N-step value gradient learning algorithm with parameter λ [GNSVGL(λ)]. The GNSVGL(λ) algorithm can provide optimal tracking decisions faster than traditional algorithms. The monotonicity and convergence properties of the proposed algorithm are proven for initialization by different positive semi-definite functions. Under some conditions, a stability analysis of the value-iteration-based algorithm is provided. One-return and λ-return critic neural networks are constructed to approximate the gradients of the one-return and λ-return cost functions, respectively. An action neural network is employed to approximate the control law of the error system. Notably, the one-return and λ-return critic networks are combined to train the action neural network. Finally, simulation studies and comparisons confirm the excellent tracking performance of the proposed algorithm.
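To make the idea of a λ-return concrete, the following is a minimal sketch in Python. It is not the GNSVGL(λ) algorithm of this paper, whose update rules are not reproduced on this page. It assumes the standard truncated λ-return from temporal-difference learning, applied to scalar stage costs and scalar critic estimates rather than to value gradients. The names `n_step_return`, `truncated_lambda_return`, `costs`, `values`, `horizon`, `lam`, and `gamma` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def n_step_return(costs: Sequence[float], values: Sequence[float],
                  n: int, gamma: float = 1.0) -> float:
    """n-step return: n discounted stage costs plus a bootstrapped critic estimate.

    costs[k] is the stage cost at step k and values[k] is the critic's
    cost-to-go estimate at step k (both hypothetical inputs).
    """
    discounted_costs = sum(gamma ** k * costs[k] for k in range(n))
    return discounted_costs + gamma ** n * values[n]


def truncated_lambda_return(costs: Sequence[float], values: Sequence[float],
                            horizon: int, lam: float, gamma: float = 1.0) -> float:
    """Textbook truncated lambda-return over `horizon` steps (an assumption here).

    Weights (1 - lam) * lam**(n - 1) are placed on the n-step returns for
    n < horizon, and the remaining weight lam**(horizon - 1) is placed on
    the full horizon-step return, so the weights sum to one.
    """
    total = 0.0
    for n in range(1, horizon):
        weight = (1.0 - lam) * lam ** (n - 1)
        total += weight * n_step_return(costs, values, n, gamma)
    total += lam ** (horizon - 1) * n_step_return(costs, values, horizon, gamma)
    return total


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]            # stage costs at steps 0, 1, 2
    values = [0.0, 10.0, 20.0, 30.0]   # critic estimates at steps 0..3
    for lam in (0.0, 0.5, 1.0):
        print(lam, truncated_lambda_return(costs, values, horizon=3, lam=lam))
```

Under these assumptions, `lam = 0` recovers the one-step return and `lam = 1` recovers the full N-step return, which is the sense in which a single parameter interpolates between a one-return target and longer-horizon targets.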
CONFLICT OF INTEREST
We confirm that there is no conflict of interest for this article.
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.