International Journal of Robust and Nonlinear Control

RESEARCH ARTICLE

Optimal trajectory tracking control for a class of nonlinear nonaffine systems via generalized N-step value gradient learning

Mingming Zhao

orcid.org/0000-0002-6405-4652

Faculty of Information Technology, Beijing University of Technology, Beijing, China

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China

Ding Wang (Corresponding Author)

orcid.org/0000-0002-7149-5712

Faculty of Information Technology, Beijing University of Technology, Beijing, China

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China

Correspondence: Ding Wang, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.

Email: [email protected]

Junfei Qiao

orcid.org/0000-0001-9652-3364

Faculty of Information Technology, Beijing University of Technology, Beijing, China

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China

Lingzhi Hu

orcid.org/0000-0002-7357-0042

Faculty of Information Technology, Beijing University of Technology, Beijing, China

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China

Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing, China

Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing, China

First published: 27 December 2022

https://doi.org/10.1002/rnc.6569

Funding information: National Key Research and Development Program of China, Grant/Award Number: 2021ZD0112302; Beijing Natural Science Foundation, Grant/Award Number: JQ19013; National Natural Science Foundation of China, Grant/Award Numbers: 62222301; 61890930-5; 62021003

Summary

In this paper, the tracking control problem of unknown nonlinear systems is solved by using the generalized N-step value gradient learning algorithm with parameter $\lambda$ [GNSVGL($\lambda$)]. The GNSVGL($\lambda$) algorithm can provide optimal tracking decisions faster than traditional algorithms. The monotonicity and convergence properties of the proposed algorithm are proven for iterations initialized with different positive semi-definite functions. Under certain conditions, a stability analysis of the value-iteration-based algorithm is provided. One-return and $\lambda$-return critic neural networks are constructed to approximate the gradients of the one-return and $\lambda$-return cost functions, respectively. An action neural network is employed to approximate the control law of the tracking error system. Notably, the one-return and $\lambda$-return critic networks are combined to train the action network. Finally, simulation studies and comparisons confirm the tracking performance of the proposed algorithm.
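As a rough illustration of what a $\lambda$-return blends, the sketch below computes the standard truncated $\lambda$-return used in TD($\lambda$)-style methods for a scalar cost-to-go. It is an assumption that this matches the paper's exact definition; GNSVGL($\lambda$) itself trains critics on the gradients of such returns with respect to the state, which additionally requires model Jacobians and is not shown here. The names `n_step_return`, `lambda_return`, `costs`, `values`, `lam`, and `gamma` are hypothetical. For a horizon $N$ with stage costs $U_0,\dots,U_{N-1}$ and bootstrapped critic values $V(x_1),\dots,V(x_N)$, the $n$-step return is $G^{(n)}=\sum_{k=0}^{n-1}\gamma^k U_k+\gamma^n V(x_n)$, and the truncated $\lambda$-return is $G^{\lambda}=(1-\lambda)\sum_{n=1}^{N-1}\lambda^{n-1}G^{(n)}+\lambda^{N-1}G^{(N)}$.

```python
from typing import Sequence


def n_step_return(costs: Sequence[float], values: Sequence[float],
                  n: int, gamma: float = 1.0) -> float:
    """G^(n) = sum_{k<n} gamma^k * U_k + gamma^n * V(x_n).

    costs[k] is the stage cost U_k; values[k] is the critic estimate V(x_{k+1}).
    """
    discounted_costs = sum(gamma ** k * costs[k] for k in range(n))
    return discounted_costs + gamma ** n * values[n - 1]


def lambda_return(costs: Sequence[float], values: Sequence[float],
                  lam: float, gamma: float = 1.0) -> float:
    """Truncated lambda-return over the horizon N = len(costs).

    G^lambda = (1 - lam) * sum_{n=1}^{N-1} lam^(n-1) G^(n) + lam^(N-1) G^(N);
    the weights sum to one for any lam in [0, 1].
    """
    horizon = len(costs)
    if horizon == 0 or len(values) != horizon:
        raise ValueError("need one bootstrapped value per stage cost")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Geometrically weighted mix of the shorter n-step returns.
    blended = sum((1.0 - lam) * lam ** (n - 1) * n_step_return(costs, values, n, gamma)
                  for n in range(1, horizon))
    # The remaining weight lam^(N-1) goes to the longest available return.
    return blended + lam ** (horizon - 1) * n_step_return(costs, values, horizon, gamma)


if __name__ == "__main__":
    stage_costs = [1.0, 2.0, 3.0]
    bootstrapped = [10.0, 20.0, 30.0]  # toy critic outputs V(x_1), V(x_2), V(x_3)
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_return(stage_costs, bootstrapped, lam))
```

Setting $\lambda=0$ recovers the one-step target, which plausibly corresponds to the one-return critic, while $\lambda=1$ gives the full $N$-step return; intermediate values trade the bias of the bootstrapped critic values against the variance of longer cost sums.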

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Open Research

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Volume 33, Issue 6, April 2023, Pages 3471-3490
