Adaptive Robust Control for Uncertain Systems via Data-Driven Learning
Abstract
Although the robust control problem has been studied in an offline manner, it is difficult to solve online, especially for uncertain systems. In this paper, a novel approach based on online data-driven learning is proposed to address the robust control problem for uncertain systems. To this end, the robust control problem of uncertain systems is first transformed into an optimal control problem for the nominal system by selecting an appropriate cost function that accounts for the uncertainties, state regulation, and control effort. Then, a data-driven learning framework is constructed, in which Kronecker products and vectorization operations are used to reformulate the derived algebraic Riccati equation (ARE). To obtain the solution of this ARE, an adaptive learning law is designed that guarantees the convergence of the estimated solution. Closed-loop stability and convergence are proved. Finally, simulations are given to illustrate the effectiveness of the method.
1. Introduction
Most existing control techniques are derived under the assumption that there are no dynamical uncertainties in the controlled plant. Nevertheless, practical control systems are subject to external disturbances and/or model uncertainties, and their performance is inevitably affected by them. Such uncertainties must be taken into consideration in the controller design so that the closed-loop system retains a good response even in their presence. A controller is said to be robust if it works even when the practical system deviates from its nominal model. This gives rise to the robust control design problem, which has been widely studied over the past decades [1, 2]. Recent research [1, 3] shows that the robust control problem can be addressed by solving an optimal control problem for the nominal system. Nevertheless, the online solution of the derived optimal control problem is not handled in [1].
Concerning optimal control problems, many approaches have been presented recently [4, 5]. For linear systems, the optimal control problem is formulated as the linear quadratic regulator (LQR) problem, from which the optimal control law can be obtained. Dynamic programming (DP) has long been used to study optimal control [6]; however, DP has an obvious disadvantage: as the dimensions of the system state and control input grow, the required computation and storage increase dramatically, which is known as the "curse of dimensionality." To overcome this problem, neural networks (NNs) are used to approximate the solution of the optimal control problem [7], which has led to recent work on adaptive/approximate dynamic programming (ADP); intractable optimal control problems can be tackled via ADP, yielding an online solution of the optimal cost function [8]. Recently, robust control design based on the adaptive critic idea has gradually become a research hotspot in the field of ADP. Many methods have been proposed, collectively referred to as robust adaptive critic control. A basic approach is to transform the problem so as to establish a close relationship between robustness and optimality [9]. In this literature, the closed-loop system is generally shown to be uniformly ultimately bounded (UUB). These results demonstrate that the ADP method is suitable for the robust control design of complex systems in uncertain environments. Since many previous ADP results do not focus on the robust performance of the controller, the emergence of robust adaptive critic control greatly expands the application scope of ADP methods. Moreover, exploiting the common way both frameworks handle system uncertainties, self-learning optimization combining ADP with sliding mode control provides a new research direction for robust adaptive critic control [10]. In addition, the robust ADP method is another important achievement in this field; its application to power systems has attracted special attention [11], giving it high application value in industrial systems.
Based on the above facts, we develop a robust control design for uncertain systems using an online data-driven learning method. For this purpose, the robust control problem of uncertain systems is first transformed into an optimal control problem for the nominal system with an appropriate cost function. Then, a data-driven technique is developed, in which Kronecker products and vectorization operations are used to reformulate the derived ARE. To solve this ARE, a novel adaptive law is designed that approximates the solution of the ARE online. Simulations are given to demonstrate the validity of the developed method. The main contributions of this paper are summarized as follows:
- (1) To address the robust control problem, we transform the robust control problem of uncertain systems into an optimal control problem for the nominal system, which provides a constructive approach to robust control design.
- (2) Kronecker products and vectorization operations are used to reformulate the derived ARE into a linear parametric form, which gives a new pathway to solving the ARE online.
- (3) A newly developed adaptation algorithm driven by the parameter estimation error is used to learn the unknown parameters online; convergence of the estimated parameters to their true values is guaranteed.
This paper is organized as follows: In Section 2, we introduce the robust control problem and transform it into an optimal control problem. In Section 3, we design an ADP-based data-driven learning method to solve the derived ARE online, where Kronecker products and vectorization operations are used. Section 4 gives simulation results to illustrate the effectiveness of the proposed method. Conclusions are stated in Section 5.
2. Preliminaries and Problem Formulation
It should also be noted that the upper bound F of the uncertainties ω(d) is included in the cost function (4) to account for their effects. The following lemma summarizes the equivalence between the robust control of the system (1) or (2) and the optimal control of the system (3) with cost function (4).
Lemma 1 (see [9]). If the solution to the optimal control problem of the nominal system (3) with cost function (4) exists, then it is a solution to the robust control problem for the system (1).
Lemma 1 reveals the relationship between robust control and optimal control and thus provides a new way to address the robust control problem.
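As a sketch of this construction (assuming the standard matched-uncertainty form used in the related literature [1, 9], with A(d) = A(d₀) + Bω(d) and ‖ω(d)‖ ≤ F, consistent with the bound F discussed above), the transformation typically reads:

ẋ = A(d)x + Bu = Ax + Bu + Bω(d)x,  ‖ω(d)‖ ≤ F,
J(x₀) = ∫₀^∞ (F²xᵀx + xᵀQx + uᵀRu) dt,

so that the optimal control minimizing J for the nominal pair (A, B) is robust for all admissible d; the paper's own equations (1)–(4) define the exact setting.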
3. Online Solution to Robust Control via Data-Driven Learning
This section proposes a data-driven learning method to solve the robust control problem; the schematic of the proposed control method is given in Figure 1.
[Figure 1: Schematic of the proposed data-driven robust control method.]
where ϑ = [2(x ⊗ Ax), −vec(R) ⊗ (x ⊗ x)] and ϕ = (x ⊗ x)ᵀvec(Qᵀ).
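The identity underlying this reformulation is vec(AXB) = (Bᵀ ⊗ A)vec(X), which converts a matrix equation that is linear in an unknown matrix into a linear system in its vectorized unknown. A minimal numerical sketch (the matrix names here are placeholders, not the paper's plant):

```python
import numpy as np

# Sketch of the vectorization identity vec(A X B) = (B^T kron A) vec(X),
# which underlies rewriting the ARE in a linear parametric form.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lhs = (A @ X @ B).flatten(order="F")          # vec(A X B), column-major vec
rhs = np.kron(B.T, A) @ X.flatten(order="F")  # (B^T kron A) vec(X)
assert np.allclose(lhs, rhs)

# Applied to the Lyapunov part of the ARE, A^T P + P A = -(Q + ...) becomes
# (I kron A^T + A^T kron I) vec(P) = vec(-Q - ...), linear in the unknown vec(P).
```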
3.1. Online Solution of Robust Control
For the adaptive law (16), the auxiliary vector M in (14), obtained from ϑ via (15), contains information on the parameter estimation error. Thus, M can be used to drive the parameter estimation. Consequently, the parameter estimates are updated using the estimation error extracted from the measurable system states x. This adaptive algorithm therefore clearly differs from the gradient descent algorithms used in other ADP literature.
Based on this fact, we can state the following lemma.
Lemma 2 (see [13, 14]). Assume that the variable ϑ given in (12) is persistently exciting (PE); then the matrix Y given in (12) is positive definite, i.e., λmin(Y) > σ > 0 for some positive constant σ.
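As an illustration of what Lemma 2 requires, the PE condition can be checked numerically over a data window (a hypothetical helper; the function name and threshold are illustrative, not from the paper):

```python
import numpy as np

def pe_satisfied(theta_samples, sigma=1e-3):
    """Hypothetical PE check: theta_samples is an (N, n) array whose rows are
    regressor vectors sampled over a window. Lemma 2 asks that the Gram
    matrix Y be positive definite, i.e., lambda_min(Y) > sigma > 0."""
    Y = theta_samples.T @ theta_samples   # Gram matrix built from the window
    return np.linalg.eigvalsh(Y).min() > sigma
```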
Lemma 2 establishes the positive definiteness of the matrix Y under the PE condition on ϑ. With this, the convergence of the proposed adaptive learning law (16) can be summarized as follows.
Theorem 3. Consider (11) with the adaptive learning law (16). If the variable ϑ given in (11) satisfies the PE condition, then the estimation error converges to zero.
Proof. A Lyapunov function can be chosen, and its time derivative can be calculated as
The step-by-step implementation of the proposed learning algorithm is given as follows (an illustrative sketch follows the listing).
Algorithm 1: (Step-by-Step Implementation for the Online Robust Control Solution of Uncertain Systems).
- 1) (Initialization): set the initial parameter estimate and the gains κ, ℓ for the adaptive learning law (16)
- 2) (Measurement): measure the system input/output data and construct the regressors ϕ, ϑ in (10) and (11)
- 3) (Online adaptation): compute Y, N, and M and learn the unknown parameter with (16) to obtain the control u
- 4) (Apply control): apply the derived control u to the system
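A plausible rendering of steps 1)–4) as a discrete-time loop is sketched below. It assumes that the adaptive law (16) has the common estimation-error-driven form with filtered quantities Y, N and auxiliary vector M = YŴ − N (cf. [13, 14]); the paper's exact expressions (14)–(16) should be substituted in practice.

```python
import numpy as np

# Sketch of Algorithm 1 as a discrete-time loop. The filter and update
# equations below ASSUME the common estimation-error-driven form
#   dY/dt = -l*Y + theta theta^T,   dN/dt = -l*N + theta*phi,
#   M     = Y W_hat - N,            dW_hat/dt = -kappa * M,
# which is one plausible reading of (14)-(16), not the paper's verbatim law.

def run_online_learning(measure, apply_control, dt=1e-3, steps=20000,
                        kappa=50.0, l=1.0, n_params=6):
    W_hat = np.zeros(n_params)          # step 1): initial parameter estimate
    Y = np.zeros((n_params, n_params))  # filtered regressor Gram matrix
    N = np.zeros(n_params)              # filtered regressor/target product
    for _ in range(steps):
        theta, phi = measure()          # step 2): regressors built from x, u
        Y += dt * (-l * Y + np.outer(theta, theta))
        N += dt * (-l * N + theta * phi)
        M = Y @ W_hat - N               # = Y (W_hat - W*): estimation-error info
        W_hat += dt * (-kappa * M)      # step 3): adaptation law
        apply_control(W_hat)            # step 4): apply the updated control
    return W_hat
```

Here measure() and apply_control() are user-supplied callbacks that read the plant state and update the control law from the current estimate.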
Remark 4. The adaptive learning law (16) designed above is driven by the parameter estimation error. To construct it, the control input u and the system states x are used, which clearly differs from existing results [15]. In particular, the two operations vec(⋅) and ⊗ are applied to the derived ARE, which makes online learning possible. Consequently, faster convergence can be obtained compared with previously designed gradient-based adaptive laws.
Remark 5. Some ADP methods have indeed been applied successfully to the robust control problem. However, most existing ADP techniques focus on the H-infinity control problem. In the robust control problem proposed in this paper, the uncertain parameter d enters the system matrix A as A(d), so the system can be regarded as containing unmodeled dynamics. To obtain the bound of the uncertain term, one must evaluate A(d) − A(d0), which is then used in the cost function (4). If the system dynamics were completely unknown, this uncertainty bound could not be incorporated into the cost function as required. Hence, the system matrix must be known in this paper; future work will address output-feedback robust control under completely unknown dynamics.
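For illustration, the bound on A(d) − A(d0) mentioned in Remark 5 could be estimated numerically by sampling the admissible range of d (a toy parameterization, not the paper's plant; A_of and estimate_bound are hypothetical names):

```python
import numpy as np

def estimate_bound(A_of, d0, d_samples):
    """Estimate max_d ||A(d) - A(d0)|| over sampled admissible values of d."""
    A0 = A_of(d0)
    return max(np.linalg.norm(A_of(d) - A0, 2) for d in d_samples)

# Toy parameterized system matrix (illustrative only)
A_of = lambda d: np.array([[0.0, 1.0], [-1.0 - d, -2.0]])
F = estimate_bound(A_of, d0=0.0, d_samples=np.linspace(-0.5, 0.5, 101))
```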
3.2. Stability Analysis
To complete the stability analysis, we impose the following assumption.
Assumption 6. The system matrices satisfy ‖A‖ ≤ bA and ‖B‖ ≤ bB for some bA, bB > 0, and the estimated matrix P̂ satisfies ‖P̂‖ ≤ bP for some bP > 0.
In fact, this assumption is not stringent for practical systems and has been widely used in many results [13, 14, 16].
The main stability result can now be stated as follows.
Theorem 7. Consider the system (3) with the adaptive learning law (16). If the variable ϑ is PE, then the parameter estimation error converges to zero, and the derived control converges to the optimal control, i.e., ‖u − u∗‖ ⟶ 0.
Proof. Consider a Lyapunov function as
From (17), we have
Then, the time derivative can be obtained from systems (3) and (19) as
Thus, based on (21) and (22), we have
Then, the parameters Γ1 and K1 can be chosen to fulfill the following conditions
Therefore, we can rewrite (23) as
Thus, J ⟶ 0 as t ⟶ ∞ by the Lyapunov theorem, and hence the estimation error converges to zero. Consequently, we can obtain the error between u and u∗ as
This implies that the practical control converges to the optimal control. This completes the proof. ☐
4. Simulation
4.1. Example 1: Second-Order System
Figure 2 gives the estimate of the matrix P obtained with the online adaptive learning law (16); comparing with the ideal solution in (31), the estimated solution converges to the optimal solution P∗. This is also seen in Figure 3, where the norm of the estimation error is plotted. This fast convergence contributes to the rapid convergence of the system states, shown in Figure 4; the states are bounded and smooth. Since the estimate converges quickly to P∗, the system response is also fast, as Figure 4 confirms. The corresponding control input, given in Figure 5, is bounded.
[Figure 2: Convergence of the estimated matrix P to P∗.]
[Figure 3: Norm of the estimation error.]
[Figure 4: System state response.]
[Figure 5: Control input.]
4.2. Example 2: Power System Application
TG = 5 s is the time constant of the governor, Tt = 10 s is the time constant of the turbine model, Tg = 10 s is the time constant of the generator model, Fr = 0.5 Hz/MW is the feedback regulation constant, Kt = 1 is the gain of the turbine model, and Kg = 1 is the gain of the generator model.
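For readers who wish to reproduce this example, one plausible assembly of these constants into the standard third-order load-frequency-control state-space model is sketched below; the state ordering, signs, and structure are assumptions rather than the paper's own matrices.

```python
import numpy as np

# Hypothetical third-order load-frequency-control model built from the listed
# constants, with state x = [df, dPg, dXg]^T (frequency deviation, generator
# output, governor valve position). Structure and signs are assumed.
TG, Tt, Tg, Fr, Kt, Kg = 5.0, 10.0, 10.0, 0.5, 1.0, 1.0
A = np.array([
    [-1.0 / Tg,        Kg / Tg,    0.0      ],  # generator dynamics
    [ 0.0,            -1.0 / Tt,   Kt / Tt  ],  # turbine dynamics
    [-1.0 / (Fr * TG), 0.0,       -1.0 / TG ]]) # governor with droop feedback 1/Fr
B = np.array([[0.0], [0.0], [1.0 / TG]])        # load reference input
```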
Figure 6 shows the convergence of the estimated matrix P; comparing with the offline solution given in (33), the estimated solution converges to the optimal solution P∗, which in turn shapes the system state response. Figure 7 gives the system state response, which is smooth and bounded. The system control input is given in Figure 8.
[Figure 6: Convergence of the estimated matrix P to P∗.]
[Figure 7: System state response.]
[Figure 8: Control input.]
5. Conclusion
In this paper, an online data-driven ADP method is proposed to solve the robust control problem for continuous-time systems with uncertainties. The robust control problem is transformed into an optimal control problem. A new online ADP scheme is then introduced to obtain the solution of the ARE using the vectorization operator and the Kronecker product. The closed-loop system stability and the convergence of the robust control solution are analyzed. Simulation results are presented to validate the effectiveness of the proposed algorithm. It is worth noting that the results are restricted to the matched uncertainty condition. In future work, we will extend the proposed idea to the robust tracking control problem, which will allow practical experimental validation on existing test rigs in our lab.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Shandong Provincial Natural Science Foundation (grant no. ZR2019BEE066) and Applied Basic Research Project of Qingdao (grant no. 19-6-2-68-cg).
Open Research
Data Availability
Data were curated by the authors and are available upon request.