Volume 2024, Issue 1 4499481

Research Article

Open Access

Simultaneous Model Change Detection in Multivariate Linear Regression With Application to Indonesian Economic Growth Data

Wayan Somayasa (Corresponding Author)
[email protected]
orcid.org/0000-0002-1604-3457
Department of Mathematics, Halu Oleo University, Jl. H.E.A. Mokodompit, Kendari 93232, Indonesia (uho.ac.id)

Muhammad Kabil Djafar
orcid.org/0009-0006-6872-8737
Department of Mathematics, Halu Oleo University, Jl. H.E.A. Mokodompit, Kendari 93232, Indonesia (uho.ac.id)

Norma Muhtar
orcid.org/0009-0008-8275-6261
Department of Mathematics, Halu Oleo University, Jl. H.E.A. Mokodompit, Kendari 93232, Indonesia (uho.ac.id)

Desak Ketut Sutiari
orcid.org/0000-0003-2194-3588
Department of Medical Electrical Engineering, Mandala Waluya University, Anduonohu, Kendari 93232, Indonesia

First published: 10 May 2024

https://doi.org/10.1155/2024/4499481

Academic Editor: Arvind Kumar Misra

Abstract

In this paper, we study asymptotic model change detection in multivariate linear regression by using the Kolmogorov–Smirnov functional of the partial sum process of recursive residuals. We approximate the rejection region and the power function of the test by establishing a functional central limit theorem for the sequence of partial sum processes of the recursive residuals of the observations. When the assumed model is true, the limit process is the standard multivariate Brownian motion, which does not depend on the regression functions. However, when the assumed model is not true (some models change), the limit process is a vector of deterministic trends plus the standard multivariate Brownian motion. The finite-sample rejection region and the power of the test are investigated by means of Monte Carlo simulation. The simulation study provides evidence that the proposed test attains power larger than the size of the test when the hypothesis is not true. We also demonstrate the application of the proposed test to Indonesian economic growth data, for which we test the adequacy of a three-variate low-order polynomial model. The test result shows that the growth of the Indonesian economy is neither simultaneously constant nor linear. The test successfully detects a change in the model, which is mainly caused by the COVID-19 pandemic in 2020.

1. Introduction

As stated in Fujikoshi [1], He et al. [2], Somayasa [3], and Somayasa, Ruslan, and Sutiari [4], multivariate linear regression has been applied in many fields of study including agriculture, economics, geology, biology, and chemistry, among others. One important part of statistical modelling using multivariate linear regression is checking whether each component of the mean of the response vector can be adequately represented by a set of regression functions or whether more functions are needed to represent some components of the mean of the response vector. In practice, this can be done by simultaneously detecting whether there are changes in the regression model. Cotos-Yéñez, Pérez-González, and González-Manteiga [5], Fujikoshi [1], and Zimmerman [6] call this step a model check. Elsewhere in the literature, it is also frequently called a goodness-of-fit test for regression (see Das [7]).

Methods for change detection in multivariate linear regression have been intensively studied. They are commonly based on the residuals of the responses. The likelihood ratio test using the modified Wilks' lambda statistic has been documented in the literature on multivariate linear regression (cf. [1, 2]). This method is applicable only when the response vector is normally distributed. Somayasa et al. [8] developed an asymptotic model check method for multivariate linear regression with spatial observations on a compact rectangle based on the partial sum process of the vector of ordinary least squares residuals. The limit process obtained in [8] was derived analytically by using some transformation theorems and the linearity of the partial sum operator. However, as can be seen therein, the limit process is an intricate function of the standard multivariate Brownian motion. It depends not only on the experimental design but also on the assumed regression functions. For this reason, the quantiles of the Kolmogorov–Smirnov and the Cramér–von Mises type test statistics cannot be computed analytically. This clearly restricts the application of the proposed method in practice. Although the limit process has a complex structure, it is nevertheless geometrically interpretable as a projection of the standard multivariate Brownian motion on its kernel space. This simple structure is advantageous for analyzing the power of the test when the assumed model is not true (see Somayasa [9]).

The partial sum method was initially investigated for the purpose of quality monitoring in production processes. Its application has since been extended to the problem of monitoring structural change in regression by defining a test based on the partial sum process of recursive residuals. Recent publications in this context are Jiang and Kurozumi [10], Dao [11], and Otto and Breitung [12], among others. Groen, Kapetanios, and Price [13] pioneered the extension of the method to monitoring structural change in multivariate linear regression based on the partial sum process of the vector of recursive residuals. However, their approach requires a normal distribution assumption on the vector of errors, so that it is not purely asymptotic in the strict sense. When the vector of random errors in the model is multivariate normally distributed, the vectors of recursive residuals form a sequence of mutually independent random vectors, so that the limit distribution of the corresponding sequence of partial sum processes can be obtained immediately. This means that the application of the method in practice must start with a goodness-of-fit test or diagnostic check regarding the normality of the vector of recursive residuals.

In this paper, we study the application of the partial sum process of multivariate recursive residuals to the asymptotic model check for multivariate linear regression defined on a closed and bounded experimental region. For that, we need to establish a functional central limit theorem for the partial sum process of recursive residuals obtained from multivariate linear regression under a more general setting than that defined in Groen, Kapetanios, and Price [13]. It is worth mentioning that the results presented in [10–13] have been derived solely for time series linear regression, where the experimental region is the set of positive integers, which is not closed and bounded. In contrast to time series, when the experimental region is compact, we have triangular arrays of observations. For that, we need to establish a functional central limit theorem for triangular arrays of observations drawn from a multivariate linear regression model. To the knowledge of the authors, this result has not yet been documented elsewhere. On the one hand, it requires more effort because of the correlation among the components of the response vector; on the other hand, linear regression is an attractive statistical tool that is applied in many areas such as response surface methodology, the chemical industry, and the mining industry, among others.

The paper is organized as follows. Section 2 reviews the literature. The multivariate regression model with a univariate experimental region, together with the test statistic for checking the adequacy of the model, is defined in Section 3. The limit process of the sequence of partial sum processes of recursive residuals is investigated in Section 4 under H₀ as well as under H₁. The proofs of the results are postponed to the appendix. Section 5 discusses the results of the numerical simulation. In Section 6, we demonstrate the application of the proposed test to real data, namely the Indonesian economic growth data. Some conclusions and remarks are presented in Section 7.

2. Literature Review

Model check or model change detection plays an important role in every serious empirical model building for a random phenomenon. In geostatistics, for example, Bassani and Costa [14] noted that optimum prediction (kriging) of the spatial process at unobserved geographical positions depends on the adequacy of the proposed regression model. Myers, Montgomery, and Anderson-Cook [15] stated that in response surface methodology, the validity of an assumed regression model determines the accuracy of the optimum condition of the production process under study. This means that, to obtain an accurate prediction, we have to carry out model change detection before the assumed model is used for prediction (see also Huang and He [16]).

As documented in Fahrmeir et al. [17] and Zimmerman [6], model check or model change detection in multivariate linear regression can be conducted in several ways: graphical methods, the Akaike information criterion (AIC), and the likelihood ratio test based on Wilks' lambda statistic. All of these methods are similar in that they investigate the residuals of the observations. As a classical method, the graphical method yields subjective decisions, so it is rarely used in practice. AIC and the likelihood ratio test are inferential methods whose drawback is that they are conducted under the assumption that the observations are normally distributed (see Fujikoshi [1] and He et al. [2]).

Bischoff and Gegg [18] proposed a purely asymptotic test for detecting change in multivariate linear regression based on the partial sum process of ordinary least squares residuals. This innovative approach successfully incorporates the theory of high-dimensional stochastic processes, especially high-dimensional Gaussian processes, into statistical inference. Many authors have proposed tests based on the partial sums of recursive residuals instead of ordinary least squares residuals. Recently, Dao [11] studied the application of this approach to condition monitoring and fault diagnosis of wind turbines. Jiang and Kurozumi [10] investigated the power properties of the test based on the modified partial sum process of recursive residuals. Otto and Breitung [12] applied the partial sum method to testing and monitoring structural change in COVID-19 data.

To the best knowledge of the authors, the only work that has investigated the application of the partial sum process of recursive residuals obtained from multivariate linear regression observed over time is that of Groen, Kapetanios, and Price [13]. As already mentioned in Section 1, they derived the limit process under multivariate normally distributed random errors, which clearly restricts the application of the method. Motivated by [13], in this work we propose an asymptotic test based on the partial sum process of recursive residuals, with application to simultaneous model change detection in multivariate linear regression of Indonesian economic growth data.

As defined in Bonokeling et al. [19], economic growth is an increase in the production of economic goods and services in one period of time compared with a previous period. Economic growth of a country reflects and measures the ability of the government in developing the economy of the corresponding country. Economic growth is commonly measured in terms of the increase in aggregated market value of additional goods and services produced which is measured based on the gross domestic product (GDP).

According to the study documented in Sari [20] and Febriyanti [21], there are at least four variables that frequently influence the economic growth of a country, namely, investment, government expenditure, export, and import. Each variable has a different impact on economic growth. While investment, government spending, and export positively affect economic growth, import negatively affects economic growth.

As quoted in [19], another factor that can negatively influence economic growth is disaster, such as the COVID-19 outbreak. COVID-19 damaged the world economy in 2020 and brought the world to its worst economic recession. Indonesia is one of more than 210 countries that were hit by the COVID-19 pandemic. Muhyiddin and Nugroho [22] reported that this situation caused the Indonesian economy to grow negatively in the second, third, and fourth quarters of 2020, after the positive growth achieved in the first quarter of 2020. In this work, we aim to check asymptotically whether COVID-19 caused a change in the model of Indonesian economic growth, so that it can no longer be modelled by a low-degree polynomial over time.

3. Model Definition

We consider a nonparametric multivariate regression

$$Y(x) = g(x) + \varepsilon(x), \quad x \in D, \tag{1}$$

where Y ≔ (Y⁽¹⁾, ⋯, Y⁽ᵖ⁾)^⊤ is the random response vector, g ≔ (g₁, ⋯, g_p)^⊤ is the true but unknown vector of regression functions whose components are assumed to be continuous and of bounded variation on the closed and bounded interval D, and ε ≔ (ε⁽¹⁾, ⋯, ε⁽ᵖ⁾)^⊤ is the random error vector with E(ε) = 0 and Cov(ε) = Σ. We assume throughout that Σ is a positive definite matrix. Let s₁, ⋯, s_m be linearly independent regression functions in L₂(P₀), where P₀ is the Lebesgue measure on the measurable space (D, ℬ(D)). Our goal is to find an asymptotic simultaneous monitoring procedure to check whether or not there are some changes in the regression models. More specifically, we aim to investigate a test procedure for the hypotheses H₀ : g_i ∈ W ∀ i ∈ {1, ⋯, p} versus H₁ : ∃ i ∈ {1, ⋯, p} such that g_i ∉ W, where W is the linear subspace of L₂(P₀) generated by the regression functions {s₁, ⋯, s_m}. In contrast to the classical inference method for multivariate linear regression, in this work we do not need a normality assumption for the error vector.

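Suppose Model 1 is observed independently over an equidistant experimental design on D, with a triangular array of design points given by

$$\gamma_n := \{x_{nj} : j = 1, \ldots, n\} \subset D, \quad n \geq 1.$$

Correspondingly, let Y_nj≔Y(x_nj), g_nj≔g(x_nj), and ε_nj≔ε(x_nj). Then, the triangular array of observations Y_nj satisfies the model

$$Y_{nj} = g_{nj} + \varepsilon_{nj}, \quad j = 1, \ldots, n, \; n \geq 1. \tag{2}$$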

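Next, for j = m + 1, ⋯, n, let γ_nj be the subset of γ_n consisting of the first j design points. Associated with γ_nj, we define the following j-dimensional vectors for i = 1, ⋯, p:

$$Y^{(i)}(\gamma_{nj}) := (Y^{(i)}_{n1}, \ldots, Y^{(i)}_{nj})^{\top}, \quad g_i(\gamma_{nj}) := (g_i(x_{n1}), \ldots, g_i(x_{nj}))^{\top}, \quad \varepsilon^{(i)}(\gamma_{nj}) := (\varepsilon^{(i)}_{n1}, \ldots, \varepsilon^{(i)}_{nj})^{\top},$$

with E(ε⁽ⁱ⁾(γ_nj)) = 0 and Cov(ε⁽ⁱ⁾(γ_nj)) = σ_iiI_j, where I_j is the j × j identity matrix. The realization of Model 1 as well as Model 2 when observed on γ_nj can be written as

$$Y_{j \times p} = G_{j \times p} + \mathcal{E}_{j \times p}, \tag{3}$$

where

$$Y_{j \times p} := (Y^{(1)}(\gamma_{nj}), \ldots, Y^{(p)}(\gamma_{nj})), \quad G_{j \times p} := (g_1(\gamma_{nj}), \ldots, g_p(\gamma_{nj})), \quad \mathcal{E}_{j \times p} := (\varepsilon^{(1)}(\gamma_{nj}), \ldots, \varepsilon^{(p)}(\gamma_{nj})),$$

with $\mathrm{E}(\mathcal{E}_{j \times p}) = 0$ and $\mathrm{Cov}(\mathrm{vec}(\mathcal{E}_{j \times p})) = \Sigma \otimes I_j$.

Let W_j×m be the design matrix of Model 3. That is, W_j×m is a j × m matrix, defined by

$$W_{j \times m} := (s_1(\gamma_{nj}), \ldots, s_m(\gamma_{nj})),$$

whose k-th column is given by s_k(γ_nj), where we define s_k(γ_nj) ≔ (s_k(x_n1), ⋯, s_k(x_nj))^⊤, for k = 1, ⋯, m. If H₀ is true, then we have the following multivariate linear regression model

$$Y_{j \times p} = W_{j \times m} B + \mathcal{E}_{j \times p}, \tag{4}$$

where B is the m × p matrix of unknown parameters, defined by

$$B := (\beta_{ki})_{k = 1, \ldots, m;\; i = 1, \ldots, p}.$$

The least squares estimator of B in Model 4 based on the first j vectors of observations can be computed by the following formula:

$$\hat{B}_j := (W_{j \times m}^{\top} W_{j \times m})^{-1} W_{j \times m}^{\top} Y_{j \times p}. \tag{5}$$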

Hence, by the definition of the component-wise projection, we have for i = 1, ⋯, p,

(6)

We note that index j in

as well as in

presented in Equations (5) and (6) means that the estimation is based on the first j observations. By following the univariate case, we define the p-dimensional recursive residual of Model 4 as

(7)

where

( )

with

For the purpose of testing the hypotheses defined above, we investigate the sequence of the partial sum processes of the p-dimensional recursive residuals (Equation 7) by transforming the random matrix

( )

into a sequence of p-dimensional stochastic processes {Q_n−m(U_{p×(n − m)})(x): x ∈ D}, defined by

(8)

where

and

, for j = 1, ⋯, m. Let us call {Q_n−m(U_{p×(n − m)})(x): x ∈ D} throughout the paper p-dimensional recursive residual partial sum process (RRPSP). By the definition, for every n ≥ 1, Q_n−m(U_{p×(n − m)}) builds a stochastic process with sample path in the space of p-dimensional vector of continuous functions

. We define a test using the Kolmogorov–Smirnov functional, given by

(9)

It is clear that the greater the departure of the assumed model from the true unknown model, the larger the value of the statistic in Equation (9) will be. This means that the statistic in Equation (9) measures the discrepancy between the true and the assumed model. For this reason, it can reasonably be used as a test statistic for detecting the occurrence of changes in the model. We reject H₀ when the statistic is large. To this end, the limit distribution of the statistic under H₀ as well as under H₁ needs to be investigated.

We note that the partial sum process (Equation 8) differs from that defined in [13], which does not include the remainder term of Equation (8). Consequently, the process defined in [13] has sample paths in the space of p-dimensional right-continuous functions on D with left limits, instead of the space of p-dimensional continuous functions on D.
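The construction in this section can be illustrated numerically. The following is a minimal sketch, not the authors' R code. It assumes D = [0, 1] with design points x_nj = j/n, polynomial regressors s_k(x) = x^(k−1), the standard form of the recursive residual (prediction error from the fit on the previous observations, divided by the square root of one plus the leverage term), and a known Σ used for standardization. The function names `solve`, `regressors`, `recursive_residuals`, and `ks_statistic` are hypothetical. Because the interpolated partial sum process is piecewise linear and a norm is convex, the supremum of its Euclidean norm is attained at the grid points, so the sketch only evaluates the partial sums there.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def regressors(x, m):
    """Polynomial regressors s_k(x) = x^(k-1), k = 1, ..., m (assumed model)."""
    return [x ** k for k in range(m)]


def recursive_residuals(y, m):
    """y: n response vectors (each of length p) observed at x_nj = j/n.
    Returns the n - m recursive residual vectors for j = m+1, ..., n."""
    n, p = len(y), len(y[0])
    s = [regressors((j + 1) / n, m) for j in range(n)]
    out = []
    for j in range(m, n):  # 0-based index; the fit uses the first j observations
        wtw = [[sum(s[r][a] * s[r][b] for r in range(j)) for b in range(m)]
               for a in range(m)]
        coef = solve(wtw, s[j])  # (W^T W)^{-1} s(x_nj)
        denom = math.sqrt(1.0 + sum(coef[k] * s[j][k] for k in range(m)))
        u = []
        for i in range(p):
            wty = [sum(s[r][k] * y[r][i] for r in range(j)) for k in range(m)]
            pred = sum(coef[k] * wty[k] for k in range(m))  # s^T (W^T W)^{-1} W^T y
            u.append((y[j][i] - pred) / denom)
        out.append(u)
    return out


def ks_statistic(u, sigma):
    """Maximum over grid points of the Sigma-standardized partial sums, scaled by sqrt(n - m)."""
    p, k = len(u[0]), len(u)
    partial, best = [0.0] * p, 0.0
    for row in u:
        partial = [partial[i] + row[i] for i in range(p)]
        w = solve(sigma, partial)  # Sigma^{-1} times the partial sum
        best = max(best, math.sqrt(sum(partial[i] * w[i] for i in range(p)) / k))
    return best


# Hypothetical data under H0: straight-line means plus independent N(0, 1) errors.
rng = random.Random(0)
n, m = 40, 2
y = [[1.0 + 2.0 * (j / n) + rng.gauss(0, 1), 3.0 - (j / n) + rng.gauss(0, 1)]
     for j in range(1, n + 1)]
u = recursive_residuals(y, m)
stat = ks_statistic(u, [[1.0, 0.0], [0.0, 1.0]])
print(len(u), round(stat, 4))
```

In practice Σ would be replaced by an estimate, and the resulting value would be compared with a simulated quantile of the statistic under H₀.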

4. Approximation to the Test Statistics

Since the exact distribution of the test statistic in Equation (9) is not mathematically tractable, we investigate its limit distribution. We first obtain the limit process of Q_n−m(U_{p×(n − m)}) by applying Theorem 7.5 of Billingsley [23] (see also Theorem 1.5.4 of Van der Vaart and Wellner [24]).

By substituting

( )

into Equation (7), the vector of the recursive residuals can be expressed as follows:

( )

Since we have

and

, then by substituting these two equations into the preceding one, we get

( )

Hence, for j = m + 1, ⋯, n, it holds

(10)

Expression (10) shows that for every j = m + 1, ⋯, n, there exists a column vector

, defined by

( )

where

( )

The column vector

satisfies

(11)

So the p-dimensional vector of recursive residuals u_nj can be written as

(12)

By Equation (12), we get E(u_nj) = 0, and by Equation (11), it holds

( )

Thus, {u_nj : j = m + 1, ⋯, n} builds a sequence of uncorrelated p-dimensional random vectors with E(u_nj) = 0 and Cov(u_nj) = Σ. If the random error vector is distributed as N_p(0, Σ), then u_nj is also distributed as N_p(0, Σ). Hence, under such a condition, the set {u_nj : j = m + 1, ⋯, n} constitutes a sequence of independent and identically distributed random vectors.
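The moment properties of the recursive residuals stated above can be checked in a few lines. The following worked derivation is a sketch under the assumption that the recursive residual has the standard form, that is, the prediction error of the least squares fit on the first j − 1 observations divided by a normalizing constant. The symbols a_nj, d_nj, W, s, and the error matrix ℰ are notation introduced here for illustration and may differ from the notation of the lost displays. The derivation uses only that the errors at distinct design points are independent with common covariance Σ.

```latex
% Notation (introduced here): W := W_{(j-1) \times m}, s := s(x_{nj}),
% \mathcal{E} := (j-1) \times p matrix with rows \varepsilon_{n1}^\top, ..., \varepsilon_{n,j-1}^\top.
\begin{align*}
\text{Under } H_0:\quad
\hat{B}_{j-1} &= (W^{\top} W)^{-1} W^{\top} (W B + \mathcal{E}) = B + (W^{\top} W)^{-1} W^{\top} \mathcal{E}, \\
u_{nj} &= \frac{Y_{nj} - \hat{B}_{j-1}^{\top} s}{d_{nj}}
        = \frac{\varepsilon_{nj} - \mathcal{E}^{\top} a_{nj}}{d_{nj}},
\qquad a_{nj} := W (W^{\top} W)^{-1} s,
\qquad d_{nj}^{2} := 1 + s^{\top} (W^{\top} W)^{-1} s, \\
\mathcal{E}^{\top} a_{nj} &= \sum_{k=1}^{j-1} a_{nj,k}\, \varepsilon_{nk},
\qquad
\mathrm{Cov}\bigl(\mathcal{E}^{\top} a_{nj}\bigr) = \|a_{nj}\|^{2}\, \Sigma
        = s^{\top} (W^{\top} W)^{-1} s \; \Sigma, \\
\mathrm{E}(u_{nj}) &= 0,
\qquad
\mathrm{Cov}(u_{nj}) = \frac{\Sigma + s^{\top} (W^{\top} W)^{-1} s \; \Sigma}{d_{nj}^{2}} = \Sigma .
\end{align*}
```

The last step uses that ε_nj is independent of ε_n1, ⋯, ε_n,j−1, so the covariances of the two terms in the numerator add.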

Furthermore, by Equation (12), there exists an (n − m) × n lower triangular matrix C, say, where

( )

with the property CC^⊤ = I_n−m, such that

(13)

For i = 1, ⋯, n − m and k = 1, ⋯, n, let c_ik be the entry of C in the i-th row and k-th column. Then, by the definition, c_ik can be concretely written as

(14)

for k = 1, ⋯, m + i, and c_ik = 0, for k = m + i + 1, ⋯, n.
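The orthogonality property CC^⊤ = I_n−m can be verified numerically. The sketch below is an illustration under stated assumptions, not the authors' construction: it takes the straight-line model s(x) = (1, x)^⊤ (m = 2) on the design x_nk = k/n, and assumes that row i of C holds the weights with which the errors enter the standard recursive residual for observation m + i. For the straight-line model the term s(x)^⊤(W^⊤W)^(−1)s(v) has the closed form 1/j + (x − x̄)(v − x̄)/S_xx, where x̄ and S_xx are computed from the first j design points. The function names are hypothetical.

```python
import math


def c_matrix(n):
    """Rows of C for the straight-line model s(x) = (1, x) on x_nk = k/n (m = 2)."""
    m = 2
    x = [(k + 1) / n for k in range(n)]
    rows = []
    for i in range(1, n - m + 1):
        j = m + i - 1                      # number of observations used in the fit
        mean = sum(x[:j]) / j
        sxx = sum((v - mean) ** 2 for v in x[:j])
        new = x[j]                         # design point of observation m + i

        def h(v):
            # s(new)^T (W^T W)^{-1} s(v) for the straight-line model
            return 1.0 / j + (new - mean) * (v - mean) / sxx

        d = math.sqrt(1.0 + h(new))
        # Nonzero entries for k = 1, ..., m + i, zeros afterwards.
        rows.append([-h(v) / d for v in x[:j]] + [1.0 / d] + [0.0] * (n - j - 1))
    return rows


def max_deviation_from_identity(c):
    """Largest absolute entry of C C^T - I."""
    worst = 0.0
    for a in range(len(c)):
        for b in range(len(c)):
            dot = sum(p * q for p, q in zip(c[a], c[b]))
            worst = max(worst, abs(dot - (1.0 if a == b else 0.0)))
    return worst


C = c_matrix(12)
print(len(C), len(C[0]), max_deviation_from_identity(C))
```

The printed deviation should be at the level of floating-point rounding error, which reflects that the recursive residuals are uncorrelated with unchanged variance.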

Now we are in a position to state the limit distribution of the RRPSP under H₀. The proof is given in the appendix.

Theorem 1. Let the regression functions s₁, ⋯, s_m be linearly independent in L₂(P, D), continuous, and have bounded variation on D. Suppose that , for i = 1, ⋯, p, n ≥ 1 and j = m + 1, ⋯, n. If H₀ is true, then for n⟶∞, converges in distribution to B_p. Thereby, B_p is the standard p-variate Brownian motion on D. That is, a centered p-variate Gaussian process with the covariance function given by

( )

We refer the reader to Durrett [25] for the definition of the standard p-variate Brownian motion B_p.

By Theorem 1, it is clear that, for arbitrary fixed x ∈ D, the sequence of random vectors

( )

converges in distribution to a p-variate normal distribution N_p(0, A_x), where

( )

Hence, by applying Theorem 2.7 in [23] or Theorem 1.3.6 in [24] (continuous mapping theorem), the quadratic form defined by Equation (15)

(15)

converges in distribution to a chi-square distribution with p degrees of freedom, denoted by χ²(p). Furthermore, Theorem 1 gives us an approximation to the probability distribution of the Kolmogorov–Smirnov type test statistic in Equation (9) when H₀ is true, which allows us to approximate the rejection region of the test. The rejection region is constructed based on the probability distribution of the statistic sup_x∈D‖B_p(x)‖. For α ∈ (0, 1), an asymptotic size α-test will reject H₀ if and only if the test statistic is greater than or equal to ν_1−α, where ν_1−α is a positive constant that satisfies the condition P{sup_x∈D‖B_p(x)‖ ≥ ν_1−α} = α. In practice, Σ is usually unknown. It is estimated under H₀ by a consistent estimator, defined by

(16)

where for j = 1, ⋯, n,

( )

This means that Equation (16) is computed based on the p-dimensional ordinary least squares residuals.
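The critical value ν_1−α has no simple closed form, but it can be approximated by simulation. The following is a minimal sketch under the assumptions that D = [0, 1], that ‖·‖ is the Euclidean norm, and that the Brownian motion is approximated by a Gaussian random walk on an equidistant grid, which slightly underestimates the supremum. The function names and the default numbers of steps and runs are hypothetical choices for illustration.

```python
import math
import random


def sup_norm_bm(p, steps, rng):
    """Maximum Euclidean norm of a p-variate Gaussian random walk approximating B_p on [0, 1]."""
    pos = [0.0] * p
    best = 0.0
    sd = 1.0 / math.sqrt(steps)  # increments have variance 1/steps
    for _ in range(steps):
        pos = [v + rng.gauss(0.0, sd) for v in pos]
        best = max(best, math.sqrt(sum(v * v for v in pos)))
    return best


def mc_quantile(p, level, steps=100, runs=2000, seed=1):
    """Monte Carlo approximation of the level-quantile of sup ||B_p(x)|| over [0, 1]."""
    rng = random.Random(seed)
    sims = sorted(sup_norm_bm(p, steps, rng) for _ in range(runs))
    index = min(runs - 1, max(0, math.ceil(level * runs) - 1))
    return sims[index]


# Approximate critical value of an asymptotic size-5% test for p = 3.
print(mc_quantile(3, 0.95))
```

Choosing `steps` equal to n − m mimics the finite-sample situation in which the partial sums are evaluated at n − m grid points, while a larger number of steps moves the approximation closer to the continuous-time supremum.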

To be able to assess the power of the test, we need to find out the limit distribution of the test statistic when H₀ is not true. For that, we consider the scaled version of Model 2, defined by

(17)

When H₀ is not true, the model can be written as

(18)

Let

be the vector of the recursive residuals when H₀ is not true. Then, by recalling Equations (10) and (17), we have

( )

By substituting

( )

obtained from Equation (18) into the preceding equation, we get

(19)

where u_nj is the recursive residuals under H₀. It is clear that when H₀ is true,

defined in Equation (19) coincides with u_nj. This can be obtained by replacing the terms

and

by B^⊤s(x_nj) and

, respectively.

The limit process of the RRPSP when H₀ is not true is summarized in Theorem 2. The proof is postponed to the appendix.

Theorem 2. Let the regression functions s₁, ⋯, s_m be linearly independent in L₂(P₀, D), continuous, and have bounded variation on D. Suppose that the model

( )

is observed over γ_n. When H₀ is not true, then under the condition of Theorem 1,

converges in distribution to the process h_G + B_p, as n⟶∞, where for every x ∈ D,

( )

The convergence result presented in Theorem 2 provides an approximation to the power function of the test. Let

be the power of the test evaluated at a regression function vector g. That is,

(20)

Then, by Theorem 2 and the well-known continuous mapping theorem,

can be approximated by the following boundary crossing probability:

(21)

When H₀ is true, the power computed in Equations (20) and (21) reduces to the probability of a type I error of the size-α test. Conversely, when H₀ is not true, both determine the probability of rejecting H₀, provided that g is the true regression function vector. In other words, both quantities supply information regarding the ability of the test to detect the existence of a model change. A good test should possess the general property stated in Lehmann and Romano [26] and Rasch and Schott [27]: the larger the power under H₁, the better the test.

5. Numerical Simulation

In this section, we report on and discuss a numerical simulation that studies the finite-sample performance of the convergence result and the behavior of the test investigated in the preceding section. To be more specific, we consider the p-variate polynomial model defined by s_k(x)≔x^(k−1), for k = 1, ⋯, m, with the experimental region restricted to the unit interval [0,1] and equidistant design points of size n. The design matrix of the j-th part of the model, for j = m + 1, ⋯, n, is then given by

( )

The polynomial model clearly satisfies the conditions of Theorem 1.

5.1. Simulation Under H₀

We simulate the convergence result formulated in Theorem 1 by demonstrating the finite-sample behavior of Equation (15) in two scenarios under H₀. In the first scenario, we generate the samples from the three-variate polynomial model of degree 1 (m = 2), that is, the three-variate straight-line regression model. The three-dimensional error vectors are generated independently from the three-variate centered normal distribution with the covariance matrix

( )

The simulation result for n = 40 is shown in Figure 1, where Figure 1(a) is for x = 0.5 and Figure 1(b) is for x = 1. The graph of the empirical cumulative distribution function (ECDF) of the quadratic form of the three-variate RRPSP is drawn as a step line. The dotted curve is the cumulative distribution function (CDF) of χ²(3). All graphs are generated using R with 10000 runs. In the second scenario, we simulate the three-variate quadratic model (m = 3) defined on the unit interval [0, 1]. The error vectors are generated independently from the three-variate centered normal distribution with the same covariance matrix as in the first scenario. The graphs of the ECDF of the quadratic form of the RRPSP simulated for n = 40, x = 0.5, and x = 1, together with the graphs of the CDF of χ²(3), are presented in Figures 2(a) and 2(b), respectively.

Figure 1. The graphs of the ECDF of the quadratic form of the three-variate RRPSP for the first-order model (step line) and the CDF of χ²(3) (dotted line). Panel (a): n = 40 and x = 0.5.

The simulation results show that, regardless of the proposed regression model and of the chosen value of x ∈ {0.5, 1}, χ²(3) gives a good approximation to the distribution of the quadratic form of the three-variate RRPSP.

Next, we approximate the finite-sample lower quantiles of the Kolmogorov–Smirnov type statistic under H₀ by Monte Carlo simulation. For this purpose, we generate the samples based on the p-variate polynomial model of degree one (m = 2) and two (m = 3), for p = 2, 3, 4, where the error vectors are generated independently from the p-variate normal distribution N_p(0, Σ_p), with the following corresponding covariance matrix:

( )

Table 1 lists the simulated lower quantiles of the test statistic for α = 65%, 75%, 85%, 90%, 95%, and 99%, simulated for n = 25, 30, 35, 40, 45, and 50, with 100000 runs. These quantiles are used in constructing the finite-sample rejection region of the test. The R code for generating the graphs and the lower tail quantiles is available from the authors on request.

Table 1. The approximated lower quantiles of the Kolmogorov–Smirnov type test statistic under H₀. The simulation results are based on 100000 runs.

n	75%	85%	90%	95%	99%
p = 2
25	1.9229	2.1770	2.3591	2.6438	3.2097
30	1.9263	2.1794	2.3583	2.6389	3.2043
35	1.9270	2.1839	2.3680	2.6518	3.2108
40	1.9305	2.1822	2.3639	2.6415	3.1978
45	1.9362	2.1909	2.3712	2.6423	3.1923
50	1.9282	2.1829	2.3618	2.6444	3.1902
p = 3
25	2.2615	2.5203	2.6999	2.9738	3.5337
29	2.2611	2.5176	2.6971	2.9763	3.5235
35	2.2693	2.5232	2.7050	2.9893	3.5511
40	2.2643	2.5208	2.7018	2.9770	3.5278
45	2.2674	2.5241	2.7038	2.9840	3.5250
50	2.2690	2.5218	2.6988	2.9785	3.5211
p = 4
25	2.4970	2.7477	2.9298	3.2032	3.7301
30	2.4979	2.7499	2.9302	3.1978	3.7332
35	2.5054	2.7545	2.9303	3.2013	3.7448
40	2.5075	2.7605	2.9392	3.2172	3.7514
45	2.5121	2.7672	2.9456	3.2205	3.7544
50	2.5174	2.7699	2.9490	3.2157	3.7573

5.2. Simulation Under H₁

We simulate the case of testing H₀ : g_i ∈ W, for all i ∈ {1, 2, 3}, versus H₁ : g_i ∉ W, for some i ∈ {1, 2, 3}, where W = [s₁, s₂], with s_k(x) = x^(k−1), for x ∈ [0, 1] and k = 1, 2. This means that under H₀, we assume a three-variate first-order model on the unit interval [0, 1]. In this simulation, the samples are generated based on the following three-variate scaled regression model

( )

where the error vector is generated independently from the three-variate normal distribution N₃(0, Σ), with Σ is given by

( )

When the constants ρ, γ, and δ are simultaneously set to zero, the condition under H₀ is fulfilled, which means that there are no changes in the model. In such a case, the power of the test should be equal to the size of the test, 5%. However, when at least one of these constants is nonzero, there exists some i ∈ {1, 2, 3} such that g_i ∉ W. In other words, some models change simultaneously.

Empirical powers of the size-5% test simulated for n = 30 and n = 40 with several chosen values of ρ, γ, and δ, with 10000 runs, are given in Tables 2 and 3, respectively. The tables present the empirical power for four types of alternative:

1. H₁₁: (when ρ ≠ 0, γ = 0, and δ = 0)
2. H₁₂: (when ρ = 0, γ ≠ 0, and δ = 0)
3. H₁₃: (when ρ = 0, γ = 0, and δ ≠ 0)
4. H₁₄: g_i ∉ W for all i = 1, 2, 3 (when ρ ≠ 0, γ ≠ 0, and δ ≠ 0)

It can be seen therein that when ρ = 0, γ = 0, and δ = 0, the power of the test attains the value 0.04967 for n = 30 and the value 0.05340 for n = 40, which are approximately equal to 5%. So the size of the test is achieved. The graphs of the simulated power function of the size-5% test associated with the alternatives H₁₁, H₁₂, H₁₃, and H₁₄ are plotted in Figure 3, simulated for n = 40 with 10000 runs. The graphs indicate increasing power functions. They have a common feature in that the larger the values of ρ, γ, and δ are, the greater the power. All powers equal the size of the test at the starting points, that is, when ρ, γ, and δ are simultaneously fixed to zero. Tables 2 and 3 and Figure 3 show that the power of the test for such alternatives grows as the model moves away from H₀. This means that the test has good power in detecting changes in the model when such changes exist. By referring to Ghosh, Delampady, and Samanta [28], it can be concluded based on the simulation results that the test is unbiased.

Table 2. Simulated rejection probabilities of size α = 0.05 test for three-variate first-order model computed for several varied values of ρ, γ, and δ with n = 30.

ρ	γ	δ	Rejection probability	ρ	γ	δ	Rejection probability
n = 30
0.0	0.0	0.0	0.0497	0.0	0.0	0.5	0.0495
4.5	0.0	0.0	0.0576	0.0	0.0	2.5	0.0609
10.5	0.0	0.0	0.0875	0.0	0.0	5.5	0.1091
12.5	0.0	0.0	0.1109	0.0	0.0	8.5	0.2079
14.5	0.0	0.0	0.1340	0.0	0.0	12.5	0.4353
16.5	0.0	0.0	0.1622	0.0	0.0	15.0	0.6061
20.0	0.0	0.0	0.2259	0.0	0.0	20.0	0.8722
25.0	0.0	0.0	0.3434	0.0	0.0	25.0	0.9801
30.0	0.0	0.0	0.4911	0.0	0.0	30.0	0.9985
40.0	0.0	0.0	0.7722	0.5	0.5	0.5	0.0526
0.0	0.5	0.0	0.0516	2.5	2.5	2.5	0.0948
0.0	4.0	0.0	0.0735	3.5	3.5	3.5	0.1405
0.0	10.5	0.0	0.2512	5.5	5.5	5.5	0.3228
0.0	15.0	0.0	0.4995	6.5	6.5	6.5	0.4430
0.0	20.0	0.0	0.7810	8.5	8.5	8.5	0.7035
0.0	30.0	0.0	0.9917	10.0	10.0	10.0	0.8540
0.0	35.0	0.0	0.9994	12.5	12.5	12.5	0.9725
0.0	40.0	0.0	0.9999	15.0	15.0	15.0	0.9973

Table 3. Simulated rejection probabilities of size α = 0.05 test for three-variate first-order model computed for several varied values of ρ, γ, and δ with n = 40.

ρ	γ	δ	Rejection probability	ρ	γ	δ	Rejection probability
n = 40
0.0	0.0	0.0	0.0534	0.0	0.0	0.5	0.0497
4.5	0.0	0.0	0.0576	0.0	0.0	2.5	0.0590
10.5	0.0	0.0	0.0913	0.0	0.0	5.5	0.1047
12.5	0.0	0.0	0.1063	0.0	0.0	8.5	0.1995
14.5	0.0	0.0	0.1290	0.0	0.0	12.5	0.4215
16.5	0.0	0.0	0.1586	0.0	0.0	15.0	0.5840
20.0	0.0	0.0	0.2153	0.0	0.0	20.0	0.8565
25.0	0.0	0.0	0.3366	0.0	0.0	25.0	0.9754
30.0	0.0	0.0	0.4747	0.0	0.0	30.0	0.9973
40.0	0.0	0.0	0.7576	0.5	0.5	0.5	0.0533
0.0	0.5	0.0	0.0493	2.5	2.5	2.5	0.0958
0.0	4.0	0.0	0.0742	3.5	3.5	3.5	0.1450
0.0	10.5	0.0	0.2431	5.5	5.5	5.5	0.3167
0.0	15.0	0.0	0.4744	6.5	6.5	6.5	0.4339
0.0	20.0	0.0	0.7723	8.5	8.5	8.5	0.6818
0.0	24.0	0.0	0.9244	10.5	10.5	10.5	0.8393
0.0	30.0	0.0	0.9912	12.5	12.5	12.5	0.9681
0.0	35.0	0.0	0.9988	15.0	15.0	15.0	0.9963

6. Application

In this section, we consider Indonesian economic growth data provided by the Central Bureau of Statistics of the Republic of Indonesia (Badan Pusat Statistik (BPS)) [29]. The data consist of quarterly simultaneous measurements of the total value of export, the total value of investment, and the GDP of Indonesia, measured from the first quarter of 2015 until the first quarter of 2022. The first observation was recorded at the end of March 2015, the second at the end of June 2015, and so on. The last one, which is the 29th observation, was recorded at the end of March 2022. All variables are measured in IDR trillion (see BPS [29]). The data considered in this work are also available from the authors on request. The sample correlation matrix of the three variables based on the data of size 29 is given by

( )

The matrix indicates strong correlations among the three variables. These tendencies are also visible in Figure 4, which shows the scatter matrix of the data. The strongest correlation is that between the total value of investment and the GDP, namely, 0.90174, whereas the weakest is that between the total value of export and the GDP, namely, 0.67968. In view of these correlations, the three variables should be analyzed simultaneously by means of a multivariate method.
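As a rough illustration of how such a sample correlation matrix is computed, the following Python sketch evaluates the pairwise Pearson correlations column by column. The function name `correlation_matrix` and the toy series are hypothetical; they stand in for the BPS data, which are not reproduced here.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    # Euclidean norm of each centered column (proportional to its std. deviation).
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]

# Toy placeholder series (NOT the BPS data): export, investment, GDP.
export = [1.0, 2.0, 3.0, 4.0, 5.0]
investment = [2.1, 3.9, 6.2, 8.1, 9.8]
gdp = [5.0, 5.5, 7.0, 7.2, 9.0]

R = correlation_matrix([export, investment, gdp])
for row in R:
    print(" ".join(f"{value:8.5f}" for value in row))
```

The resulting matrix is symmetric with unit diagonal; off-diagonal entries close to ±1 indicate the kind of strong linear dependence that motivates a joint, multivariate analysis.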

We aim to model the data using a three-variate polynomial regression model and to test the adequacy of a proposed model based on the partial sum process of the recursive residuals. For that purpose, we interpret the data as a realization of 29 independent three-dimensional response vectors following a three-variate regression model observed at the equidistant design points {1/29, 2/29, ⋯, 1} in the unit interval [0,1]. The scatter plots of the total value of investment, the total value of export, and the GDP are presented in Figure 5. The deep peaks in the scatter plots reflect the contraction of Indonesia’s economic growth during the COVID-19 pandemic in 2020. Under normal conditions, the total values of export and investment and the GDP should increase smoothly, so that they can be modelled by low-degree polynomial functions. We conjecture that the COVID-19 pandemic changed the model, so that a higher-degree polynomial model is needed. The proposed test procedure is used to detect such changes. The values of the test statistic for several low-order three-variate polynomial regression models are presented in Table 4. The R code used for the computation can be obtained from the authors on request. Since the 95% quantile of the distribution of the test statistic for n = 29 and p = 3 is 2.9763 (see Table 1), the asymptotic size 5% test rejects the constant and first-order models, whereas the second-order and third-order models are not rejected. The test result is consistent with the scatter plots of the data in that the constant and first-order models are not plausible for Indonesia’s economic growth data during the period of March 2020 until March 2022.
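The recursive residuals on which the test is built can be sketched in Python for a single response component. The sketch assumes the classical definition, in which each residual is the standardized one-step prediction error of the least-squares fit to the preceding observations; the paper's own definition (Equation (12)) is not reproduced here and may differ in normalization. The helper names `solve` and `recursive_residuals` are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def recursive_residuals(xs, ys, degree):
    """Recursive residuals of a polynomial model (assumed classical definition)."""
    m = degree + 1  # number of regression coefficients
    rows = [[x ** k for k in range(m)] for x in xs]
    residuals = []
    for j in range(m, len(xs)):
        past = rows[:j]  # design rows of the first j observations
        gram = [[sum(r[a] * r[b] for r in past) for b in range(m)] for a in range(m)]
        xty = [sum(r[a] * y for r, y in zip(past, ys)) for a in range(m)]
        beta = solve(gram, xty)    # least-squares fit on the first j observations
        g = solve(gram, rows[j])   # (X'X)^{-1} x for the next design row
        pred = sum(b * v for b, v in zip(beta, rows[j]))
        leverage = sum(v * w for v, w in zip(rows[j], g))
        residuals.append((ys[j] - pred) / math.sqrt(1.0 + leverage))
    return residuals

design = [j / 29 for j in range(1, 30)]   # equidistant design points
line = [2.0 + 3.0 * x for x in design]    # toy response, exactly linear
w_first = recursive_residuals(design, line, 1)
w_const = recursive_residuals(design, line, 0)
print(max(abs(w) for w in w_first), max(abs(w) for w in w_const))
```

For toy data lying exactly on a line, the first-order model gives residuals that vanish up to rounding error, whereas the constant model does not. In the three-variate setting, the same computation would be applied to each response component before the partial sums are formed.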

Table 4. Values of the KS-type test statistic for the constant, first-order, second-order, and third-order polynomial models.

Model	Value of test statistic
Constant	9.3802
First order	3.0876
Second order	1.8136
Third order	2.8870
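The decision rule applied to Table 4 amounts to comparing each tabulated value with the critical value 2.9763 quoted in the text. A minimal Python sketch, with the hypothetical names `ks_values` and `decide` and all numbers copied from the text and the table:

```python
# Critical value quoted in the text (Table 1, n = 29, p = 3, size 5%).
CRITICAL_VALUE = 2.9763

# Values from Table 4, one per candidate polynomial model.
ks_values = {
    "Constant": 9.3802,
    "First order": 3.0876,
    "Second order": 1.8136,
    "Third order": 2.8870,
}

def decide(values, critical_value):
    """Map each model to True if it is rejected (value exceeds the critical value)."""
    return {model: value > critical_value for model, value in values.items()}

decisions = decide(ks_values, CRITICAL_VALUE)
for model, rejected in decisions.items():
    print(f"{model}: {'rejected' if rejected else 'not rejected'}")
```

The comparison reproduces the stated conclusion: the constant and first-order models are rejected, while the second-order and third-order models are not.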

7. Conclusion

A limit theorem for the sequence of random functions defined by the RRPSP of multivariate linear regression has been established. The result can be applied to detect changes in a regression model. When no change occurs, the limit is the standard multivariate Brownian motion {B_p(x): x ∈ D}, a model-free limit process that depends only on the dimension of the response vector. When at least one change occurs, the limit process is a vector of trends plus the standard multivariate Brownian motion, that is, {h_G(x) + B_p(x): x ∈ D}. We have carried out simulations to approximate the finite-sample critical values as well as the power of the Kolmogorov–Smirnov type test. The simulations indicate that the test based on the multivariate RRPSP is unbiased and has good power. The test can be implemented in a statistical package such as R, and the computation is fast.
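The approximation of critical values by simulation can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the limit process under no change is replaced by a scaled p-variate Gaussian random walk, and the Kolmogorov–Smirnov type functional is assumed to be the supremum of the Euclidean norm. The function name `sup_norm_quantile`, the number of replications, and the seed are hypothetical and are not taken from the authors' R code.

```python
import math
import random

def sup_norm_quantile(p, steps, reps=4000, level=0.95, seed=1):
    """Monte Carlo quantile of max_k ||S_k|| for a scaled p-variate Gaussian walk.

    Assumption: S_k sums k independent N(0, I_p / steps) increments, so that the
    walk approximates standard p-variate Brownian motion on [0, 1].
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(steps)
    maxima = []
    for _ in range(reps):
        position = [0.0] * p
        largest = 0.0
        for _ in range(steps):
            position = [v + scale * rng.gauss(0.0, 1.0) for v in position]
            largest = max(largest, math.sqrt(sum(v * v for v in position)))
        maxima.append(largest)
    maxima.sort()
    # Empirical quantile: the ceil(level * reps)-th order statistic.
    return maxima[math.ceil(level * reps) - 1]

q95 = sup_norm_quantile(p=3, steps=26)
print(f"approximate 95% quantile for p = 3: {q95:.4f}")
```

Increasing `reps` reduces the Monte Carlo error of the estimated quantile, and increasing `steps` moves the random walk closer to the continuous-time limit.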

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work was supported in part by the Indonesian Ministry of Research, Technology and Higher Education through the KLN and Publikasi Internasional Research Grant 2019.

Appendices

Proof of Theorem 1

Without loss of generality, we assume for the rectangle D = [a, b], that a = 0 and b = 1, with x_nj = j/n, for j = 1, ⋯, n. According to the well-known Donsker theorem (cf. Billingsley [23] and Van der Vaart and Wellner [24]), we need to show that the finite dimensional distributions of

converge to those of B_p and that

is tight. For arbitrary q ≥ 1, let 0 ≤ x₁ ≤ ⋯≤x_q ≤ 1 be q different points in [0, 1] and κ₁, ⋯, κ_q be nonzero constants. We show that

converges in distribution to

which follows a centered p-variate normal distribution with the covariance matrix

. By recalling Equation (12) and by defining a notation

( )

we get the following expressions

( )

Hence, we have

(A.1)

where

( )

Thus, by Equation (A.1), the problem now reduces to that of showing

converges in distribution to

. Since

are independent, with

and

, by the well-known Lindeberg–Feller multivariate central limit theorem, it suffices to show that the covariance of

converges to that of

and that the Lindeberg–Feller condition holds, that is, for every ε > 0,

( )

It is clear that

and

( )

where

( )

By recalling Equation (11) and the fact that

and

converge to zero, then the right-hand side of the last equation converges as n⟶∞ to

(A.2)

Expression (A.2) is the formula for the covariance function of

. Next, let ε > 0 be an arbitrarily small number and let M ≔ max_{m+1≤j≤n} max_{1≤ℓ≤n} |n c_{jℓ}|. By definition, we have

( )

Then, by the bounded convergence theorem, we get

( )

Next, we show that the process Q_{n−m}(U_{p×(n−m)}) is tight. Since the modulus of continuity of the sequence Q_{n−m}(U_{p×(n−m)}) satisfies

( )

the process Q_{n−m}(U_{p×(n−m)}) is tight only if

is tight, for all i = 1, ⋯, p. By some characterizations of tightness in the space

, we only need to show that

. The result follows by the assumption that

, for i = 1, ⋯, p, n ≥ 1, and j = m + 1, ⋯, n. This completes the proof.

Proof of Theorem 2

By recalling the definition of the operator Q_{n−m}, we have

(A.3)

where for j = 1, ⋯, ⌊nx⌋, we define

( )

By conducting a little algebraic manipulation, Equation (A.3) can be further written as

(A.4)

Let P_n, n ≥ 1, be a sequence of discrete probability measures defined on the σ-field

associated with the sequence of the experimental design γ_n, given by

( )

Then, we can write Equation (A.4) in terms of the integrals with respect to P_n as follows:

(A.5)

It is clear that P_n converges in distribution to the Lebesgue measure P₀ mentioned in Section 2, which is defined by P₀((0, x])≔x. Moreover, since the components of g and s are bounded and continuous on D, q_n;x/n converges to zero, and e_jj converges to one as n⟶∞, it follows from either Theorem 2.1 in Billingsley [23] (Portmanteau theorem) or Theorem 1.3.4 in Van der Vaart and Wellner [24] that all integrals on the right-hand side of Equation (A.5) converge as n⟶∞ to the corresponding integrals with respect to P₀. Hence, by Theorem 1, we get

( )

We notice that the p × p symmetric matrix ∫_{[0, z]}s(u)s^⊤(u)P₀(du) is invertible since its columns are linearly independent (see also Somayasa et al. [8]), which completes the proof.

Open Research

Data Availability Statement

The PDF data used to support the findings of this study were supplied by BPS under license and so cannot be made freely available. Requests for access to these data should be made to Wayan Somayasa ([email protected]).

References

1 Fujikoshi Y., Likelihood ratio tests in multivariate linear model, Applied Linear Algebra in Action. (2016) 139, https://doi.org/10.5772/62277.
2 He Y., Jiang T., Wen T., and Xu G., Likelihood ratio test in multivariate linear regression: from low to high dimension, Statistica Sinica. (2021) 31, 1215–1238, https://doi.org/10.5705/ss.202019.0056.
3 Somayasa W., Approximated optimum condition of second order response surface model with correlated observations, Journal of Physics: Conference Series. (2016) 725, article 012002, 1–10, https://doi.org/10.1088/1742-6596/725/1/012002, 2-s2.0-84987761897.
4 Somayasa W., Ruslan R., and Sutiari D. K., Assessing the optimum condition of multivariate second order response surface model through the asymptotic inference of the eigenvalues, AIP Conference Proceedings. (2022) 2668, article 070005, https://doi.org/10.1063/5.0112594.
5 Cotos-Yéñez T. R., Pérez-González A., and González-Manteiga W., Model checks for nonparametric regression with missing data: a comparative study, Journal of Statistical Computation and Simulation. (2016) 86, no. 16, 3188–3204, https://doi.org/10.1080/00949655.2016.1156114, 2-s2.0-84981534751.
6 Zimmerman D. L., Linear Model Theory, 2020, Springer Cham, Berlin.
7 Das P., Linear regression model: goodness of fit and testing of hypothesis, Econometrics in Theory and Practice, 2019, Springer, Singapore.
8 Somayasa W., Wibawa G. N. A., Hamimu L., and Ngkoimani L. O., Asymptotic theory in model diagnostic for general multivariate spatial regression, International Journal of Mathematics and Mathematical Sciences. (2016) 2016, 16, https://doi.org/10.1155/2016/2601601, 2-s2.0-84988699853, 2601601.
9 Somayasa W., Accessing the power of tests based on set-indexed partial sums of multivariate regression residuals, Journal of Applied Mathematics. (2018) 2018, 13, https://doi.org/10.1155/2018/2071861, 2-s2.0-85053669133, 2071861.
10 Jiang P. and Kurozumi E., Power properties of the modified CUSUM tests, Communications in Statistics - Theory and Methods. (2019) 48, no. 12, 2962–2981, https://doi.org/10.1080/03610926.2018.1473598, 2-s2.0-85057539237.
11 Dao P. B., A CUSUM-based approach for condition monitoring and fault diagnosis of wind turbines, Energies. (2021) 14, no. 11, https://doi.org/10.3390/en14113236.
12 Otto S. and Breitung J., Backward CUSUM for testing and monitoring structural change with an application to COVID-19 pandemic data, Econometric Theory. (2023) 39, no. 4, 659–692, https://doi.org/10.1017/S0266466622000159.
13 Groen J., Kapetanios G., and Price S., Multivariate methods for monitoring structural change, Journal of Applied Econometrics. (2013) 28, no. 2, 250–274, https://doi.org/10.1002/jae.1272, 2-s2.0-84874021075.
14 Bassani M. A. A. and Costa J. F. L., Geostatistics with Data of Different Support Applied to Mining Engineering, 2022, Springer Cham, Berlin, https://doi.org/10.1007/978-3-030-80193-9.
15 Myers R. H., Montgomery D. C., and Anderson-Cook C. M., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2016, 4th edition, John Wiley & Sons, New York.
16 Huang H., He Z., and College of Management and Economics, Tianjin University, Tianjin 300072, China, A global optimization method for multiple response optimization problems, Journal of Industrial and Management Optimization. (2023) 19, no. 3, 1755–1769, https://doi.org/10.3934/jimo.2022016.
17 Fahrmeir L., Kneib T., Lang S., and Marx B. D., Regression: Models, Methods and Applications, 2021, Springer Cham, Berlin, https://doi.org/10.1007/978-3-662-63882-8.
18 Bischoff W. and Gegg A., Partial sums process to check regression models with multiple correlated response: with an application for testing a change-point in profile data, Journal of Multivariate Analysis. (2013) 102, 281–291, https://doi.org/10.1016/j.jmva.2010.08.014, 2-s2.0-78649444390.
19 Bonokeling D. E., Sholeh M., and Mispandi, The effect of investment, national government expenditure, exports, and imports on Indonesia economic growth, Jurnal Ekonomi dan Pembangunan. (2022) 30, no. 1, 56–58, https://doi.org/10.14203/JEP.30.1.2022.56-69.
20 Sari M. A., Impact of investment, labor, and infrastructure on Java Island economic growth 2011–2017, Efficient: Indonesian Journal of Development Economics. (2018) 1, no. 3, 230–241, https://doi.org/10.15294/efficient.v1i3.35151.
21 Febriyanti D. F., Effect of export and import of gross domestic product in Indonesia 2008–2017, Ecoplan. (2019) 2, no. 1, 10–20, https://doi.org/10.20527/ecoplan.v2i1.13.
22 Muhyiddin and Nugroho H., Indonesia development update a year of Covid-19: a long road to recovery and acceleration of Indonesia’s development, The Indonesian Journal of Development Planning. (2021) 5, no. 1, 1–19.
23 Billingsley P., Convergence of Probability Measures, 1999, 2nd edition, John Wiley & Sons, New York, https://doi.org/10.1002/9780470316962.
24 Van der Vaart A. W. and Wellner J. A., Weak Convergence and Empirical Processes with Applications to Statistics, 2023, Springer, Berlin, https://doi.org/10.1007/978-3-031-29040-4.
25 Durrett R., Probability: Theory and Examples, 2019, Cambridge University Press, London, https://doi.org/10.1017/9781108591034.
26 Lehmann E. L. and Romano J. P., Testing Statistical Hypotheses, 2022, 4th edition, Springer Cham, New York, https://doi.org/10.1007/978-3-030-70578-7.
27 Rasch D. and Schott D., Mathematical Statistics, 2018, John Wiley & Sons, New Jersey, https://doi.org/10.1002/9781119385295.
28 Ghosh J. K., Delampady M., and Samanta T., An Introduction to Bayesian Analysis: Theory and Methods, 2006, Springer Science Media, Berlin.
29 BPS, Produk Domestik Bruto Indonesia Triwulanan, 2022, Badan Pusat Statistik, Jakarta.
