A New Method of Hypothesis Test for Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity and Application
Abstract
This study develops a new method of hypothesis testing for model conformity between truncated spline nonparametric regression influenced by spatial heterogeneity and global truncated spline nonparametric regression. The test aims to determine the most appropriate model for the analysis of spatial data. The test statistic was constructed from the likelihood ratio between the parameter space under H0, whose components are parameters not influenced by geographical factors, and the population parameter space, whose components are parameters influenced by geographical factors. We derived the distribution of the test statistic V by showing that its numerator and denominator each follow a χ2 distribution. Because the matrix S is symmetric and idempotent, the quadratic form in the numerator follows a χ2 distribution. The matrix D(ui, vi) is positive semidefinite and contains the weighting matrix W(ui, vi), whose values differ at every location; therefore D(ui, vi) is not idempotent. For a random vector distributed as N(0, I), there are constants k and r such that the quadratic form in D(ui, vi) follows a scaled χ2 distribution determined by k and r; it follows that the test statistic V has an F distribution. The model is applied to identify factors that influence the unemployment rate in 38 districts/cities in East Java, Indonesia.
1. Introduction
This study theoretically examines multivariate nonparametric regression influenced by spatial heterogeneity using a truncated spline approach. The model extends truncated spline nonparametric regression by taking geographic (spatial) factors into account. A truncated spline is a function built from polynomial components and truncated components, i.e., polynomial pieces joined at knot points, which allows it to follow changes in the behavior of the data. The truncated spline approach is used to model spatial data in which the relationship between the response variable and the predictor variables does not follow a specific pattern and changes within certain subintervals. In the model, the regression coefficient of each predictor variable depends on the location where the data are observed, because environmental and geographic characteristics differ between observation sites; therefore each observation has a different variation (spatial heterogeneity). Spatial data are one type of dependent data, in which the data at one location are influenced by measurements at other locations (spatial dependency).
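To make the truncated components concrete, the sketch below evaluates the truncated power basis for a single predictor value: x, x², …, x^m followed by (x − K1)₊^m, …, (x − Kr)₊^m, where (x − K)₊^m equals (x − K)^m when x ≥ K and 0 otherwise. The function names and the per-predictor ordering of the columns are illustrative assumptions, not the paper's notation.

```python
def truncated_power_basis(x, degree, knots):
    """Basis for one predictor value: polynomial terms, then truncated terms.

    Returns [x, x^2, ..., x^m, (x - K_1)_+^m, ..., (x - K_r)_+^m] with m = degree.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    polynomial = [x ** k for k in range(1, degree + 1)]
    # (x - K)_+^m is zero to the left of the knot, which lets the curve change shape at K.
    truncated = [(x - knot) ** degree if x >= knot else 0.0 for knot in knots]
    return polynomial + truncated


def design_row(xs, degree, knots_per_predictor):
    """Hypothetical helper: one design-matrix row (intercept + basis of each predictor)."""
    row = [1.0]
    for x, knots in zip(xs, knots_per_predictor):
        row.extend(truncated_power_basis(x, degree, knots))
    return row


print(truncated_power_basis(3.0, 1, [1.0, 2.0, 4.0]))  # [3.0, 2.0, 1.0, 0.0]
```

With degree 1 (linear pieces) and three knots per predictor, as in the empirical study later (k = 1; h = 1, 2, 3), each predictor contributes four columns.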
This study derives the model conformity hypothesis test between multivariable nonparametric regression influenced by spatial heterogeneity with a truncated spline approach and global multivariable nonparametric truncated spline regression. The test aims to determine the model that is most suitable for spatial data analysis. The test statistic was derived using the maximum likelihood ratio test (MLRT) method. The first step was to formulate the hypotheses and then define the parameter space under H0, whose components are parameters not influenced by geographical factors, and the population parameter space, whose components are parameters influenced by geographical factors. The likelihood ratio Λ was constructed as the ratio of the maximized likelihood function under H0 (numerator) to the maximized likelihood function under the population parameter space (denominator). From the likelihood ratio, the test statistic V was obtained, and its distribution was then determined. To prove the distribution of V, we first proved that its numerator and denominator are each chi-square distributed.
The purpose of this study is to obtain a new method of hypothesis testing for model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and global multivariate nonparametric truncated spline regression, so that the model most suitable for spatial data analysis can be identified.
2. Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity
yi is the response variable at the i-th location, where i = 1, 2, …, n.
xpi is the p-th predictor variable at the i-th location, with p = 1, 2, …, l.
Kph is the h-th knot point of the p-th predictor variable, with h = 1, 2, …, r.
βpk(ui, vi) is a polynomial-component parameter of the multivariate nonparametric truncated spline regression, namely the k-th parameter of the p-th predictor variable at the i-th location, with k = 1, 2, …, m. δp,m+h(ui, vi) is a truncated-component parameter of the multivariate nonparametric truncated spline regression, namely the (m + h)-th parameter, associated with the h-th knot point of the p-th predictor variable at the i-th location.
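For reference, a form of model (2) consistent with the definitions above can be written as follows, where (ui, vi) are the coordinates of the i-th location. The intercept β0(ui, vi) and the use of the same degree m in the polynomial and truncated terms are assumptions of this sketch rather than details stated in this section.

```latex
y_i = \beta_0(u_i, v_i)
  + \sum_{p=1}^{l} \sum_{k=1}^{m} \beta_{pk}(u_i, v_i)\, x_{pi}^{k}
  + \sum_{p=1}^{l} \sum_{h=1}^{r} \delta_{p,m+h}(u_i, v_i)\, (x_{pi} - K_{ph})_{+}^{m}
  + \varepsilon_i,
\qquad
(x_{pi} - K_{ph})_{+}^{m} =
\begin{cases}
  (x_{pi} - K_{ph})^{m}, & x_{pi} \ge K_{ph}, \\
  0, & x_{pi} < K_{ph}.
\end{cases}
```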
Theorem 1. Given the regression model (2) with errors εi normally distributed with zero mean and variance σ2(ui, vi), the maximum likelihood estimator (MLE) is used to obtain the estimators of the polynomial-component parameters βpk(ui, vi) and the truncated-component parameters δp,m+h(ui, vi) as follows.
Corollary 2. If the estimators of βpk(ui, vi) and δp,m+h(ui, vi) are given by Theorem 1, then the estimator of the regression curve is given by
The estimator of the regression curve contains polynomial components represented by the matrix X and truncated components represented by the matrix P [3]. If P = 0, the estimator of the multivariable nonparametric regression curve in the Geographically Weighted Regression (GWR) model with a truncated spline approach reduces to the estimator of a polynomial parametric regression curve in the GWR model. Furthermore, if P = 0 and X contains only linear terms, the estimator reduces to that of a linear parametric regression curve in the GWR model, i.e., the multiple linear GWR model developed by many researchers, such as Brunsdon and Fotheringham [4], Fotheringham, Brunsdon, and Charlton (2003), Demsar, Fotheringham, and Charlton [5], Yan Li, Yan Jiao, and Joan A. Browder [6], Shan-shan Wu, Hao Yang, Fei Guo, and Rui-Ming Han [7], and Benassi and Naccarato [8].
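The matrix form above corresponds to locally weighted least squares: at location (ui, vi) the combined design G = [X P] is fitted with observation weights taken from W(ui, vi). The sketch below assumes the standard GWR form θ̂(ui, vi) = (Gᵀ W(ui, vi) G)⁻¹ Gᵀ W(ui, vi) y; the function names are hypothetical, and the weights are taken as given rather than computed from a kernel.

```python
import numpy as np


def local_wls(G, y, w):
    """Solve (G^T W G) theta = G^T W y for one location, with W = diag(w)."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    GtW = G.T * w  # equals G.T @ np.diag(w) without building the n x n matrix
    return np.linalg.solve(GtW @ G, GtW @ y)


def fitted_curve(G, y, weight_rows):
    """Fitted value at each location i, using its own weight vector weight_rows[i]."""
    G = np.asarray(G, dtype=float)
    return np.array([G[i] @ local_wls(G, y, w_i) for i, w_i in enumerate(weight_rows)])


# Example: intercept, x and one linear truncated term (x - 2)_+.
x = np.linspace(0.0, 5.0, 20)
G = np.column_stack([np.ones_like(x), x, np.maximum(x - 2.0, 0.0)])
theta_true = np.array([1.0, 0.5, -2.0])
y = G @ theta_true
w = np.random.default_rng(0).uniform(0.1, 1.0, size=x.size)
print(local_wls(G, y, w))
```

Setting every weight to one gives the ordinary least-squares estimator, i.e., a fit whose parameters do not vary with location, as under H0.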
This study continues previous research [2]. Here, the test statistic for truncated spline nonparametric regression influenced by spatial heterogeneity is derived, followed by its distribution and the rejection region.
3. Method
The model conformity hypothesis test between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and global nonparametric truncated spline regression is derived through the following steps.
Step 1. Formulating the hypotheses:
- H0: βpk(ui, vi) = βpk and δp,m+h(ui, vi) = δp,m+h, for p = 1, 2, …, l; k = 1, 2, …, m; h = 1, 2, …, r; i = 1, 2, …, n;
- H1: at least one βpk(ui, vi) ≠ βpk or δp,m+h(ui, vi) ≠ δp,m+h, for p = 1, 2, …, l; k = 1, 2, …, m; h = 1, 2, …, r; i = 1, 2, …, n.
Step 2. Defining the parameter space under the population, Ω.
Step 3. Determining the estimators of the parameters in the population parameter space (Ω).
Step 4. Obtaining the maximized likelihood function under the population parameter space (Ω).
Step 5. Defining the parameter space under H0, ω.
Step 6. Determining the estimators of the parameters under H0 (ω).
Step 7. Obtaining the maximized likelihood function under H0 (ω).
Step 8. Obtaining the likelihood ratio Λ.
Step 9. Obtaining the test statistic V for the model conformity test.
Step 10. Specifying the distribution of the numerator τ of the test statistic V.
Step 11. Specifying the distribution of the denominator τ∗ of the test statistic V.
Step 12. Specifying the distribution of the test statistic V.
Step 13. Determining the rejection region of H0 and drawing the conclusion.
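Steps 4, 7, and 8 can be illustrated in a simplified setting. Assuming Gaussian errors with a single variance σ² in each parameter space, the maximized log-likelihood depends only on the residual sum of squares (RSS) through σ̂² = RSS/n, and the ratio reduces to Λ = (RSS_Ω / RSS_ω)^(n/2). The paper's population space allows σ²(ui, vi) and geographic weighting, so this is a sketch of the principle rather than its Λ; the function names are hypothetical.

```python
import math


def max_gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood after plugging in the MLE sigma^2 = rss / n."""
    sigma2_hat = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def likelihood_ratio(rss_h0, rss_full, n):
    """Lambda = L(omega_hat) / L(Omega_hat); small values favour the full (spatial) model."""
    return math.exp(max_gaussian_loglik(rss_h0, n) - max_gaussian_loglik(rss_full, n))


print(likelihood_ratio(rss_h0=12.0, rss_full=8.0, n=10))
```

Working on the log scale avoids overflow in (2πσ̂²)^(−n/2) for large n; the constant terms cancel in the difference.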
4. Parameter Estimation under H0 and under the Population Parameter Space
The hypotheses are:
- H0: βpk(ui, vi) = βpk and δp,m+h(ui, vi) = δp,m+h,
- H1: at least one βpk(ui, vi) ≠ βpk or δp,m+h(ui, vi) ≠ δp,m+h.
Lemma 3. For the regression parameter vector in the population parameter space (Ω) of the nonparametric spline regression model with spatial heterogeneity (2), the estimator is given by
Proof. To obtain the estimator, we form the likelihood function L(Ω) under the population parameter space. Since the errors are normally distributed, yi is normally distributed with mean
The other estimator under Ω is given in Lemma 4.
Lemma 4. For the other parameter in the population parameter space (Ω) of the nonparametric spline regression model with spatial heterogeneity (2), the estimator obtained from the likelihood function is
Proof. The estimator is obtained from the likelihood function
Lemma 5. For the parameters under H0 of the multivariate nonparametric truncated spline model influenced by spatial heterogeneity (2), the estimators are given by
Proof. To obtain the estimators, we form the likelihood function L(ω) under the parameter space of H0. Since the errors are normally distributed, yi is normally distributed with mean
5. Test Statistic for Truncated Spline Nonparametric Regression with Spatial Heterogeneity
The test statistic for the model conformity hypothesis test can be obtained using Lemmas 3, 4, and 5. First, the likelihood ratio on which the test statistic is based is presented in Lemma 6.
Lemma 6. If the maximized likelihood functions under Ω and under ω are given by (32) and (40), respectively, then the likelihood ratio Λ is given by
Proof. Based on Lemmas 3, 4, and 5, together with (32) and (40), the likelihood ratio is obtained as
The test statistic for the model conformity hypothesis is presented in Theorem 7.
Theorem 7. If the likelihood ratio Λ is given by Lemma 6, then the test statistic for H0 versus H1 in model (2) is given by
Proof. Based on Lemma 6, the likelihood ratio is as follows:
Next, the distribution of the test statistic V is derived.
The test statistic in Theorem 7 is developed from the truncated spline approach in the GWR model; it differs from the test statistics developed by Leung, Mei, and Zhang [9], Leung, Mei, and Zhang [10], and Mennis and Jordan [11], which use GWR without the truncated spline approach.
6. Distribution of the Test Statistic and Critical Region of the Hypothesis Test
To prove the distribution of the test statistic V, we first derive the distributions of its numerator τ and denominator τ∗. The proofs are presented in Theorems 8 and 9.
Theorem 8. If S is the matrix given in Lemma 6, then the corresponding statistic is distributed as
Proof. To prove this theorem, the following steps are taken.
First, we show that S is a symmetric and idempotent matrix:
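The role of symmetry and idempotence can be checked numerically. A standard result is that if S is symmetric and idempotent and ε ~ N(0, σ²I), then εᵀSε/σ² follows a χ² distribution with tr(S) = rank(S) degrees of freedom. The sketch uses the residual-maker matrix I − G(GᵀG)⁻¹Gᵀ of a global least-squares fit as an example of such a matrix; this is an assumption for illustration, not necessarily the S of Lemma 6, and the function names are hypothetical.

```python
import numpy as np


def residual_maker(G):
    """S = I - G (G^T G)^{-1} G^T for a full-column-rank design matrix G."""
    G = np.asarray(G, dtype=float)
    H = G @ np.linalg.solve(G.T @ G, G.T)  # hat matrix of the global fit
    return np.eye(G.shape[0]) - H


def is_symmetric_idempotent(S, tol=1e-10):
    """True when S equals its transpose and S @ S equals S (up to tol)."""
    return np.allclose(S, S.T, atol=tol) and np.allclose(S @ S, S, atol=tol)


rng = np.random.default_rng(42)
G = np.column_stack([np.ones(15), rng.normal(size=(15, 3))])
S = residual_maker(G)
# Simulated quadratic forms eps^T S eps with sigma = 1; their mean should be near tr(S) = 11.
eps = rng.normal(size=(20000, 15))
quad = np.einsum("ij,jk,ik->i", eps, S, eps)
print(is_symmetric_idempotent(S), np.trace(S), quad.mean())
```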
Theorem 9. If D(ui, vi) is the matrix given in Lemma 6, then the corresponding statistic is distributed as
Proof. Based on (24), we obtain
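When D(ui, vi) is not idempotent, εᵀDε/σ² is no longer exactly χ². A common way to obtain constants such as k and r, used for GWR F-tests, is moment matching (a Satterthwaite-type approximation): for symmetric D the quadratic form has mean tr(D) and variance 2 tr(D²), and equating these with the mean kr and variance 2k²r of kχ²_r gives k = tr(D²)/tr(D) and r = tr(D)²/tr(D²). Whether these coincide with the constants used in Theorem 9 is an assumption; the function name is hypothetical.

```python
import numpy as np


def satterthwaite_constants(D):
    """Return (k, r) so that k * chi2_r matches the first two moments of eps^T D eps / sigma^2.

    Assumes D is symmetric positive semidefinite and eps ~ N(0, sigma^2 I).
    """
    D = np.asarray(D, dtype=float)
    t1 = np.trace(D)       # mean of the quadratic form
    t2 = np.trace(D @ D)   # half of its variance
    return t2 / t1, t1 ** 2 / t2


k, r = satterthwaite_constants(np.diag([1.0, 2.0, 3.0]))
print(k, r)  # k = 14/6, r = 36/14
```

For an idempotent matrix the constants reduce to k = 1 and r = rank, recovering the exact χ² case used for S in Theorem 8.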
Corollary 10. If the statistic V is given by Theorem 7, then
Proof. Based on Theorem 8, the statistic is obtained as
The critical region for the model conformity hypothesis is given in Lemma 11.
Lemma 11. If the test statistic V is as given in Theorem 7, then the critical region for H0 is given by
Proof. Based on Theorem 7, the following relationship is obtained:
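Because V follows an F distribution (Corollary 10), the critical region in Lemma 11 amounts to a quantile comparison. The degrees of freedom are those of Corollary 10 and are passed in as arguments; they need not be integers, for example if a moment-matched r is used. Whether H0 is rejected in the upper or the lower tail depends on how the numerator and denominator of V are oriented, so the sketch exposes that as a parameter. It relies on scipy.stats.f; the function name is hypothetical.

```python
from scipy import stats


def f_test_decision(v, df1, df2, alpha=0.05, tail="upper"):
    """Return (reject_h0, critical_value) for an F-distributed test statistic V."""
    if tail == "upper":
        critical = stats.f.ppf(1.0 - alpha, df1, df2)  # reject if V > F_(1-alpha)
        return bool(v > critical), float(critical)
    if tail == "lower":
        critical = stats.f.ppf(alpha, df1, df2)  # reject if V < F_(alpha)
        return bool(v < critical), float(critical)
    raise ValueError("tail must be 'upper' or 'lower'")


print(f_test_decision(10.0, 3, 20))
```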
Having derived the hypothesis test, we next apply the model conformity test between the truncated spline nonparametric regression model influenced by spatial heterogeneity and the global nonparametric regression model to unemployment rate data from 38 districts/cities in East Java, Indonesia.
7. Empirical Study on the Unemployment Rate in East Java, Indonesia
7.1. Description of Research Data
In this study, the nonparametric truncated spline regression model influenced by spatial heterogeneity was applied to Open Unemployment Rate (OUR) data for East Java Province, Indonesia, together with predictor variables suspected to affect it: population density (X1), percentage of poor people (X2), percentage of population with low education (X3), percentage of population working in the agricultural sector (X4), area of agricultural land (X5), economic growth rate (X6), regional minimum wage (X7), and the ratio of the number of large industries to the labor force (X8). The data consist of 38 observations (districts/cities) and 8 predictor variables. Table 1 presents descriptive statistics of the response and predictor variables.
Variable | N | Minimum | Maximum | Mean | Standard Deviation |
---|---|---|---|---|---|
Y | 38 | 0.61 | 8.59 | 4.3939 | 1.81385 |
X1 | 38 | 4.59 | 25.80 | 12.0963 | 4.99263 |
X2 | 38 | 0.2105 | 7.6445 | 2.631579 | 1.6160512 |
X3 | 38 | 0.07 | 7.19 | 5.5747 | 1.22772 |
X4 | 38 | 35.10 | 33548.70 | 3378.7658 | 6497.23435 |
X5 | 38 | 851582 | 2507632 | 1516816.21 | 420415.074 |
X6 | 38 | 0.0074281 | 0.1640892 | 0.052024430 | 0.0474488599 |
X7 | 38 | 0.0405726 | 12.4704588 | 2.631578947 | 2.8535682531 |
X8 | 38 | 474 | 85122 | 28730.32 | 22372.865 |
- Source: BPS (2017a, 2017b, and 2017c).
Figure 1 shows the spatial distribution of the Open Unemployment Rate (in percent) across East Java in 2015.

7.2. Spatial Heterogeneity Test
Each region has different characteristics, so the parameters, and even the functional forms, may differ between regions; this is the spatial aspect of the data. The Breusch-Pagan test is used to examine spatial heterogeneity across locations. Table 2 shows the result of the Breusch-Pagan test.
Test | Significance Value | Decision |
---|---|---|
Breusch-Pagan | 0.002414 | Reject H0 |
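The paper does not state which form of the Breusch-Pagan test or which explanatory variables were used. As an assumed illustration, the sketch computes Koenker's studentized version (the default in R's lmtest::bptest): the squared residuals of a global fit are regressed on candidate variables Z, and LM = nR² is compared with a χ² distribution whose degrees of freedom equal the number of columns of Z. The function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def breusch_pagan(residuals, Z):
    """Koenker's studentized Breusch-Pagan test; returns (LM statistic, p-value)."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    n = e2.shape[0]
    A = np.column_stack([np.ones(n), np.asarray(Z, dtype=float)])  # auxiliary regression
    coef, *_ = np.linalg.lstsq(A, e2, rcond=None)
    ss_res = np.sum((e2 - A @ coef) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    lm = n * (1.0 - ss_res / ss_tot)  # n * R^2 of the auxiliary regression
    df = A.shape[1] - 1
    return lm, stats.chi2.sf(lm, df)


# Simulated residuals whose spread grows with z (hypothetical data).
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 3.0, size=400)
resid = rng.normal(size=400) * (0.2 + 2.0 * z)
print(breusch_pagan(resid, z))
```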
Since H0 of the Breusch-Pagan test is rejected (significance value 0.002414), there is evidence of spatial heterogeneity, so the case can be handled with a point approach, in which parameters are estimated separately at each observation location. The analysis was therefore performed using the nonparametric truncated spline regression model influenced by spatial heterogeneity.
7.3. Model Conformity Test
- H0: βpk(ui, vi) = βpk and δp,m+h(ui, vi) = δp,m+h, for p = 1, 2, …, 8; k = 1; h = 1, 2, 3; i = 1, 2, …, 38;
- H1: at least one βpk(ui, vi) ≠ βpk or δp,m+h(ui, vi) ≠ δp,m+h, for p = 1, 2, …, 8; k = 1; h = 1, 2, 3; i = 1, 2, …, 38.
The modeling application used Open Unemployment Rate (OUR) data for 38 districts/cities in East Java. The empirical results show that the OUR data are geographically influenced, namely by spatial heterogeneity, and, based on the model conformity hypothesis test, the appropriate model is the multivariable nonparametric truncated spline regression model influenced by spatial heterogeneity with Gaussian kernel weighting. The model has a coefficient of determination of 80.42%.
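The Gaussian kernel weighting and the coefficient of determination mentioned above can be sketched as follows. The sketch assumes the common GWR form w_ij = exp(−½(d_ij/b)²), where d_ij is the Euclidean distance between locations (ui, vi) and (uj, vj) and b is a bandwidth; the bandwidth used in the study and how it was selected are not reported here, so b is a free parameter, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_weights(coords, i, bandwidth):
    """Weights of all locations relative to location i: exp(-0.5 * (d_ij / b)^2)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances d_ij
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def r_squared(y, y_hat):
    """Coefficient of determination 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


coords = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
w = gaussian_kernel_weights(coords, 0, bandwidth=1.0)
print(w)
```

Each location's weight vector (one row of W(ui, vi)) is what a locally weighted fit would use at that location; on this scale, the reported 80.42% corresponds to R² = 0.8042.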
8. Conclusion
(1) The hypotheses for model conformity between the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity and the global nonparametric truncated spline regression model are as follows:
- H0: βpk(ui, vi) = βpk and δp,m+h(ui, vi) = δp,m+h, for p = 1, 2, …, l; k = 1, 2, …, m; h = 1, 2, …, r; i = 1, 2, …, n;
- H1: at least one βpk(ui, vi) ≠ βpk or δp,m+h(ui, vi) ≠ δp,m+h, for p = 1, 2, …, l; k = 1, 2, …, m; h = 1, 2, …, r; i = 1, 2, …, n.
The test statistic derived using the maximum likelihood ratio test (MLRT) is obtained as follows:
(88)
(2) The distribution of the test statistic V for the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity is as follows:
(89)
At significance level α, H0 is rejected if
(90)
Conflicts of Interest
The authors of this work declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors thank the Ministry of Research, Technology and Higher Education, Republic of Indonesia/Kementerian Riset, Teknologi dan Pendidikan Tinggi Republik Indonesia (Kemenristekdikti RI) for funding this work.
Open Research
Data Availability
The data for the 38 districts/cities in East Java used to support the findings of this study are available from https://www.bps.go.id/.