Volume 2013, Issue 1 687151

Research Article

Open Access

Remodeling and Estimation for Sparse Partially Linear Regression Models

Yunhui Zeng

Shandong University Qilu Securities Institute for Financial Studies and School of Mathematical Science, Shandong University, Jinan 250100, China sdu.edu.cn

Supercomputing Center, Shandong Computer Science Center, Jinan 250014, China

Search for more papers by this author

Xiuli Wang,

Xiuli Wang

College of Mathematics Science, Shandong Normal University, Jinan 250014, China sdnu.edu.cn

Search for more papers by this author

Lu Lin,

Corresponding Author

Lu Lin

[email protected]

Shandong University Qilu Securities Institute for Financial Studies and School of Mathematical Science, Shandong University, Jinan 250100, China sdu.edu.cn

Search for more papers by this author

Yunhui Zeng,

Yunhui Zeng

Shandong University Qilu Securities Institute for Financial Studies and School of Mathematical Science, Shandong University, Jinan 250100, China sdu.edu.cn

Supercomputing Center, Shandong Computer Science Center, Jinan 250014, China

Search for more papers by this author

Xiuli Wang,

Xiuli Wang

College of Mathematics Science, Shandong Normal University, Jinan 250014, China sdnu.edu.cn

Search for more papers by this author

Lu Lin,

Corresponding Author

Lu Lin

[email protected]

Shandong University Qilu Securities Institute for Financial Studies and School of Mathematical Science, Shandong University, Jinan 250100, China sdu.edu.cn

Search for more papers by this author

First published: 06 February 2013

https://doi.org/10.1155/2013/687151

Academic Editor: Xiaodi Li

Share a link

Email
Wechat
Bluesky

Abstract

When the dimension of covariates in the regression model is high, one usually uses a submodel as a working model that contains significant variables. But it may be highly biased and the resulting estimator of the parameter of interest may be very poor when the coefficients of removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to get the consistent estimator for the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and get a new estimator of the parameter of interest simultaneously. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained by the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.

1. Introduction

Consider the following partially linear regression model:

()

where Y is a scalar response variable, X and Z are, respectively, p-dimensional and q-dimensional continuous-valued covariates with p being finite and p ≪ q, β is the parameter vector of interest and γ is the nuisance parameter vector which is supposed to be sparse in the sense that ∥γ∥₂ is small, f(·) is an unknown function satisfying Ef(U) = 0 for identification, ε is the random error satisfying E(ε∣X, Z, U) = 0. For simplicity, we assume that U is univariate. Let (Y_i, X_i, Z_i, U_i), i = 1, …, n, be i.i.d. observations of (Y, X, Z, U) obtained from the above model.

A feature of the model is that the parametric part contains both the parameter vector of interest and nuisance parameter vector. The reason for this coefficient separation is as follows. In practice we often use such a model to distinguish the main treatment variables of interest from the state variables. For instance, in a clinical trial, X consists of treatment variables and can be easily controlled, Z is a vector of many clinical variables, such as patient ages and body weights. The variables in Z may have an impact on Y but are not of primary interest and the effects may be small. In order to make up for potentially nonnegligible effects on the response Y, the nuisance covariate Z are introduced into model (1); see Shen et al. [1]. Model (1) contains all relevant covariates and in this paper we call it full model.

The purpose of this paper is to estimate β, the parameter of interest, when γ^TZ is removed from the model. The main idea is remodeling based on the following working model:

()

As is known, E(η∣X = x, U = u) is a nonzero function if γ^TE(Z∣X, U) ≠ 0, which relies on two elements, one is E(Z∣X, U), related with the correlation between the covariates of Z and (X, U), the other is γ, determined by the nuisance parameter in the removed part. Thus the least squares estimator based on model (2) may be inconsistent. In the following, we will make use of the above two elements. Specifically, in the first stage, we shall construct a remodeled model by a multistep-adjustment to correct the submodel bias based on the correlation information between the covariates. This adjustment is motivated by Gai et al. [2]. In the paper, they proposed a nonparametric adjustment by adding a univariate nonparametric estimation to the working model (2), and it can dramatically reduce the bias of the working model. But this only holds in a subset of the covariates, although the subset may be fairly large. In order to obtain a globally unbiased working model for linear regression model, Zeng et al. [3] adjusted the working model by multiple steps. Because only those variables in Z correlated with (X, U) may have impact on estimation of β, in each step a univariate nonparametric part was added to the working model and consequently a globally unbiased working model was obtained.

However, when many components of Z are correlated with (X, U), the number of nonparametric functions added in the above working model is large. Such a model is improper in practice. Thus, in the second stage, we further simplify the above adjusted model by a semiparametric variable selection procedure proposed by Zhao and Xue [4]. Their method can select significant parametric and nonparametric components simultaneously under sparsity condition for semiparametric varying coefficient partially linear models. The relevant papers include Fan and Li [5], Wang et al. [6, 7], among others. After two-stage remodeling, the final model is conditionally unbiased. Based on this model, the estimation and model prediction are significantly improved.

The rest of this paper is organized as follows. In Section 2, a multistep adjustment and remodeled models are firstly proposed, then the models are further simplified via the semiparametric SCAD variable selection procedure. A new estimator of the parameter of interest based on the simplified model is derived, its convergence rate and asymptotic normality are also obtained. Simulations are given in Section 3. A short conclusion and some remarks are contained in Section 4. Some regular conditions and theoretical proofs are presented in the appendix.

2. New Estimator for the Parameter of Interest

In this paper, we suppose that covariate Z has zero mean, p is finite and p ≪ q, E(ε∣X, Z, U) = 0 and Var (ε∣X, Z, U) = σ². We also assume that covariates X and U and parameter β are prespecified, so that the submodel (2) is a fixed model.

2.1. Multistep-Adjustment by Correlation

In this subsection, we first adjust the submodel to be conditionally unbiased by a multistep-adjustment.

When Z is normally distributed, the principal component analysis (PCA) method will be used. Let Σ_Z be the covariance matrix of Z, then there exists an orthogonal q × q matrix Q such that QΣ_ZQ^T = Λ, where Λ is the diagonal matrix diag (λ₁, λ₂, …, λ_q) with λ₁ ≥ λ₂ ≥ ⋯≥λ_q ≥ 0 being eigenvalues of Σ_Z. Denote Q^T = (τ₁, τ₂, …, τ_q) and .

When Z is centered but nonnormally distributed, we shall apply independent component analysis (ICA) method. Assume that Z is generated by a nonlinear combination of independent components , that is , where F(·) is an unknown nonlinear mapping from R^q to R^q, is an unknown random vector with independent components. By imposing some constraints on the nonlinear mixing mapping F or the independent components , the independent components can be properly estimated. See Simas Filho and Seixas [8] for an overview of the main statistical principles and some algorithms for estimating the independent components. For simplicity, in this paper we suppose that Z = (Z⁽¹⁾, …, Z^(q)) ^T with , l = 1, …, q, and F_lj(·) are scalar functions.

In the above two cases, ′s are independent of each other. Set K₀ to be the size of set . Without loss of generality, let M₀ = {1, …, K₀}.

We construct the following adjusted model:

()

where

, j = 1, …, K₀ and

. The model (3) is based on Z′s population and depends on the distributions of X, U and Z. It is easy to see that model (3) is conditionally unbiased, that is,

The adjusted model (3) is an additive partially linear model, in which β^TX is the parametric part, f(U) and , j = 1, …, K₀, are the nonparametric parts and is the random error. Compared with the submodel (2), the nonparametric parts , j = 1, …, K₀, may be regarded as bias-corrected terms for the random error η. For centered Z, , j = 1, …, K₀, the nonparametric components can be properly identified. In fact, centered Z can be relaxed to any Z such that satisfies γ^TE(Z) = 0.

When Z is centered and normally distributed, the nonparametric parts

, j = 1, …, K₀. So the multistep adjusted model (3) is really a partially linear model

()

with

and

. Specially, when f(U) ≡ 0, the full model is a linear model, the multistep adjusted model is also a linear model

()

But when the variables in Z are not jointly normal, the nonparametric parts g_j can be highly nonlinear, which are similar to the results of marginal regression; see Fan et al. [9].

2.2. Model Simplification

When the most of the features in the full model are correlated, then K₀ is very large and even is close to q. In this case, the adjusted model (3) is improper in practice, so we shall use the group SCAD regression procedure, proposed by Wang et al. [6], and the semiparametric variable selection procedure, proposed by Zhao and Xue [4], to further simplify the model.

Let s = |ℳ_*| with

, and assume that the model (3) is sparse, that is, s is small. We define the semiparametric penalized least squares as

()

where

, and p_λ(·) is the SCAD penalty function with λ being a tuning parameter defined as

()

with a > 2, w > 0 and p_λ(0) = 0. In (6), g(·) denotes the set

. Because g_j are nonparametric functions, thus they cannot be directly applied for minimization. Here we will replace f(·) and g(·) by basis function approximations. For 1 ≤ j ≤ K₀, let {Ψ_jk, k = 1, …, L} be orthogonal basis functions satisfying

()

where

is the density function of

. Similarly, let {Ψ_0k, k = 1, …, L} be orthogonal basis functions satisfying the above condition which is only replaced by the support and density function of U. Denote

. Then

and f(U) can be approximated by

()

Denote

, invoking that

the identity matrix, we get

()

where

, Ψ_0i ≡ Ψ₀(U_i).

Denote by , and the least squares estimators based on the penalized function (10), that is . Let and , then is an estimator of , is an estimator of f(U).

Let

and

. For simplicity, we assume that ℳ_* = {1,2, …, s} and

. So we get the following simplified working model

()

where

, j = 1, …, K_n and

. Under the assumption of sparsity, the model (11) contains all of significant nonparametric functions and fully utilizes both the correlation information of covariates and the model sparsity on nuisance covariate.

If Z is centered and normally distributed with covariance matrix Σ_Z = I_q the identity matrix, then τ_j = e_j, j = 1, …, q, where e_j denotes the unit vector with 1 at position j, and α is sparse with α_j = γ^Tτ_j = γ_j. So the model (4) is sparse. For model (5), the special case of model (4), we can apply the SCAD penalty method proposed by Fan and Li [5] to select variables in

and estimate parameters α and β simultaneously. The selected covariate and the corresponding parameter are denoted by

and

, the resulting parameter estimators are denoted by

and

, respectively. Finally, we can use the simplified model

()

for model prediction. Under the condition of sparsity, its model size is much smaller than those of the multistep adjusted model (5) and the full model (1).

2.3. Asymptotic Property of Point Estimator

Let β₀, θ₀, ν₀, and g_j0(·), f₀(·) be the true values of β, θ, ν, and g_j(·), f(·), respectively, in model (3). Without loss of generality, we assume that , j = s + 1, …, K₀, and , j = 1, …, s, are all nonzero components.

We suppose that , j = 1, …, K₀ can be expressed as and f(U) can be expressed as , θ_j and ν belong to the Sobolev ellipsoid .

The following theorem gives the consistency of the penalized SCAD estimators.

Theorem 1. Suppose that the regularity conditions (C1)–(C5) in the appendix hold and the number of terms L = O_p(n^1/(2r+1)). Then,

(i)
,
(ii)
,
(iii)
,

where .

From the last paragraph of Section 2.2 we know that, for linear regression model and normally distributed Z, the multistep adjusted model (5) is a linear model. By orthogonal basis functions, such as power series, we have r = ∞, then , implying the estimator has the same convergence rate as that of the SCAD estimator in Fan and Li [5].

Theorem 2. Suppose that the regularity conditions (C1)–(C6) in the appendix hold and the number of terms L = O_p(n^1/(2r+1)). Let λ_max = max _j{λ_j} and λ_min = min _j{λ_j}. If λ_max → 0 and n^r/(2r+1)λ_min → ∞ as n → ∞, then, with probability tending to 1, , j = s + 1, …, K₀.

Remark 3. By Remark 1 of Fan and Li [5], we have that, if λ_max → 0 as n → ∞, then a_n → 0. Hence from Theorems 1 and 2, by choosing proper tuning parameters, the variable selection method is consistent and the estimators of nonparametric components achieve the optimal convergence rate as if the subset of true zero coefficients was already known; see Stone [10].

Let

be the nonzero components of θ, corresponding covariates are denoted by

. In addition, let

()

where

for homoscedastic case,

Theorem 4. Suppose that the regularity conditions (C1)–(C6) in the appendix hold and the number of terms L = O_p(n^1/(2r+1)). If Σ is invertible, then

()

where “

” denotes the convergence in distribution.

Remark 5. From Theorems 1 and 4, it can be found that the penalized estimators have the oracle property. Furthermore, the estimator of the parameter of interest has the same asymptotic distribution as that based on the correct submodel.

2.4. Some Issues on Implementation

In the adjusted model (4), τ_j, j = 1, …, K₀ are used. When the population distribution is not available, they need to be approximated by estimators. When Z is normally distributed and eigenvalues r_j, j = 1, …, q of the covariance matrix Σ_Z are different from each other, then is asymptotically N(0, V_j) with , where u_j is the jth eigenvector of with ; see Anderson [11]. For the case when the population size is large and comparable with the sample size, if the covariance matrix is sparse, we can use the method in Rütimann and Bühlmann [12] or Cai and Liu [13] to estimate the covariance matrix. So we can use u_j to approximate τ_j. When τ_j in model (4) are replaced by these consistent estimators, one can see that the approximation error can be neglected without changing the asymptotic property.

The nonparametric parts in the adjusted model depend on the univariate variable , for l = 1, …, K₀. So it needs to choose the steps K₀ firstly. In real implementation, we compute all the q multiple correlation coefficients of (l = 1, …, q) with X and U. Then we choose the components for given small number δ > 0, where mcorr(u, V) denotes the multicorrelation coefficient between u and V and can be approximated by its sample form; see Anderson [11].

There are some tuning parameters needing to choose in order to implement the two-stage remodeling procedure. Fan and Li [5] showed that the SCAD penalty with a = 3.7 performs well in a variety of situations. Hence, we use their suggestion throughout this paper. We still need to choose the positive integer L for basis functions and the tuning parameter λ_j of the penalty functions. Similar to the adaptive lasso of Zou [14], we suggest taking , where is initial estimator of θ_j by using ordinary least squares method based on the first term in (10). So the two remaining parameters L and λ can be selected simultaneously using the leave-one-out CV or GCV method; see Zhao and Xue [4] for more details.

3. Simulation Studies

In this section, we investigate the behavior of the newly proposed method by simulation studies.

3.1. Linear Model with Normally Distributed Covariates

The dimensions of the full model (1) and the submodel (2) are chosen to be 100 and 5, respectively. We set β = (0.5,3.5,2.5,1.5,4.0)^T and , where γ₂ ~ Unif[−0.5,0.5]³⁰, a 30-dimensional uniform distribution on [−0.5,0.5]³⁰, and γ₁ is chosen in the following ways:

Case (I). γ₁ ~ Unif[0.5,1.0]¹⁰.

Case (II). γ₁ = (1.0,1.0,1.0,1.5,1.5,1.5,2.0,2.0,2.0,2.0).

We assume that

, where

()

with c = 0.5 or c = 0.8. The error term ε is assumed to be normally distributed as N(0,0.3²).

Here we denote the submodel (2) as model (I), the multistep adjusted linear model (5) as model (II), the two-stage model (12) as model (III), and the full model (1) as model (IV). We compare mean square errors (MSEs) of the new two-stage estimator based on model (III) with the estimator based on model (I), the multistep estimator based on model (II), the SCAD estimator and the least squares estimator based on model (IV). We also compare mean square prediction errors (MSPEs) of the above mentioned models with corresponding estimators.

The data are simulated from the full model (1) with sample size n = 100 and simulation times m = 1000. We use the sample-based PCA approximations to substitute τ_j′s. The parameter a in the SCAD penalty function is set to be 3.7 and λ is selected by leave-one-out CV method.

Table 1 reports the MSEs of point estimators on the parameter β and the MSPEs of model predictions. From the table, we have the following findings: (1) has the largest MSEs and takes the second place, nearly all the new estimator has the smallest MSEs. (2) When c = 0.5, the MSEs of are smaller than those of , while when c = 0.8 they are larger than those of . These show that if the correlation between the covariates is strong, the MSEs of are larger than those of , the multistep-adjustment is necessary, so the estimations and model predictions based on two-stage model are significantly improved. (3) In case (I) and (II) the simulation results have the similar performance. (4) Similar to the trend of the MSEs of the five estimators, the MSPE of the two-stage adjusted model is the smallest among the mentioned five models.

Table 1. MSEs on the parameter β and MSPEs of the two-stage adjusted linear model (12) compared with the submodel, the SCAD-penalized model, the multistep adjusted model and the full model.

No.	Item
		0.3079	0.0457	0.0660	0.0571	1.6105 × 10³
		0.1763	0.0206	0.0346	0.0176	1.0940 × 10³
Case (I)	MSEs	0.1396	0.0481	0.0631	0.0461	4.2049 × 10³
c = 0.5		0.1870	0.0196	0.0349	0.0186	5.0183 × 10³
		0.1131	0.0517	0.0609	0.0430	6.2615 × 10³
	MSPEs	3.4780	1.1896	1.6512	1.0679	3.0499 × 10²

		0.1568	0.6191	0.0934	0.0826	1.2494 × 10³
		0.6239	0.1060	0.0090	0.0083	1.0456 × 10²
Case (I)	MSEs	0.8829	0.8173	0.0895	0.1039	2.6368 × 10²
c = 0.8		0.5882	0.0919	0.0107	0.0100	7.6452 × 10¹
		1.0799	0.9829	0.0961	0.0929	1.1610 × 10³
	MSPEs	4.7930	2.6700	0.8354	0.7771	1.3223 × 10²

		0.4272	0.0660	0.0849	0.0557	4.3002 × 10²
		0.6371	0.0318	0.0499	0.0295	3.7893 × 10³
Case (II)	MSEs	0.4560	0.0715	0.0927	0.0588	1.2784 × 10³
c = 0.5		0.5926	0.0306	0.0491	0.0287	6.7354 × 10³
		0.9052	0.0734	0.0874	0.0583	2.5047 × 10²
	MSPEs	6.8634	1.5096	2.0780	1.2077	5.0464 × 10³

		0.6764	0.4263	0.1212	0.0960	1.3904 × 10³
		0.9721	0.1060	0.0107	0.0102	4.0743 × 10²
Case (II)	MSEs	0.6242	0.4756	0.1146	0.1003	1.0498 × 10³
c = 0.8		1.0282	0.0954	0.0112	0.0098	5.6031 × 10²
		1.3420	0.5474	0.1341	0.1124	9.9632 × 10²
	MSPEs	7.9928	2.1165	0.9514	0.8469	2.3110 × 10²

In summary, Table 1 indicates that the two-stage adjusted linear model (12) performs much better than the full model, and better than the submodel, the SCAD-penalized model and the multistep adjusted model.

3.2. Partially Linear Model with Nonnormally Distributed Covariates

The dimensions of the linear part in the full model (1) and the submodel (2) are chosen to be 50 and 5, respectively. We set β = (0.5,3.5,2.5,1.5,4.0)^T,

, f(u) = u²*sin(3u), where

γ₁ = (0.5,0.1,0.8,0.2,0.5,0.2,0.6,0.5,0.1,0.9),
γ₂ ~ Unif[−0.3,0.3]¹⁰, a 10-dimensional uniform distribution on [−0.3,0.3].

We assume that the covariates are distributed in the following two ways.

Case (I).

, a 51-dimensional student distribution with degree of freedom df = 5, where

()

Case (II). X = (1/(1 + c))(W₁ + cV), with Z₁ = (1/(1 + c))(W₂ + cV), Z₂ = W₃, Z₃ = (1/(1 + c))(W₄ + cV), Z₄ = W₅, , where W₁, W₂, W₃, W₄ ~ Unif[−1.0,1.0]⁵, W₅ ~ Unif[−1.0,1.0]³⁰, V ~ Unif[−1.0,1.0]⁵, uniform distributions on [−1.0,1.0] and constant c = 0.1. All W₁, W₂, W₃, W₄, W₅, and V are independent.

The error term ε is assumed to be normally distributed as N(0,0.3²).

Here we denote the submodel (2) as model (I)′, the multistep adjusted additive partially linear model (3) as model (II)′, the two-stage model (11) as model (III)′ and the full model (1) as model (IV)′. We compare mean square errors (MSEs) of the new two-stage estimator based on model (III)′ with the estimator based on model (I)′, the estimator based on model (II)′ and the least squares estimator based on model (IV)′. We also compare the mean average square errors (MASEs) of the nonparametric estimators of f(·) and the mean square prediction errors (MSPEs) of different models with corresponding estimators.

The data are simulated from the full model (1) with sample size n = 100 and simulation times m = 500. We use the sample-based approximations of ICA, see Hyvärinen and Oja [15]. The parameter a in the SCAD penalty function is set to be 3.7, the number L and the parameter λ is selected by GCV method. We use the standard Fourier orthogonal basis as the basis functions.

Table 2 reports the MSEs of point estimators on the parameter β, the MASEs of f(·) and the MSPEs of model predictions. From the table, we have the following results: (1) has the largest MSEs, its MSEs are much larger than the MSEs of the other estimators, and the new estimator always has the smallest MSEs. (2) The MASEs of f(·) have similar trend to the MSEs of the four estimators, while the differences are not very noticeable. (3) Similar to the MSEs of the estimators, the MSPEs of the two-stage adjusted model are the smallest among the four models. (4) In Case (II), the simulation results of models (I)′, (II)′ and (III)′, perform a little better than those in Case (I) because of the correlation structure among the covariates.

Table 2. MSEs on the parameter β, MASEs of f(·) and MSPEs of the two-stage adjusted model (11) compared with the submodel, multistep adjusted model and the full model.

No.	Item
		0.4352	5.0403	0.3267	2.9753 × 10¹
		0.6859	1.2820 × 10¹	0.3328	1.4593 × 10¹
	MSEs	1.1152	8.1542	0.3723	1.4391 × 10¹
Case (I)		1.8489	7.2055	1.3194	2.4036 × 10¹
		3.3079	1.6144 × 10¹	1.9989	4.8575 × 10¹
	MASEs	3.0887	5.9814	3.0175	3.0633
	MSPEs	4.6047	7.0331 × 10¹	3.5536	3.9648

		0.0377	0.6144	0.0191	—¹
		0.0449	1.0876	0.0305	—
	MSEs	0.0332	3.7510	0.0246	—
Case (II)		0.0396	0.4324	0.0238	—
		0.0512	1.1995	0.0335	—
	MASEs	0.4722	0.5220	0.4126	0.4380
	MSPEs	0.9221	9.3068	0.8053	—

¹“—” denotes the algorithm collapsed and returned no value.

In summary, Table 2 indicates that the two-stage adjusted model (11) performs much better than the full model and the multistep adjusted model, and better than the submodel.

4. Some Remarks

In this paper, the main objective is to consistently estimate the parameter of interest β. When estimating the parameter of interest, its bias is mainly determined by the relevant variables, and its variance may be impacted by other variables. Because variable selection much relies on the sparsity of the parameter, when we directly consider the partially linear model, some irrelevant variables with nonzero coefficients may be selected in the final model. This may affect the estimation of the parameter β on its efficiency and stability. Thus based on the prespecified submodel, a two-stage remodeling method is proposed. In the new remodeling procedure, the correlation among the covariates (X, Z) and the sparsity of the regression structure are fully used. So the final model is sufficiently simplified and conditionally unbiased. Based on the simplified model, the estimation and model prediction are significantly improved. Generally, after the first stage the adjusted model is an additive partially linear model. Therefore, the remodeling method can be applied to partially linear regression model with linear regression model as a special case.

From the remodeling procedure, we can see that it can be directly applied to additive partially linear model, in which the nonparametric function f(U) has component-wise additive form. As for general partially linear model with multivariate nonparametric function, we should resort to multivariate nonparametric estimation method. If the dimension of covariate U is high, it may be faced with “the curse of dimensionality”.

In the procedure of model simplification, orthogonal series estimation method is used. This is only for technical convenience, because the semiparametric penalized least squares (6) can be easily transformed into parametric penalized least squares (10) and then the theoretic results are obtained. Although other nonparametric methods such as kernel and spline can be used without any essential difficulty, they can not directly achieve this goal. Compared with kernel method, it is somewhat difficult for series method to establish the asymptotic normality result for the nonparametric component f(U) under primitive conditions.

Acknowledgment

Lin and Zeng’s research are supported by NNSF projects (11171188, 10921101, and 11231005) of China, NSF and SRRF projects (ZR2010AZ001 and BS2011SF006) of Shandong Province of China and K C Wong-HKBU Fellowship Programme for Mainland China Scholars 2010-11. Wang’s research is supported by NSF project (ZR2011AQ007) of Shandong Province of China.

Appendix

A. Some Conditions and Proofs

A.1. Regularity Conditions (C1)–(C6)

(C1)
has finite nondegenerate compact support, denoted as .
(C2)
The density function r_j(t) of and r₀(t) of U satisfies 0 < L₁ ≤ r_j(t) ≤ L₂ < ∞ on its support for 0 ≤ j ≤ K₀ for some constants L₁ and L₂, and it is continuously differentiable.
(C3)
and are continuous. For given and u, is positive definite, and its eigenvalues are bounded.
(C4)
, Ef(U) = 0, the first two derivatives of f(·) are Lipschitz continuous of order one.
(C5)
as n → ∞.
(C6)
for j = s + 1, …, K₀ where s satisfies for 1 ≤ j ≤ s; for s < j ≤ K₀.

Conditions (C1)–(C3) are some regular constraints on the covariates and condition (C4) is some constraints on the regression structure as those in Härdle et al. [16]. Conditions (C5)-(C6) are assumptions on the penalty function which are similar to those used in Fan and Li [5] and Wang et al. [7].

A.2. Proof for Theorem 1

Let δ = n^−r/(2r+1) + a_n, β = β₀ + δT₁, θ = θ₀ + δT₂, ν = ν₀ + δT₃ and . Firstly, we shall prove that, ∀ϵ > 0, ∃C > 0, P{inf _∥T∥=CF(β, θ, ν) > F(β₀, θ₀, ν₀)} ≥ 1 − ϵ.

Denote D(β, θ, ν) = L⁻¹{F(β, θ, ν) − F(β₀, θ₀, ν₀)}, then we have

()

where

with

, j = 1, …, K₀ and R₀(U_i) = f(U_i) − ν^TΨ₀(U_i).

By the conditions (C1) and (C2), the maximal squared bias of

is equal to

()

. Similarly, ∥R₀(U)∥ = O(L^−r). Then,

()

Noticing that

, by Zhao and Xue [4], we have

()

Similarly, we have

()

By properly choosing a sufficiently large C, I₂ dominates I₁ uniformly in ∥T∥ = C.

Using Taylor expansion,

()

By simple calculations, we have that

()

where l₁ and l₂ are some positive constants. We can find that I₃₁ is also dominated by I₂ uniformly in ∥T∥ = C, and under the condition (C5), we have

()

Hence, by choosing a sufficiently large C, P{inf _∥T∥=CF(β, θ, ν) > F(β₀, θ₀, ν₀)} ≥ 1 − ϵ, which implies that with probability at least 1 − ϵ there exists a local minimum of F(β, θ, ν) in the ball {β₀ + δT₁ : ∥T₁∥≤C}. Denote the local minimizer as

, then

()

With the same argument as above, there exists a local minimum in the ball {θ₀ + δT₂ : ∥T₂∥≤C}, and the local minimizer

satisfies that

()

For the nonparametric component g(·), noticing that

()

it is known that

, so

()

Thus, we get

()

Similarly, there exists a local minimizer satisfies that . Then we can get .

A.3. Proof for Theorem 2

When λ_max → 0, a_n = 0 for large n by the form of

. Then by Theorem 1, it is sufficient to show that: with probability tending to 1 as n → ∞, for any β, it satisfies ∥β − β₀∥ = O_p(n^−r/(2r+1)), θ_j satisfies ∥θ_j − θ_j0∥ = O_p(n^−r/(2r+1)) with j = 1, …, s, and ν satisfies ∥ν − ν₀∥ = O_p(n^−r/(2r+1)), for some small ι_n = Cn^−r/(2r+1),

()

So the minimizer of F(β, θ, ν) is obtained at θ_j = 0, j = s + 1, …, K₀.

In fact,

()

Under the conditions and λ_jn^r/(2r+1) > λ_minn^r/(2r+1) → ∞, then ∂F(β, θ, ν)/∂θ_j = O_p(nλ_j(θ_j/∥θ_j∥₂)). So the sign of the derivative is determined by θ_j.

So with probability tending to 1, , j = s + 1, …, K₀. Then under , , j = s + 1, …, K₀.

A.4. Proof for Theorem 4

By Theorems 1 and 2, we know that, as n → ∞, with probability tending to 1, F(β, θ, ν) attains the local minimum value at

and

. Let F_1n(β, θ, ν) = ∂F(β, θ, ν)/∂β, F_2n(β, θ, ν) = ∂F(β, θ, ν)/∂θ^* and F_3n(β, θ, ν) = ∂F(β, θ, ν)/∂ν, then

()

From (A.17), it yields that

()

where

. Applying the Taylor expansion, we get

()

Furthermore, condition (C5) implies that

, and noting that

as λ_max → 0, then

. So from (A.18), it yields

()

Let

and

, then we have

()

Substituting (A.23) into (A.20), it yields

()

where

From (A.19), it yields that

()

Substituting (A.23) into (A.25), it yields

()

Noting that

()

Equation (A.26) can be rewritten as

()

where

. Let

, then we have

()

Substituting (A.29) into (A.24), and noting that

()

it is easy to show that

()

where

Using the Central Limit Theorem, we can obtain

()

where “

” means the convergence in distribution and

()

In addition, noting that

and

, we have I₂ = 0. Furthermore, we have

()

Invoking

, then by Zhao and Xue [4], we have

()

This together with

and

, we get I₃₁ = o_p(1). Similarly, I₃₂ = o_p(1). Noting that

, so as above, we have I₃₃ = o_p(1). Hence, we get that I₃ = o_p(1).

By the law of large numbers, we have , where “” means the convergence in probability. Then using the Slutsky theorem, we get .

References

1 Shen X. T., Huang H.-C., and Ye J., Inference after model selection, Journal of the American Statistical Association. (2004) 99, no. 467, 751–762, https://doi.org/10.1198/016214504000001097, MR2090908, ZBL1117.62423.
10.1198/016214504000001097
Web of Science® Google Scholar
2 Gai Y., Lin L., and Wang X., Consistent inference for biased sub-model of high-dimensional partially linear model, Journal of Statistical Planning and Inference. (2011) 141, no. 5, 1888–1898, https://doi.org/10.1016/j.jspi.2010.11.041, MR2763218, ZBL1209.62071.
10.1016/j.jspi.2010.11.041
Web of Science® Google Scholar
3 Zeng Y., Lin L., and Wang X., Multi-step-adjustment consistent inference for biased sub-model of multidimensional linear regression, Acta Mathematica Scientia. (2012) 32, no. 6, 1019–1031.
Google Scholar
4 Zhao P. and Xue L., Variable selection for semiparametric varying coefficient partially linear models, Statistics & Probability Letters. (2009) 79, no. 20, 2148–2157, https://doi.org/10.1016/j.spl.2009.07.004, MR2572046, ZBL1171.62026.
10.1016/j.spl.2009.07.004
Web of Science® Google Scholar
5 Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association. (2001) 96, no. 456, 1348–1360, https://doi.org/10.1198/016214501753382273, MR1946581, ZBL1073.62547.
10.1198/016214501753382273
Web of Science® Google Scholar
6 Wang L., Chen G., and Li H., Group SCAD regression analysis for microarray time course gene expression data, Bioinformatics. (2007) 23, no. 12, 1486–1494.
10.1093/bioinformatics/btm125
CAS PubMed Web of Science® Google Scholar
7 Wang L., Li H., and Huang J. Z., Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements, Journal of the American Statistical Association. (2008) 103, no. 484, 1556–1569, https://doi.org/10.1198/016214508000000788, MR2504204.
10.1198/016214508000000788
CAS PubMed Web of Science® Google Scholar
8 Simas Filho E. F. and Seixas J. M., Nonlinear independent component analysis: theoretical review and applications, Learning and Nonlinear Models. (2007) 5, no. 2, 99–120.
10.21528/LNLM-vol5-no2-art3
Google Scholar
9 Fan J., Feng Y., and Song R., Nonparametric independence screening in sparse ultra-high-dimensional additive models, Journal of the American Statistical Association. (2011) 106, no. 494, 544–557, https://doi.org/10.1198/jasa.2011.tm09779, MR2847969, ZBL1232.62064.
10.1198/jasa.2011.tm09779
CAS PubMed Web of Science® Google Scholar
10 Stone C. J., Optimal global rates of convergence for nonparametric regression, The Annals of Statistics. (1982) 10, no. 4, 1040–1053, MR673642, https://doi.org/10.1214/aos/1176345969, ZBL0511.62048.
10.1214/aos/1176345969
Web of Science® Google Scholar
11 Anderson T. W., An Introduction to Multivariate Statistical Analysis, 2003, 3rd edition, John Wiley & Sons, MR1990662.
Google Scholar
12 Rütimann P. and Bühlmann P., High dimensional sparse covariance estimation via directed acyclic graphs, Electronic Journal of Statistics. (2009) 3, 1133–1160, https://doi.org/10.1214/09-EJS534, MR2566184.
10.1214/09-EJS534
Web of Science® Google Scholar
13 Cai T. and Liu W. D., Adaptive thresholding for sparse covariance matrix estimation, Journal of the American Statistical Association. (2011) 106, no. 494, 672–684, https://doi.org/10.1198/jasa.2011.tm10560, MR2847949, ZBL1232.62086.
10.1198/jasa.2011.tm10560
CAS Web of Science® Google Scholar
14 Zou H., The adaptive lasso and its oracle properties, Journal of the American Statistical Association. (2006) 101, no. 476, 1418–1429, https://doi.org/10.1198/016214506000000735, MR2279469, ZBL1171.62326.
10.1198/016214506000000735
CAS Web of Science® Google Scholar
15 Hyvärinen A. and Oja E., A fast fixed-point algorithm for independent component analysis, Neural Computation. (1997) 9, no. 7, 1483–1492.
10.1162/neco.1997.9.7.1483
Web of Science® Google Scholar
16 Härdle W., Liang H., and Gao J. T., Partially Linear Models, 2000, Physica, Heidelberg, Germany, https://doi.org/10.1007/978-3-642-57700-0, MR1787637.
10.1007/978-3-642-57700-0
Google Scholar

All articles

Remodeling and Estimation for Sparse Partially Linear Regression Models

Abstract

1. Introduction

2. New Estimator for the Parameter of Interest

2.1. Multistep-Adjustment by Correlation

2.2. Model Simplification

2.3. Asymptotic Property of Point Estimator

2.4. Some Issues on Implementation

3. Simulation Studies

3.1. Linear Model with Normally Distributed Covariates

3.2. Partially Linear Model with Nonnormally Distributed Covariates

4. Some Remarks

Acknowledgment

Appendix

A. Some Conditions and Proofs

A.1. Regularity Conditions (C1)–(C6)

A.2. Proof for Theorem 1

A.3. Proof for Theorem 2

A.4. Proof for Theorem 4

References

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley

Remodeling and Estimation for Sparse Partially Linear Regression Models

Abstract

1. Introduction

2. New Estimator for the Parameter of Interest

2.1. Multistep-Adjustment by Correlation

2.2. Model Simplification

2.3. Asymptotic Property of Point Estimator

2.4. Some Issues on Implementation

3. Simulation Studies

3.1. Linear Model with Normally Distributed Covariates

3.2. Partially Linear Model with Nonnormally Distributed Covariates

4. Some Remarks

Acknowledgment

Appendix

A. Some Conditions and Proofs

A.1. Regularity Conditions (C1)–(C6)

A.2. Proof for Theorem 1

A.3. Proof for Theorem 2

A.4. Proof for Theorem 4

References

References

Related

Information