Volume 17, Issue 3 pp. 373-382
NOTE

First-differencing in panel data models with incidental functions

Koen Jochmans

Sciences Po, Department of Economics, 28 rue des Saints-Pères, 75007 Paris, France

First published: 03 August 2014

Summary

This note discusses a class of models for panel data that accommodate between-group heterogeneity which may itself exhibit positive within-group variance. Such a set-up generalizes the traditional fixed-effect paradigm, in which between-group heterogeneity is limited to univariate factors that act like constants within groups. Notable members of the class of models considered are non-linear regression models with additive heterogeneity and multiplicative-error models suitable for non-negative limited dependent variables. The heterogeneity is modelled as a non-parametric nuisance function of covariates whose functional form is fixed within groups but is allowed to vary freely across groups. A simple approach to inference in such situations is based on local first-differencing of observations within a given group. This leads to moment conditions that, asymptotically, are free of nuisance functions. Conventional generalized method of moments procedures can then be readily applied. In particular, under suitable regularity conditions, such estimators are consistent and asymptotically normal, and asymptotically valid inference can be performed using a plug-in estimator of the asymptotic variance.

1. INTRODUCTION

The linear fixed-effect model is a cornerstone of applied microeconometrics. The introduction of intercept terms that are heterogeneous across units allows us to control for various permanent differences between units that cannot be observed by the researcher. For example, in the influential work of Mundlak (1961, 1978), the aim is to control for managerial ability in the estimation of production functions. With Cobb–Douglas technology, log-output of firm i at time j equals
$$y_{ij} = x_{ij}'\alpha_0 + \omega_{ij},$$
where $x_{ij}$ represents log-input factors such as capital and labour, $\alpha_0$ is the corresponding vector of elasticities, and $\omega_{ij}$ is total factor productivity, which will typically be correlated with the inputs, rendering the ordinary least-squares estimator of $\alpha_0$ inconsistent. To estimate the elasticities from within-group variation, total factor productivity is decomposed as $\omega_{ij} = \eta_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ is assumed to be orthogonal to the production inputs but $\eta_i$ can be correlated with the $x_{ij}$. In this case, a within-group transformation will sweep out $\eta_i$, after which least squares can be applied to estimate $\alpha_0$. The inclusion of fixed effects in this manner has become standard practice in applied work.
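To make the within-group step concrete, the sketch below demeans a scalar log-input and log-output within each firm and then applies least squares. It is a minimal illustration only, assuming a balanced panel with a single regressor; the function name within_estimator and the example numbers are hypothetical.

```python
from typing import List, Sequence


def within_estimator(y: Sequence[Sequence[float]],
                     x: Sequence[Sequence[float]]) -> float:
    """Within-group least-squares estimate of a scalar slope (hypothetical helper).

    y[i][j] and x[i][j] are the outcome and the single regressor of unit i in
    period j. Subtracting unit-specific means removes any term that is constant
    within a unit, such as the fixed effect eta_i.
    """
    num = 0.0
    den = 0.0
    for yi, xi in zip(y, x):
        y_bar = sum(yi) / len(yi)
        x_bar = sum(xi) / len(xi)
        for yij, xij in zip(yi, xi):
            num += (xij - x_bar) * (yij - y_bar)
            den += (xij - x_bar) ** 2
    if den == 0.0:
        raise ValueError("no within-group variation in x")
    return num / den


# Hypothetical data: log-output = 0.5 * log-input + firm-specific level eta_i.
x: List[List[float]] = [[1.0, 2.0, 4.0], [0.0, 3.0, 1.0]]
eta = [10.0, -7.0]
y = [[0.5 * xij + eta_i for xij in xi] for xi, eta_i in zip(x, eta)]
print(within_estimator(y, x))  # close to 0.5, whatever the values of eta
```

The firm-specific levels in eta can be changed freely without affecting the estimate, which is exactly what the within transformation is meant to guarantee.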

However, there are good reasons to believe that unobserved heterogeneity goes beyond what can be captured by such location parameters. In the production-function example, it seems natural that managerial ability depends on such things as experience, education and sector-specific characteristics. As such, ability itself is the outcome of a production process, and it can be difficult to justify that it remains constant over the sampling period. A more appropriate way to control for managerial ability would then be to specify $\omega_{ij} = \eta_i(z_{ij}) + \varepsilon_{ij}$ for some latent function $\eta_i$ that maps drivers $z_{ij}$, such as experience and schooling, into ability. Similarly, in matching models, $\eta_i(z_{ij})$ could represent the match-efficiency parameter. In the context of the labour market, Sedláček (2014) finds empirical evidence that matching efficiency is procyclical and is, at least partially, driven by the hiring standards of firms. Moreover, the matching literature has argued that the efficiency parameter should be endogenous to the agents' optimization behaviour rather than exogenously determined.

This note suggests a simple way to conduct inference on common parameters in panel-data models with non-parametric incidental functions. Besides the linear set-up just described, the approach can equally be used for models with multiplicative errors, such as models for count data, and for logit models, for example. In each case, staying true to the fixed-effect paradigm, the aim is to estimate a finite-dimensional parameter while controlling for between-group heterogeneity in a non-parametric manner. The difference from the traditional fixed-effect view, however, is that the heterogeneity is allowed to vary both within and between groups. This view of unobserved heterogeneity differs from the one taken in recent work on the linear random-coefficient model (Arellano and Bonhomme, 2012, and Graham and Powell, 2012) and, as such, can serve as a useful complement.

2. LOCAL FIRST-DIFFERENCING

2.1. Incidental Functions

Consider a panel dataset consisting of two observations on n units. Restricting attention to two observations is without loss of generality. We let $y_i = (y_{i1}, y_{i2})'$ denote the outcome variables for unit i, and let $x_i = (x_{i1}', x_{i2}')'$ and $z_i = (z_{i1}', z_{i2}')'$ denote observable covariates. The distinction between the variables $x_{ij}$ and $z_{ij}$ will become clear below.

The workhorse fixed-effect model specifies unit i's response function as a linear function with a unit-specific intercept, as in
$$y_{ij} = x_{ij}'\alpha_0 + \eta_i + \varepsilon_{ij} \qquad (2.1)$$
for noise terms $\varepsilon_{ij}$ and vector of slope coefficients $\alpha_0$. Applications of this model are widespread. When $E[\varepsilon_{ij} \mid x_i, \eta_i] = 0$ for j = 1, 2, an ordinary least-squares regression of $y_{i2} - y_{i1}$ on $x_{i2} - x_{i1}$ is known to yield a consistent point estimator of $\alpha_0$ as $n \to \infty$. Indeed,
$$E\big[(x_{i2} - x_{i1})\big((y_{i2} - y_{i1}) - (x_{i2} - x_{i1})'\alpha\big)\big] = 0$$
globally identifies $\alpha_0$ provided $E[(x_{i2} - x_{i1})(x_{i2} - x_{i1})']$ has full rank. When the covariates are not strictly exogenous, the above moment condition can be replaced by $E[w_i((y_{i2} - y_{i1}) - (x_{i2} - x_{i1})'\alpha)] = 0$ for a vector of instrumental variables $w_i$. A leading case would be a dynamic model where $x_{ij} = y_{i,j-1}$ and $w_i$ contains further lags of the outcome variable; see, e.g., Arellano and Bond (1991).
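The two first-differencing estimators just described can be sketched for a scalar regressor and two periods as follows. This is an illustration under stated assumptions (scalar $x_{ij}$, scalar instrument $w_i$, exact sample analogues of the two moment conditions); the function names and data are hypothetical.

```python
from typing import List, Sequence


def first_differences(a1: Sequence[float], a2: Sequence[float]) -> List[float]:
    """Return a_i2 - a_i1 for every unit i."""
    return [b - a for a, b in zip(a1, a2)]


def fd_ols(y1, y2, x1, x2) -> float:
    """OLS of (y_i2 - y_i1) on (x_i2 - x_i1) for a scalar regressor.

    Solves the sample analogue of E[dx * (dy - dx * alpha)] = 0.
    """
    dy = first_differences(y1, y2)
    dx = first_differences(x1, x2)
    return sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)


def fd_iv(y1, y2, x1, x2, w) -> float:
    """Just-identified IV version: solves the sample analogue of
    E[w * (dy - dx * alpha)] = 0 for a scalar instrument w_i."""
    dy = first_differences(y1, y2)
    dx = first_differences(x1, x2)
    return sum(wi * v for wi, v in zip(w, dy)) / sum(wi * u for wi, u in zip(w, dx))


# Hypothetical two-period data generated from y_ij = 2 x_ij + eta_i (no noise).
x1 = [1.0, 2.0, 0.5, 3.0]
x2 = [2.0, 1.0, 2.5, 3.5]
eta = [1.0, -1.0, 4.0, 0.0]
y1 = [2.0 * x + e for x, e in zip(x1, eta)]
y2 = [2.0 * x + e for x, e in zip(x2, eta)]
w = [1.0, 1.0, 1.0, 1.0]  # illustrative constant instrument
print(fd_ols(y1, y2, x1, x2), fd_iv(y1, y2, x1, x2, w))  # 2.0 2.0
```

Both estimators solve a single linear sample moment equation, so no numerical optimization is needed in the scalar case.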
In (2.1), $\alpha_0$ is the parameter of interest. The traditional way of controlling for additional heterogeneity among agents is to introduce a set of strictly exogenous control variables, $z_{ij}$, as additional regressors. This delivers a specification of the form
$$y_{ij} = x_{ij}'\alpha_0 + z_{ij}'\gamma_0 + \eta_i + \varepsilon_{ij}. \qquad (2.2)$$
Here, $z_{ij}$ can be flexible polynomial specifications or other non-linear transformations of the controls and, of course, can include interactions with $x_{ij}$. The choice of functional form is up to the researcher, and linearity is popular because of the resulting ease of computation via multiple regression. An approach that would prevent functional-form misspecification in the effect of the control variables would be to work with the partially linear model
$$y_{ij} = x_{ij}'\alpha_0 + g(z_{ij}) + \eta_i + \varepsilon_{ij}, \qquad (2.3)$$
where g is an unknown function; (2.2) is merely a special case of (2.3). This is the approach advocated in the work of Robinson (1988). While he worked in a cross-sectional framework, his results extend readily to the panel-data version in (2.3). See Li and Stengos (1996), Ai et al. (2014) and You and Zhou (2014) for a detailed analysis of such an approach in this type of model.
None the less, a specification such as (2.3) is less natural in a panel context than in a cross-sectional framework. Indeed, a main aim of the panel literature has been to devise flexible methods that allow for unobserved heterogeneity between units beyond what can be handled with cross-section data. While (2.3) allows the effect of $z_{ij}$ to be non-parametric, it restricts this effect to be identical across i. Recent empirical work has stressed the presence of excess heterogeneity across agents in microeconometric models; Guvenen (2009), Browning et al. (2010) and Browning and Carro (2010, 2014), for example, provide extensive discussions and empirical evidence. An alternative extension of the Robinson framework that stays true to the fixed-effect tradition would be
$$y_{ij} = x_{ij}'\alpha_0 + \eta_i(z_{ij}) + \varepsilon_{ij}, \qquad (2.4)$$
where, now, the $\eta_i$ are unit-specific non-parametric functions into which the usual location parameter $\eta_i$ of (2.1) has been absorbed. A special case of (2.4) that has received some attention recently is the standard linear random-coefficient model (Swamy, 1970, Chamberlain, 1992b, and Arellano and Bonhomme, 2012). Another is the varying-coefficient model (Hastie and Tibshirani, 1993). None the less, the motivation for allowing for excess heterogeneity is clearly different in these cases.
A complication with (2.4), as opposed to (2.3), is that $\alpha_0$ can no longer be identified through the approach of Robinson (1988). Indeed, an extension of his argument would require that the unit-specific conditional expectations $E[y_{ij} \mid z_{ij}, \eta_i]$ and $E[x_{ij} \mid z_{ij}, \eta_i]$ can be consistently estimated. Clearly, this is not possible under asymptotics where the number of observations per unit is held fixed. Suppose that $|\eta_i(z + v) - \eta_i(z)| \le b_i(z)\,\|v\|$ for some function $b_i$ for which the expectation $E[b_i(z_{i1})\,\|x_{i2} - x_{i1}\| \mid z_{i2} - z_{i1} = v]$ exists for all v in a neighbourhood of zero. Then, provided that $E[\varepsilon_{ij} \mid x_i, z_i]$ is a constant that does not depend on j,
$$E\big[(x_{i2} - x_{i1})\big((y_{i2} - y_{i1}) - (x_{i2} - x_{i1})'\alpha\big) \,\big|\, z_{i2} - z_{i1} = 0\big] = 0$$
globally identifies $\alpha_0$ if $E[(x_{i2} - x_{i1})(x_{i2} - x_{i1})' \mid z_{i2} - z_{i1} = 0]$ has full rank. Indeed,
$$\Big\| E\big[(x_{i2} - x_{i1})\big((y_{i2} - y_{i1}) - (x_{i2} - x_{i1})'\alpha_0\big) \,\big|\, z_{i2} - z_{i1} = v\big] \Big\| \le E\big[b_i(z_{i1})\, \|x_{i2} - x_{i1}\| \,\big|\, z_{i2} - z_{i1} = v\big]\, \|v\| \to 0 \quad \text{as } v \to 0$$
under this condition. The smoothness condition on $\eta_i$ is fairly weak. Suppose that $\eta_i$ is continuously differentiable. Then its gradient, say $\nabla \eta_i$, is locally bounded. Hence, by the mean-value theorem, $b_i(z) = \sup_{\|u\| \le \epsilon} \|\nabla \eta_i(z + u)\|$ for some $\epsilon > 0$ satisfies the required Lipschitz-type smoothness condition for all v in the neighbourhood $\|v\| \le \epsilon$. When the support of $z_{i2} - z_{i1}$ is discrete, we require that $\Pr(z_{i2} = z_{i1}) > 0$. An estimator of $\alpha_0$ would be
$$\hat\alpha = \Big(\sum_{i=1}^n \kappa_i\, (x_{i2} - x_{i1})(x_{i2} - x_{i1})'\Big)^{-1} \sum_{i=1}^n \kappa_i\, (x_{i2} - x_{i1})(y_{i2} - y_{i1}),$$
where $\kappa_i = 1\{z_{i2} = z_{i1}\}$. This estimator is $\sqrt{n}$-consistent and asymptotically normal under standard moment assumptions. When $z_{ij}$ is continuous, the event $\{z_{i2} = z_{i1}\}$ has probability zero and $\sqrt{n}$-consistent estimation will not be possible. However, under suitable regularity conditions, we can still perform asymptotically valid inference on $\alpha_0$ via $\hat\alpha$ on redefining $\kappa_i$ as
$$\kappa_i = \frac{1}{h_n^{d}}\, k\Big(\frac{z_{i2} - z_{i1}}{h_n}\Big)$$
for a chosen kernel function k and a bandwidth $h_n \to 0$, where d denotes the dimension of $z_{ij}$.¹ Here, the convergence rate of $\hat\alpha$ will be reduced to $\sqrt{n h_n^{d}}$. We provide regularity conditions and more detailed asymptotic theory below. In either case, the approach consists of simply constructing $\kappa_i$ for each i and then performing a weighted least-squares regression of $y_{i2} - y_{i1}$ on $x_{i2} - x_{i1}$ with weight $\kappa_i$. This estimator is similar in spirit to the one considered for sample-selection models by Kyriazidou (1997, 2001).
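The weighted least-squares regression just described can be sketched as follows, assuming a scalar $x_{ij}$ and a scalar $z_{ij}$ (so d = 1). The indicator weight corresponds to the discrete case, and the Gaussian density is used as one possible choice of kernel k; the function names, the bandwidth value and the incidental functions $\eta_i(z) = c_i z^2$ in the example are hypothetical.

```python
import math
from typing import Optional, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as one choice of kernel k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_fd_wls(y1: Sequence[float], y2: Sequence[float],
                 x1: Sequence[float], x2: Sequence[float],
                 z1: Sequence[float], z2: Sequence[float],
                 h: Optional[float] = None) -> float:
    """Locally first-differenced estimate of a scalar alpha_0 (hypothetical helper).

    With h=None the weight is the indicator 1{z_i2 == z_i1} (discrete z);
    otherwise it is the kernel weight k((z_i2 - z_i1) / h) / h for scalar z.
    Returns the weighted least-squares slope of (y_i2 - y_i1) on (x_i2 - x_i1).
    """
    num = 0.0
    den = 0.0
    for yi1, yi2, xi1, xi2, zi1, zi2 in zip(y1, y2, x1, x2, z1, z2):
        dz = zi2 - zi1
        if h is None:
            weight = 1.0 if dz == 0 else 0.0  # exact stayers only
        else:
            weight = gaussian_kernel(dz / h) / h  # near-stayers get most weight
        dx = xi2 - xi1
        num += weight * dx * (yi2 - yi1)
        den += weight * dx * dx
    if den == 0.0:
        raise ValueError("no (near-)stayers with variation in x")
    return num / den


# Hypothetical data: alpha_0 = 1.5 and incidental functions eta_i(z) = c_i * z**2.
c = [1.0, -2.0, 0.5, 3.0, 1.0]
z1 = [0.3, 1.0, 2.0, 0.0, 5.0]
z2 = [0.3, 1.0, 2.0, 6.0, 0.0]  # the last two units change z
x1 = [1.0, 0.0, 2.0, 1.0, 1.0]
x2 = [2.0, 1.5, 1.0, 4.0, 3.0]
y1 = [1.5 * x + ci * z ** 2 for x, ci, z in zip(x1, c, z1)]
y2 = [1.5 * x + ci * z ** 2 for x, ci, z in zip(x2, c, z2)]
print(local_fd_wls(y1, y2, x1, x2, z1, z2))         # close to 1.5
print(local_fd_wls(y1, y2, x1, x2, z1, z2, h=0.1))  # close to 1.5
```

Because the factor $1/h_n^{d}$ appears in both the numerator and the denominator, it does not affect the point estimate; it matters only for the asymptotic theory. In the example, the two units whose $z_{ij}$ change receive numerically zero kernel weight, whereas an unweighted first-difference regression on all five units would be pulled far away from 1.5 by their changes in $\eta_i$.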

The $\eta_i$ can be seen as incidental functions, as opposed to the incidental parameters $\eta_i$ in the conventional set-up in (2.1). Furthermore, the $\eta_i$ can be seen as draws from a distribution that depends on $(x_i, z_i)$ but which is left unspecified. The approach just described does not estimate these functions but, rather, differences them out by focusing on the population of “stayers” (Chamberlain, 1984), that is, on units for which $z_{i2} - z_{i1}$ lies in a shrinking neighbourhood of zero. As such, this approach could be called local first-differencing. Of course, a prerequisite to identification is that the supports of $z_{i1}$ and $z_{i2}$ are not disjoint. The leading example where this requirement would be violated is when the $z_{ij}$ include time dummies or time trends. Such aggregate time effects are commonly used in applied work, but they can easily be included in the traditional way, that is, linearly and with homogeneous coefficients.

Hoderlein and White (2012) have also recently used stayers to recover parameters of interest from short panel data. They consider fully non-parametric structures of the form
$$y_{ij} = \varphi(x_{ij}, \eta_i, \varepsilon_{ij})$$
and study conditions under which local-average response functions can be identified and estimated. More precisely, they give conditions under which
$$E\Big[\frac{\partial \varphi(x, \eta_i, \varepsilon_{i2})}{\partial x} \,\Big|\, x_{i1} = x_{i2} = x\Big]$$
is identified, which is an average partial effect for the subpopulation of stayers. Our set-up is more modest in terms of generality and focuses on different parameters of interest. In return, we can allow $x_{ij}$ to be predetermined rather than strictly exogenous, and can accommodate discrete components in both $x_{ij}$ and $z_{ij}$. None the less, as in Hoderlein and White (2012) and Arellano and Bonhomme (2012), allowing for feedback toward the $z_{ij}$ is complicated, because the distribution of the transitory shocks, $\varepsilon_{ij}$, can change after conditioning on the event $z_{i2} = z_{i1}$.

2.2. Non-linear Specifications

The applicability of local first-differencing is not limited to the linear model. Indeed, any fixed-effect model where heterogeneous intercepts can be accommodated can be extended to allow for incidental functions. The literature on panel data models is large, and we do not attempt to give a complete overview here. A survey is provided by Arellano and Honoré (2001).

One obvious generalization would be to allow for a non-linear relationship between $y_{ij}$ and $x_{ij}$ but to maintain additivity of the incidental function, as in
$$y_{ij} = \mu(x_{ij}; \alpha_0) + \eta_i(z_{ij}) + \varepsilon_{ij},$$
for some function μ that is known up to the Euclidean parameter α0. Another type of non-linearity that has proved important in panel data applications features in models of the form
$$y_{ij} = \mu(x_{ij}; \alpha_0)\, \eta_i(z_{ij})\, \varepsilon_{ij}.$$
A leading example of such a multiplicative model would be an exponential regression model with mean $\exp(x_{ij}'\alpha_0)\, \eta_i(z_{ij})$. Here,
$$E\big[y_{i2} \exp(-x_{i2}'\alpha_0) - y_{i1} \exp(-x_{i1}'\alpha_0) \,\big|\, x_i,\, z_{i2} = z_{i1}\big] = 0.$$
In the conventional set-up, fixed-effect estimation of multiplicative models of this form was discussed by Chamberlain (1992a) and Wooldridge (1997). Dynamic versions of this model can equally be handled; see Blundell et al. (2002).
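As an illustration for the exponential model, the sketch below solves a weighted sample moment built from the quasi-differenced residual $y_{i2}\exp(-x_{i2}\alpha) - y_{i1}\exp(-x_{i1}\alpha)$ for a scalar $x_{ij}$. It assumes non-negative outcomes and weights, and multiplies the residual by the particular function $(x_{i2} - x_{i1})\exp((x_{i1} + x_{i2})\alpha)/(\exp(x_{i1}\alpha) + \exp(x_{i2}\alpha))$ of $x_i$. This choice, made here for convenience, turns the moment into the conditional-likelihood score of the fixed-effect Poisson model, which is monotone in $\alpha$ and can therefore be solved by bisection. Function names and data are hypothetical.

```python
import math
from typing import Sequence


def logistic(u: float) -> float:
    """Numerically stable logistic function exp(u) / (1 + exp(u))."""
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def local_exp_score(alpha: float, y1: Sequence[float], y2: Sequence[float],
                    x1: Sequence[float], x2: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Weighted moment for the exponential model with scalar x (hypothetical helper).

    Summand: w_i * dx_i * (y_i2 - (y_i1 + y_i2) * logistic(dx_i * alpha)), which
    equals the quasi-differenced residual times the function of x_i given above.
    """
    s = 0.0
    for yi1, yi2, xi1, xi2, wi in zip(y1, y2, x1, x2, weights):
        dx = xi2 - xi1
        s += wi * dx * (yi2 - (yi1 + yi2) * logistic(dx * alpha))
    return s


def solve_local_exp(y1, y2, x1, x2, weights,
                    lo: float = -10.0, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Bisection; the score is non-increasing in alpha when y >= 0 and w >= 0."""
    if local_exp_score(lo, y1, y2, x1, x2, weights) < 0.0:
        raise ValueError("root not bracketed: score negative at lo")
    if local_exp_score(hi, y1, y2, x1, x2, weights) > 0.0:
        raise ValueError("root not bracketed: score positive at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if local_exp_score(mid, y1, y2, x1, x2, weights) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical noise-free stayers: y_ij = exp(0.7 * x_ij) * eta_i.
x1 = [0.0, 1.0, 0.5]
x2 = [1.0, 0.0, 2.0]
eta = [2.0, 0.5, 1.0]
y1 = [math.exp(0.7 * x) * e for x, e in zip(x1, eta)]
y2 = [math.exp(0.7 * x) * e for x, e in zip(x2, eta)]
weights = [1.0, 1.0, 1.0]
print(solve_local_exp(y1, y2, x1, x2, weights))  # close to 0.7
```

In the noise-free example the solution recovers $\alpha_0 = 0.7$ regardless of the unit-specific levels; with real data the weights would be the kernel weights $\kappa_i$ rather than ones.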
The multinomial logit model with fixed effects is the prime example of the success of conditional maximum likelihood in panel models (Chamberlain, 1980). A binary-choice version of a specification with incidental functions would have
$$y_{ij} = 1\{x_{ij}'\alpha_0 + \eta_i(z_{ij}) - \varepsilon_{ij} \ge 0\},$$
with the $\varepsilon_{ij}$ independent standard-logistic draws that are independent of $(x_i, z_i, \eta_i)$. An application of the conditional-likelihood argument shows that
$$\Pr(y_{i2} = 1 \mid y_{i1} + y_{i2} = 1,\, x_i,\, \eta_i,\, z_{i2} = z_{i1}) = \Lambda\big((x_{i2} - x_{i1})'\alpha_0\big),$$
which is free of $\eta_i$; here $\Lambda(u) = e^u/(1 + e^u)$ denotes the logistic distribution function. The optimal unconditional moment condition in the sense of Chamberlain (1987) equals
$$E\big[1\{y_{i1} + y_{i2} = 1\}\, (x_{i2} - x_{i1})\big(y_{i2} - \Lambda((x_{i2} - x_{i1})'\alpha_0)\big) \,\big|\, z_{i2} - z_{i1} = 0\big] = 0$$
and can be seen as the first-order condition associated with a local conditional likelihood. It is useful to note that this moment condition is very similar to the first-order condition of the estimator of Honoré and Kyriazidou (2000) for a dynamic logit model with exogenous regressors.
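The local conditional-likelihood estimator for the binary-choice model can be sketched as follows for a scalar $x_{ij}$. The weights are supplied by the caller (indicator or kernel weights), and Newton-Raphson is used because the weighted conditional log-likelihood is concave in $\alpha$; function names and data are hypothetical. Only switchers, that is, units with $y_{i1} + y_{i2} = 1$, contribute.

```python
import math
from typing import Sequence, Tuple


def logistic(u: float) -> float:
    """Numerically stable logistic distribution function."""
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def score_and_information(alpha: float, y1: Sequence[int], y2: Sequence[int],
                          x1: Sequence[float], x2: Sequence[float],
                          weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted conditional-logit score and minus the Hessian, for scalar x."""
    s = 0.0
    info = 0.0
    for yi1, yi2, xi1, xi2, wi in zip(y1, y2, x1, x2, weights):
        if yi1 + yi2 != 1 or wi == 0.0:
            continue  # non-switchers and zero-weight units carry no information
        dx = xi2 - xi1
        p = logistic(dx * alpha)
        s += wi * dx * (yi2 - p)
        info += wi * dx * dx * p * (1.0 - p)
    return s, info


def local_conditional_logit(y1, y2, x1, x2, weights, alpha: float = 0.0,
                            tol: float = 1e-12, max_iter: int = 100) -> float:
    """Newton-Raphson on the weighted conditional log-likelihood (hypothetical helper)."""
    for _ in range(max_iter):
        s, info = score_and_information(alpha, y1, y2, x1, x2, weights)
        if info <= 0.0:
            raise ValueError("no informative switchers among weighted units")
        step = s / info
        alpha += step
        if abs(step) < tol:
            return alpha
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical data: four weighted switchers, one non-switcher, one zero-weight unit.
y1 = [0, 0, 1, 1, 1, 0]
y2 = [1, 1, 0, 0, 1, 1]
x1 = [0.0, 1.0, 0.0, 2.0, 0.0, 0.0]
x2 = [1.0, 2.0, 1.0, 1.0, 5.0, 3.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
print(local_conditional_logit(y1, y2, x1, x2, weights))  # close to log(3), about 1.0986
```

In the example the weighted score reduces to $3 - 4\Lambda(\alpha)$, so the solution is $\alpha = \log 3$.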

In each of the examples just mentioned, it is easy to construct a generalized method of moments (GMM) estimator in which the usual moment condition is complemented with the kernel weight $\kappa_i$ as described above. We provide asymptotic theory in the next section.

There are several other models that could be extended to allow for incidental functions. Some interesting examples are truncated and censored regression models (Honoré, 1992), as well as general transformation models and generalized regression models (Abrevaya, 1999, 2000). The resulting estimators would have similar asymptotic properties. However, they are M-estimators rather than GMM estimators, and the associated criterion functions are characterized by a certain degree of non-smoothness. As such, they do not fit exactly into the generic set-up considered below.

3. ASYMPTOTIC THEORY

Consider a generic set-up in which a Euclidean parameter $\alpha_0 \in \mathcal{A}$ is identified through the moment condition
$$E[m(y_i, x_i; \alpha_0) \mid z_{i2} - z_{i1} = 0] = 0,$$
where m is a vector function that is known up to $\alpha_0$. An empirical counterpart to the population moment at $\alpha$ is
$$\hat m_n(\alpha) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h_n^{d}}\, k\Big(\frac{z_{i2} - z_{i1}}{h_n}\Big)\, m(y_i, x_i; \alpha),$$
where $h_n$ is a positive bandwidth sequence that is o(1), k is a kernel function, and d is the dimension of $z_{ij}$. Regularity conditions on $h_n$ and k are collected in Assumption 3.3. A GMM estimator of $\alpha_0$ based on $\hat m_n$ is then given by
$$\hat\alpha = \arg\min_{\alpha \in \mathcal{A}}\, \hat m_n(\alpha)'\, \hat W_n\, \hat m_n(\alpha),$$
where $\hat W_n$ denotes a given positive-definite weight matrix. This section provides distribution theory for $\hat\alpha$ in the form of a consistency result and an asymptotic-normality result. The proofs are given in the Appendix.
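A minimal sketch of $\hat m_n(\alpha)$ and of the GMM criterion follows, assuming a scalar $\alpha$, a scalar $z_{ij}$ (so d = 1), a Gaussian kernel and a user-supplied moment function. Golden-section search stands in for a general-purpose optimizer and presumes the criterion is unimodal on the search interval; all names, including the illustrative two-moment linear model, are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Observation = Dict[str, float]
MomentFn = Callable[[Observation, float], List[float]]


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def m_hat(alpha: float, data: Sequence[Observation], moment: MomentFn, h: float) -> List[float]:
    """(1/n) * sum_i k(dz_i / h) / h * m_i(alpha), with scalar dz_i = z_i2 - z_i1."""
    total: List[float] = []
    for obs in data:
        weight = gaussian_kernel(obs["dz"] / h) / h
        m_i = moment(obs, alpha)
        if not total:
            total = [0.0] * len(m_i)
        for r, value in enumerate(m_i):
            total[r] += weight * value
    return [t / len(data) for t in total]


def gmm_objective(alpha: float, data: Sequence[Observation], moment: MomentFn,
                  h: float, W: Sequence[Sequence[float]]) -> float:
    """Quadratic form m_hat(alpha)' W m_hat(alpha)."""
    g = m_hat(alpha, data, moment, h)
    return sum(g[r] * W[r][c] * g[c] for r in range(len(g)) for c in range(len(g)))


def golden_section_argmin(f: Callable[[float], float], lo: float, hi: float,
                          tol: float = 1e-10) -> float:
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def linear_moments(obs: Observation, alpha: float) -> List[float]:
    """Two moments for the first-differenced linear model: instruments dx and w."""
    resid = obs["dy"] - obs["dx"] * alpha
    return [obs["dx"] * resid, obs["w"] * resid]


data = [
    {"dz": 0.0, "dx": 1.0, "dy": 0.8, "w": 1.0},
    {"dz": 0.01, "dx": -2.0, "dy": -1.6, "w": 0.5},
    {"dz": 0.0, "dx": 0.5, "dy": 0.4, "w": 2.0},
    {"dz": 4.0, "dx": 1.0, "dy": 9.0, "w": 1.0},  # a mover: large change in z
]
W = [[1.0, 0.0], [0.0, 1.0]]
alpha_hat = golden_section_argmin(
    lambda a: gmm_objective(a, data, linear_moments, 0.1, W), -5.0, 5.0)
print(alpha_hat)  # close to 0.8
```

With two moments and one parameter the model is over-identified; in this noise-free example both moments vanish at 0.8 among the (near-)stayers, so any positive-definite weight matrix gives the same minimizer.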

Some elementary regularity conditions are collected in Assumption 3.1.

Assumption 3.1. $\mathcal{A}$ is a compact set and $\alpha_0$ is interior to it. m is twice continuously differentiable in $\alpha$ with derivatives $M(y_i, x_i; \alpha) = \partial m(y_i, x_i; \alpha)/\partial \alpha'$ and $\partial\, \mathrm{vec}\, M(y_i, x_i; \alpha)/\partial \alpha'$. The distribution of $z_{i2} - z_{i1}$ is absolutely continuous and the associated density function is strictly positive in a neighbourhood of zero.

Let $\|\cdot\|$ denote the Euclidean norm for vectors and the Frobenius norm for matrices. To state sufficient conditions for consistency, let
$$\bar m(v; \alpha) = E[m(y_i, x_i; \alpha) \mid z_{i2} - z_{i1} = v]\, f(v), \qquad \rho(v; \alpha) = E[\|m(y_i, x_i; \alpha)\| \mid z_{i2} - z_{i1} = v]\, f(v),$$
for f the density of $z_{i2} - z_{i1}$.

Assumption 3.2. For all $\alpha \in \mathcal{A}$, $E\|m(y_i, x_i; \alpha)\|$ and $E \sup_{a \in \mathcal{A}} \|M(y_i, x_i; a)\|$ are finite, $\rho(v; \alpha)$ is bounded in v, and $\bar m(v; \alpha)$ is continuous in v in a neighbourhood of zero.

Assumption 3.3. $k: \mathbb{R}^d \to \mathbb{R}$ is a bounded and symmetric sth-order kernel function.

The conditions in Assumptions 3.2 and 3.3 are conventional. We refer to Li and Racine (2007) for a definition, examples and a discussion of kernel functions that satisfy Assumption 3.3.
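To illustrate Assumption 3.3, the sketch below checks numerically that $k(u) = \tfrac12 (3 - u^2)\phi(u)$, with $\phi$ the standard normal density, is a bounded, symmetric fourth-order kernel: its integral is one, its first three moments vanish and its fourth moment equals $-3$. The Gaussian density itself is a second-order kernel. For vector-valued $z_{ij}$ a product kernel is one common construction. The helper names are hypothetical, and Simpson's rule on a truncated interval is only an approximation.

```python
import math
from typing import Callable, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density: a bounded, symmetric second-order kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gaussian_fourth_order(u: float) -> float:
    """(3 - u^2) / 2 times the normal density: a bounded, symmetric fourth-order kernel."""
    return 0.5 * (3.0 - u * u) * gaussian_kernel(u)


def kernel_moment(k: Callable[[float], float], power: int,
                  lo: float = -12.0, hi: float = 12.0, n: int = 6000) -> float:
    """Approximate the integral of u**power * k(u) by composite Simpson's rule (n even)."""
    step = (hi - lo) / n
    total = k(lo) * lo ** power + k(hi) * hi ** power
    for i in range(1, n):
        u = lo + i * step
        total += (4.0 if i % 2 else 2.0) * k(u) * u ** power
    return total * step / 3.0


def product_kernel(k: Callable[[float], float], u: Sequence[float]) -> float:
    """Product kernel for vector-valued arguments: prod_r k(u_r)."""
    value = 1.0
    for u_r in u:
        value *= k(u_r)
    return value


for p in range(5):
    print(p, kernel_moment(gaussian_fourth_order, p))  # approximately 1, 0, 0, 0, -3
```

Kernels of order s > 2 must take negative values somewhere, since $\int u^2 k(u)\,du = 0$ is impossible for a non-negative k; here k(u) < 0 for $|u| > \sqrt{3}$.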

The consistency result is stated in Theorem 3.1.

Theorem 3.1. Let Assumptions 3.1–3.3 hold. Suppose that $\hat W_n \xrightarrow{p} W_0$ for $W_0$ non-stochastic and positive definite. Then $\hat\alpha \xrightarrow{p} \alpha_0$.

To derive the limit distribution of $\hat\alpha$, we need an additional set of conditions. We let
$$\Sigma(v; \alpha) = E[M(y_i, x_i; \alpha) \mid z_{i2} - z_{i1} = v]\, f(v), \qquad \Omega(v) = E[m(y_i, x_i; \alpha_0)\, m(y_i, x_i; \alpha_0)' \mid z_{i2} - z_{i1} = v]\, f(v)$$
in the following assumption.

Assumption 3.4. For all $\alpha \in \mathcal{A}$, $E \sup_{a \in \mathcal{A}} \|\partial\, \mathrm{vec}\, M(y_i, x_i; a)/\partial a'\|$ is finite, $E[\|M(y_i, x_i; \alpha)\| \mid z_{i2} - z_{i1} = v]\, f(v)$ is bounded in v, and $\Sigma(v; \alpha)$ is continuous in v in a neighbourhood of zero. For some $\delta > 0$, $E[\|m(y_i, x_i; \alpha_0)\|^{2+\delta} \mid z_{i2} - z_{i1} = v]\, f(v)$ is bounded. $\Omega(v)$ is continuous in v in a neighbourhood of zero and $\|\Omega(v)\|$ is bounded. $E[m(y_i, x_i; \alpha_0) \mid z_{i2} - z_{i1} = v]\, f(v)$ is s-times continuously differentiable with bounded derivatives.

Let $\Sigma = \Sigma(0; \alpha_0)$ and $\Delta = \Omega(0) \int k(u)^2\, du$. Theorem 3.2 gives the asymptotic distribution of $\hat\alpha$.

Theorem 3.2. Let Assumptions 3.1–3.4 hold. Suppose that Σ has maximal column rank, that Δ is positive definite, and that $\hat W_n \xrightarrow{p} W_0$ for $W_0$ non-stochastic and positive definite. Then

$$\sqrt{n h_n^{d}}\, (\hat\alpha - \alpha_0) \xrightarrow{d} N\big(0,\, (\Sigma' W_0 \Sigma)^{-1} \Sigma' W_0 \Delta W_0 \Sigma (\Sigma' W_0 \Sigma)^{-1}\big)$$
provided $n h_n^{d} \to \infty$ and $n h_n^{d+2s} \to 0$.

The matrices Σ and Δ are, respectively, estimated consistently by
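$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n^{d}}\, k\Big(\frac{z_{i2} - z_{i1}}{h_n}\Big)\, M(y_i, x_i; \hat\alpha), \qquad \hat\Delta = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_n^{d}}\, k\Big(\frac{z_{i2} - z_{i1}}{h_n}\Big)^{2} m(y_i, x_i; \hat\alpha)\, m(y_i, x_i; \hat\alpha)'.$$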
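For the linear model with a scalar regressor, the plug-in variance estimate can be sketched as follows. The sketch assumes scalar $x_{ij}$ and $z_{ij}$ (so d = 1), the moment $m = (x_{i2} - x_{i1})\{(y_{i2} - y_{i1}) - (x_{i2} - x_{i1})\alpha\}$ and a Gaussian kernel; because the model is exactly identified, the weight matrix drops out and the asymptotic variance reduces to $\Delta/\Sigma^2$. The function name and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_fd_inference(dy: Sequence[float], dx: Sequence[float],
                       dz: Sequence[float], h: float) -> Tuple[float, float]:
    """Point estimate and plug-in standard error for a scalar alpha_0.

    dy, dx, dz hold y_i2 - y_i1, x_i2 - x_i1 and z_i2 - z_i1; m_i(a) = dx_i * (dy_i - dx_i * a).
    """
    n = len(dy)
    k = [gaussian_kernel(v / h) for v in dz]
    alpha_hat = (sum(ki * x * y for ki, x, y in zip(k, dx, dy))
                 / sum(ki * x * x for ki, x in zip(k, dx)))
    # Sigma_hat: kernel-weighted average of the Jacobian dm_i/da = -dx_i**2.
    sigma_hat = sum(ki / h * (-x * x) for ki, x in zip(k, dx)) / n
    # Delta_hat: kernel-squared-weighted average of m_i(alpha_hat)**2.
    delta_hat = sum(ki * ki / h * (x * (y - x * alpha_hat)) ** 2
                    for ki, x, y in zip(k, dx, dy)) / n
    # Exactly identified: sqrt(n h) (alpha_hat - alpha_0) is approx. N(0, Delta / Sigma**2).
    avar = delta_hat / sigma_hat ** 2
    return alpha_hat, math.sqrt(avar / (n * h))


# Hypothetical first-differenced data.
dy = [1.1, -0.4, 2.3, 0.2, 5.0]
dx = [1.0, -0.5, 2.0, 0.3, 1.0]
dz = [0.0, 0.05, -0.02, 0.1, 3.0]
alpha_hat, se = local_fd_inference(dy, dx, dz, h=0.2)
print(alpha_hat, se)
```

Algebraically, the resulting standard error equals $\sqrt{\sum_i k_i^2 (x_{i2}-x_{i1})^2 \hat e_i^2}\,/\,\sum_i k_i (x_{i2}-x_{i1})^2$, with $k_i$ the kernel weight and $\hat e_i$ the residual, that is, the heteroskedasticity-robust standard error of the kernel-weighted least-squares regression.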

ACKNOWLEDGMENTS

I am grateful to Jaap Abbring, Manuel Arellano, Stefan Hoderlein and three referees.

APPENDIX A

PROOFS OF THEOREMS

Proof of Theorem 3.1: Let $\bar m(v; \alpha) = E[m(y_i, x_i; \alpha) \mid z_{i2} - z_{i1} = v]\, f(v)$. Given identification (that is, $\bar m(0; \alpha) = 0$ if and only if $\alpha = \alpha_0$), the regularity conditions in Assumption 3.1, and the fact that $\hat W_n \xrightarrow{p} W_0$, we only need to verify
$$\sup_{\alpha \in \mathcal{A}} \|\hat m_n(\alpha) - \bar m(0; \alpha)\| \xrightarrow{p} 0$$
to establish consistency; see Theorem 2.1 of Newey and McFadden (1994). Because m is differentiable, $E \sup_{\alpha \in \mathcal{A}} \|M(y_i, x_i; \alpha)\|$ is finite, and k is bounded, Lemma 2.9 in Newey and McFadden (1994) further implies that it suffices to prove that $\hat m_n(\alpha) \xrightarrow{p} \bar m(0; \alpha)$ for all $\alpha \in \mathcal{A}$. Fix $\alpha \in \mathcal{A}$. By the triangle inequality,
$$\|\hat m_n(\alpha) - \bar m(0; \alpha)\| \le \|\hat m_n(\alpha) - E[\hat m_n(\alpha)]\| + \|E[\hat m_n(\alpha)] - \bar m(0; \alpha)\|.$$
Assumptions 3.2 and 3.3 imply that $\|\hat m_n(\alpha) - E[\hat m_n(\alpha)]\| \xrightarrow{p} 0$ by the law of large numbers. Dominated convergence implies that
$$E[\hat m_n(\alpha)] = \int k(u)\, \bar m(h_n u; \alpha)\, du \to \bar m(0; \alpha) \int k(u)\, du = \bar m(0; \alpha),$$
and so $\|E[\hat m_n(\alpha)] - \bar m(0; \alpha)\| \to 0$. Thus, $\|\hat m_n(\alpha) - \bar m(0; \alpha)\| \xrightarrow{p} 0$. This holds for any $\alpha \in \mathcal{A}$, and so consistency has been shown. ∎
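Proof of Theorem 3.2: Let $\bar m(v; \alpha) = E[m(y_i, x_i; \alpha) \mid z_{i2} - z_{i1} = v]\, f(v)$ and $\hat M_n(\alpha) = \partial \hat m_n(\alpha)/\partial \alpha'$. We show (a) $\sqrt{n h_n^{d}}\, \hat m_n(\alpha_0) \xrightarrow{d} N(0, \Delta)$ and (b) $\sup_{\alpha \in \mathcal{A}} \|\hat M_n(\alpha) - \Sigma(0; \alpha)\| \xrightarrow{p} 0$, where $\Sigma(0; \alpha) = E[M(y_i, x_i; \alpha) \mid z_{i2} - z_{i1} = 0]\, f(0)$. The asymptotic distribution of the estimator then follows from the linearization
$$\sqrt{n h_n^{d}}\, (\hat\alpha - \alpha_0) = -(\Sigma' W_0 \Sigma)^{-1} \Sigma' W_0 \sqrt{n h_n^{d}}\, \hat m_n(\alpha_0) + o_p(1)$$
by an application of the delta method. To show (a), first observe that
$$\sqrt{n h_n^{d}}\, \hat m_n(\alpha_0) = \sqrt{n h_n^{d}}\, \big(\hat m_n(\alpha_0) - E[\hat m_n(\alpha_0)]\big) + \sqrt{n h_n^{d}}\, E[\hat m_n(\alpha_0)].$$
The second term on the right-hand side is a bias term. By an sth-order expansion and Assumptions 3.3 and 3.4, using that $\bar m(0; \alpha_0) = 0$ and that the first s − 1 moments of k vanish,
$$E[\hat m_n(\alpha_0)] = \int k(u)\, \bar m(h_n u; \alpha_0)\, du = O(h_n^{s}).$$
Because $n h_n^{d+2s} \to 0$, $\sqrt{n h_n^{d}}\, E[\hat m_n(\alpha_0)] = O\big((n h_n^{d+2s})^{1/2}\big) = o(1)$ and the bias term is asymptotically negligible. The leading term satisfies the conditions of Lyapunov's central limit theorem. To see this, let $k_i = k((z_{i2} - z_{i1})/h_n)$ and write
$$\sqrt{n h_n^{d}}\, \big(\hat m_n(\alpha_0) - E[\hat m_n(\alpha_0)]\big) = \sum_{i=1}^n \zeta_{n,i}, \qquad \zeta_{n,i} = \frac{k_i\, m(y_i, x_i; \alpha_0) - E[k_i\, m(y_i, x_i; \alpha_0)]}{\sqrt{n h_n^{d}}}.$$
Then, $E[\zeta_{n,i}] = 0$ and
$$\sum_{i=1}^n E[\zeta_{n,i} \zeta_{n,i}'] = \int k(u)^2\, E[m(y_i, x_i; \alpha_0)\, m(y_i, x_i; \alpha_0)' \mid z_{i2} - z_{i1} = h_n u]\, f(h_n u)\, du + O(h_n^{d+2s}) \to \Delta$$
by a bounded-convergence argument and Assumption 3.4. Finally, the Lyapunov condition is also satisfied because, for the $\delta > 0$ of Assumption 3.4 and some constant C,
$$\sum_{i=1}^n E\|\zeta_{n,i}\|^{2+\delta} \le \frac{C\, n\, E\big[|k_i|^{2+\delta}\, \|m(y_i, x_i; \alpha_0)\|^{2+\delta}\big]}{(n h_n^{d})^{1+\delta/2}} = O\big((n h_n^{d})^{-\delta/2}\big),$$
which vanishes as $n h_n^{d} \to \infty$. This establishes (a). To verify (b), we can proceed as in the proof of Theorem 3.1. In particular, Lemma 2.9 of Newey and McFadden (1994) can again be applied. By the moment conditions in Assumption 3.4, we have that $\hat M_n(\alpha) - E[\hat M_n(\alpha)] \xrightarrow{p} 0$ for all $\alpha \in \mathcal{A}$. An application of the bounded convergence theorem similarly shows that $E[\hat M_n(\alpha)] \to \Sigma(0; \alpha)$. Uniform convergence of the Jacobian matrix follows and the proof is complete. ∎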

  • 1 As in standard non-parametric regression theory, the choice of k has a much smaller effect than does the choice of $h_n$. An automated approach to selecting the bandwidth is to estimate it jointly with α0, as in Härdle et al. (1993). See the Online Appendix for details and simulation experiments.
  • 2 Of course, both discrete and continuous variables can equally be accommodated by specifying kernel weights for the continuous elements of $z_{ij}$ and indicator functions for the discrete elements.