This note discusses a class of models for panel data that accommodate between-group heterogeneity that is allowed to exhibit positive within-group variance. Such a set-up generalizes the traditional fixed-effect paradigm in which between-group heterogeneity is limited to univariate factors that act like constants within groups. Notable members of the class of models considered are non-linear regression models with additive heterogeneity and multiplicative-error models suitable for non-negative limited dependent variables. The heterogeneity is modelled as a non-parametric nuisance function of covariates whose functional form is fixed within groups but is allowed to vary freely across groups. A simple approach to perform inference in such situations is based on local first-differencing of observations within a given group. This leads to moment conditions that, asymptotically, are free of nuisance functions. Conventional generalized method of moments procedures can then be readily applied. In particular, under suitable regularity conditions, such estimators are consistent and asymptotically normal, and asymptotically valid inference can be performed using a plug-in estimator of the asymptotic variance.

1. INTRODUCTION

The linear fixed-effect model is a cornerstone model in applied microeconometrics. The introduction of intercept terms that are heterogeneous across units allows us to control for various permanent differences between units that cannot be observed by the researcher. For example, in the influential work of Mundlak (1961, 1978), the aim is to control for managerial ability in the estimation of production functions. With Cobb–Douglas technology, log-output of firm i at time j equals

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0001$

where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0002$ represents log-input factors such as capital and labour, α₀ is the corresponding vector of elasticities, and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0003$ is total factor productivity, which will typically be correlated with the inputs, rendering the ordinary least-squares estimator of α₀ inconsistent. To estimate the elasticities from within-group variation, total factor productivity is decomposed as $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0004$ , where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0005$ is assumed to be orthogonal to the production inputs but $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0006$ can be correlated with the $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0007$ . In this case, a within-group transformation will sweep out $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0008$ , after which least-squares can be applied to estimate α₀. The inclusion of fixed effects in this manner has become standard practice in applied work.

However, there are good reasons to believe that unobserved heterogeneity goes beyond what can be captured by such location parameters. In the production-function example, it seems natural that managerial ability depends on such things as experience, education and sector-specific characteristics. As such, ability itself is the outcome of a production process, and it can be difficult to justify that it remains constant over the sampling period. A more appropriate way to control for managerial ability then could have $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0009$ for some latent function $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0010$ that maps drivers $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0011$ , such as experience and schooling, into ability. Similarly, in matching models, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0012$ could represent the match-efficiency parameter. In the context of the labour market, Sedláček (2014) finds empirical evidence that matching efficiency is procyclical and is, at least partially, driven by the hiring standards of firms. Moreover, the matching literature has argued that the efficiency parameter should be endogenous to the agents' optimization behaviour, rather than exogenously determined.

This note suggests a simple way to conduct inference on common parameters in panel-data models with non-parametric incidental functions. Besides the linear set-up just described, the approach can equally be used for models with multiplicative errors, such as models for count data, and for logit models. In either case, staying true to the fixed-effect paradigm, the aim is to estimate a finite-dimensional parameter while controlling for between-group heterogeneity in a non-parametric manner. The difference with the traditional fixed-effect view, however, is that the heterogeneity is allowed to vary both within and between groups. This view on unobserved heterogeneity is different from the one taken in recent work on the linear random-coefficient model (Arellano and Bonhomme, 2012, and Graham and Powell, 2012) and, as such, can serve as a useful complement.

2. LOCAL FIRST-DIFFERENCING

2.1. Incidental Functions

Consider a panel dataset consisting of two observations on n units. Restricting attention to two observations is without loss of generality. We let $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0013$ denote the outcome variables for unit i, and let $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0014$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0015$ denote observable covariates. The distinction between the variables $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0016$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0017$ will become clear below.

The workhorse fixed-effect model specifies unit i's response function as a linear function with a unit-specific intercept, as in

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0018$ (2.1)

for noise terms $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0019$ and vector of slope coefficients α₀. Applications of this model are widespread. When $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0020$ , an ordinary least-squares regression of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0021$ on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0022$ is known to yield a consistent point estimator of α₀ as $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0023$ . Indeed,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0024$

globally identifies α₀ provided $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0025$ has full rank. When the covariates are not strictly exogenous, the above moment condition can be replaced by $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0026$ for a vector of instrumental variables $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0027$ . A leading case would be a dynamic model where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0028$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0029$ contains further lags of the outcome variable; see, e.g., Arellano and Bond (1991).

In (2.1), α₀ is the parameter of interest. The traditional way of controlling for additional heterogeneity among agents is by introducing a set of strictly exogenous control variables, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0030$ , as additional regressors. This delivers a specification of the form

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0031$ (2.2)

Here, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0032$ can be flexible polynomial specifications or other non-linear transformations of the controls and, of course, can include interactions with $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0033$ . The choice of functional form is up to the researcher, and linearity is popular because of the resulting ease of computation via multiple regression. An approach that would prevent functional-form misspecification in the effect of the control variables would be to work with the partially linear model

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0034$ (2.3)

of which (2.2) is merely a special case. This is the approach advocated in the work of Robinson (1988). While he worked in a cross-sectional framework, it is quite obvious that his results can be extended to the panel-data version in (2.3). See Li and Stengos (1996), Ai et al. (2014) and You and Zhou (2014) for a detailed analysis of such an approach in this type of model.

None the less, a specification such as (2.3) is less natural in a panel context than in a cross-sectional framework. Indeed, a main aim of the panel literature has been to devise flexible methods, which allow for unobserved heterogeneity between units that stretches beyond what can be tackled with cross-section data. While (2.3) allows the effect of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0035$ to be non-parametric, it is restricted to be identical across i. Recent empirical work has stressed the presence of excess heterogeneity across agents in microeconometric models. Guvenen (2009), Browning et al. (2010) and Browning and Carro (2010, 2014), for example, provide extensive discussions and empirical evidence on this. An alternative extension of the Robinson framework that stays true to the fixed-effect tradition would be

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0036$ (2.4)

where, now, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0037$ are unit-specific non-parametric functions, and the usual location parameter $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0038$ has been absorbed into it. A special case of (2.4) that has received some attention recently is the standard linear random-coefficient model (Swamy, 1970, Chamberlain, 1992b, and Arellano and Bonhomme, 2012). Another is the varying-coefficient model (Hastie and Tibshirani, 1993). None the less, the motivation for allowing for excess heterogeneity is clearly different in these cases.

A complication with (2.4), as opposed to (2.3), is that α₀ can no longer be identified through the approach of Robinson (1988). Indeed, an extension of his argument would require that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0039$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0040$ can be consistently estimated. Clearly, this is not possible under asymptotics where the number of observations per unit is held fixed. Suppose that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0041$ for some function $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0042$ for which the expectation $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0043$ exists for all v in a neighbourhood of zero. Then, provided that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0044$ is a constant,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0045$

globally identifies α₀ if $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0046$ has full rank. Indeed,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0047$

under this condition. The smoothness condition on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0048$ is fairly weak. Suppose that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0049$ is continuously differentiable. Then, its derivative, say $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0050$ , is locally bounded. Hence, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0051$ , with v restricted to the neighbourhood $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0052$ , $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0053$ , satisfies the required Lipschitz-type smoothness condition. When the support of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0054$ is discrete, we require that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0055$ . An estimator of α₀ would be

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0056$

where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0057$ . This estimator is $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0058$ -consistent and asymptotically normal under standard moment assumptions. When $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0059$ is continuous, the event $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0060$ has probability zero and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0061$ -consistent estimation will not be possible. However, under suitable regularity conditions, we can still perform asymptotically valid inference on α₀ via $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0062$ on redefining $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0063$ as

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0064$

for a chosen kernel function k and a bandwidth $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0065$ (see footnotes 1 and 2). Here, the convergence rate of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0066$ will be reduced to $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0067$ . We provide regularity conditions and more detailed asymptotic theory below. In either case, the approach consists of simply constructing $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0068$ for each i and then performing a weighted least-squares regression of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0069$ on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0070$ with weight $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0071$ . This estimator is similar in spirit to the one considered for sample-selection models by Kyriazidou (1997, 2001).
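The weighted least-squares recipe just described can be illustrated with a minimal sketch for a scalar regressor and a scalar control. Everything specific here is an assumption made for illustration only: the Gaussian kernel, the fixed bandwidth `h = 0.2`, the hypothetical function names, and the simulated design in which each unit's incidental function is the line g_i(z) = a_i + b_i z.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as a second-order kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_first_difference(dy, dx, dz, h):
    """Kernel-weighted least squares of dy on dx (scalar regressor).

    Unit i receives weight k(dz_i / h), so units whose control variable
    barely moves between the two periods dominate the fit.
    """
    num = den = 0.0
    for dyi, dxi, dzi in zip(dy, dx, dz):
        w = gaussian_kernel(dzi / h)
        num += w * dxi * dyi
        den += w * dxi * dxi
    return num / den


def simulate(n, alpha0, rng):
    """Two-period panel: y_ij = alpha0 * x_ij + g_i(z_ij) + eps_ij.

    g_i(z) = a_i + b_i * z is a hypothetical unit-specific function.
    """
    dy, dx, dz = [], [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(2.0, 1.0)
        z = [rng.gauss(0.0, 1.0) for _ in range(2)]
        x = [zj + rng.gauss(0.0, 1.0) for zj in z]  # x is correlated with z
        y = [alpha0 * xj + a + b * zj + rng.gauss(0.0, 0.5)
             for xj, zj in zip(x, z)]
        dy.append(y[1] - y[0])
        dx.append(x[1] - x[0])
        dz.append(z[1] - z[0])
    return dy, dx, dz


rng = random.Random(0)
dy, dx, dz = simulate(4000, alpha0=1.0, rng=rng)

# Plain first-differencing ignores the change in g_i(z) and is biased here.
naive = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
local = local_first_difference(dy, dx, dz, h=0.2)
print(f"plain first differences: {naive:.3f}")
print(f"local first differences: {local:.3f}")
```

In this design the unweighted first-difference regression picks up the slope of the incidental function, whereas the kernel-weighted version should land much closer to the true coefficient. The bandwidth is fixed by hand purely for illustration.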

The $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0072$ can be seen as incidental functions, as opposed to the incidental parameters $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0073$ in the conventional set-up in (2.1). Furthermore, the $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0074$ can be seen as draws from a distribution that depends on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0075$ but which is left unspecified. The approach just described does not estimate these functions but, rather, differences them out by focusing on the population of “stayers” (Chamberlain, 1984), that is, on units for which $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0076$ lies in a shrinking neighbourhood of zero. As such, this approach could be called local first-differencing. Of course, a prerequisite to identification is that the support of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0077$ and the support of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0078$ are not disjoint. The leading example where this requirement would be violated is when the $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0079$ include time dummies or time trends. Such aggregate time effects are commonly used in applied work. Of course, they can easily be included in the traditional way, that is, by including them in a linear fashion and assigning them homogeneous coefficients.

Hoderlein and White (2012) have also recently used stayers to recover parameters of interest from short panel data. They consider fully non-parametric structures of the form

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0080$

and study conditions under which local-average response functions can be identified and estimated. More precisely, they give conditions under which

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0081$

which is an average partial effect for the subpopulation of stayers. Our set-up is more modest in terms of generality and focuses on different parameters of interest. As such, we can allow $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0082$ to be predetermined as opposed to strictly exogenous, and can accommodate discrete components in both $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0083$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0084$ . None the less, as in Hoderlein and White (2012) and Arellano and Bonhomme (2012), allowing for feedback toward the $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0085$ is complicated, because the distribution of the transitory shocks, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0086$ , can change after conditioning on the event $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0087$ .

2.2. Non-linear Specifications

The applicability of local first-differencing is not limited to the linear model. Indeed, any fixed-effect model where heterogeneous intercepts can be accommodated can be extended to allow for incidental functions. The literature on panel data models is large, and we do not attempt to give a complete overview here. A survey is provided by Arellano and Honoré (2001).

One obvious generalization would be to allow for a non-linear relationship between $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0088$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0089$ but to maintain additivity of the incidental function, as in

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0090$

for some function μ that is known up to the Euclidean parameter α₀. Another type of non-linearity that has proved important in panel data applications features in models of the form

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0091$

A leading example of such a multiplicative model would be an exponential regression model with mean $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0092$ . Here,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0093$

In the conventional set-up, fixed-effect estimation of multiplicative models of this form was discussed by Chamberlain (1992a) and Wooldridge (1997). Dynamic versions of this model can equally be handled; see Blundell et al. (2002).
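As an illustration of how a multiplicative specification can be handled, the sketch below combines a quasi-differenced residual with a kernel weight. The residual y₁ − y₂ exp(−Δx α), instrumented by Δx, is one standard choice in the spirit of Chamberlain (1992a) and Wooldridge (1997); it is assumed here for illustration and need not coincide with the moment displayed above. The Gaussian kernel, the bandwidth, the uniform mean-one error, the function names and the simulated design are likewise hypothetical.

```python
import math
import random


def kernel(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_multiplicative(y1, y2, dx, dz, h, lo=-3.0, hi=3.0, tol=1e-6):
    """Solve sum_i k(dz_i/h) * dx_i * (y1_i - y2_i * exp(-dx_i * a)) = 0.

    For positive outcomes the left-hand side is increasing in a, so a
    simple bisection on [lo, hi] locates the unique root.
    """
    rows = [(kernel(c / h) * d, a, b, d)
            for a, b, d, c in zip(y1, y2, dx, dz)]

    def moment(alpha):
        return sum(wd * (a - b * math.exp(-d * alpha)) for wd, a, b, d in rows)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def simulate(n, alpha0, rng):
    """y_ij = exp(alpha0 * x_ij + g_i(z_ij)) * e_ij with E[e_ij] = 1.

    g_i(z) = b_i * z is a hypothetical unit-specific function.
    """
    y1, y2, dx, dz = [], [], [], []
    for _ in range(n):
        b = rng.uniform(0.0, 1.0)
        z = [rng.gauss(0.0, 1.0) for _ in range(2)]
        x = [zj + rng.gauss(0.0, 1.0) for zj in z]
        y = [math.exp(alpha0 * xj + b * zj) * rng.uniform(0.5, 1.5)
             for xj, zj in zip(x, z)]
        y1.append(y[0])
        y2.append(y[1])
        dx.append(x[1] - x[0])
        dz.append(z[1] - z[0])
    return y1, y2, dx, dz


rng = random.Random(1)
y1, y2, dx, dz = simulate(10000, alpha0=0.5, rng=rng)
alpha_hat = local_multiplicative(y1, y2, dx, dz, h=0.2)
print(f"estimate of alpha0 = 0.5: {alpha_hat:.3f}")
```

Bisection is used only because the parameter is scalar and the sample moment is monotone; with a vector parameter a generic GMM minimizer would take its place.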

The multinomial logit model with fixed effects is the prime example of the success of conditional maximum likelihood in panel models (Chamberlain, 1980). A binary-choice version of a specification with incidental functions would have

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0094$

with $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0095$ independent of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0096$ . An application of the conditional-likelihood argument shows that

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0097$

which is free of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0098$ . The optimal unconditional moment condition in the sense of Chamberlain (1987) equals

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0099$

and can be seen as the first-order condition associated with a local conditional likelihood. It is useful to note that this moment condition is very similar to the first-order condition of the estimator of Honoré and Kyriazidou (2000) for a dynamic logit model with exogenous regressors.
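The local conditional likelihood can be sketched for a scalar regressor as follows. The sketch assumes the usual conditional-logit form, in which the probability that a switcher has y₂ = 1 is a logistic function of Δx α once the incidental function is held fixed. The Gaussian kernel, the bandwidth, the function names and the simulated design are assumptions for illustration.

```python
import math
import random


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def kernel(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_conditional_logit(y1, y2, dx, dz, h, lo=-5.0, hi=5.0, tol=1e-6):
    """Root of the kernel-weighted conditional-logit score (scalar alpha).

    Only switchers (y1 != y2) contribute. The score
    sum_i k(dz_i/h) * dx_i * (y2_i - logistic(dx_i * alpha))
    is decreasing in alpha, so bisection finds its unique root.
    """
    rows = [(kernel(c / h), b, d)
            for a, b, d, c in zip(y1, y2, dx, dz) if a != b]

    def score(alpha):
        return sum(w * d * (b - logistic(d * alpha)) for w, b, d in rows)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n, alpha0, rng):
    """Binary outcomes with P(y_ij = 1) = logistic(alpha0 * x_ij + g_i(z_ij)).

    g_i(z) = a_i + b_i * z is a hypothetical unit-specific function.
    """
    y1, y2, dx, dz = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(1.0, 1.0)
        z = [rng.gauss(0.0, 1.0) for _ in range(2)]
        x = [zj + rng.gauss(0.0, 1.0) for zj in z]
        y = [1 if rng.random() < logistic(alpha0 * xj + a + b * zj) else 0
             for xj, zj in zip(x, z)]
        y1.append(y[0])
        y2.append(y[1])
        dx.append(x[1] - x[0])
        dz.append(z[1] - z[0])
    return y1, y2, dx, dz


rng = random.Random(2)
y1, y2, dx, dz = simulate(20000, alpha0=1.0, rng=rng)
alpha_hat = local_conditional_logit(y1, y2, dx, dz, h=0.2)
print(f"estimate of alpha0 = 1.0: {alpha_hat:.3f}")
```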

In each of the examples just mentioned, it is easy to construct a generalized method of moments (GMM) estimator in which the usual moment condition is complemented with the kernel weight $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0100$ as described above. We provide asymptotic theory in the next section.

There are several other models that could be extended to allow for incidental functions. Some interesting examples are truncated and censored regression models (Honoré, 1992), as well as general transformation models and generalized regression models (Abrevaya, 1999, 2000). The resulting estimators would have similar asymptotic properties. However, they are M-estimators rather than GMM estimators, and the associated criterion functions are characterized by a certain degree of non-smoothness. As such, they will not fit exactly the generic set-up entertained below.

3. ASYMPTOTIC THEORY

Consider a generic set-up in which a Euclidean parameter $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0101$ is identified through the moment condition

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0102$

where m is a vector function that is known up to α₀. An empirical counterpart to the population moment at α is

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0103$

where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0104$ is a non-negative bandwidth sequence that is o(1) and k is a kernel function. Regularity conditions on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0105$ and k are collected in Assumption 3.3. A GMM estimator of α₀ based on $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0106$ is then given by

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0107$

where $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0108$ denotes a given positive-definite weight matrix. This section provides distribution theory for $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0109$ in the form of a consistency result and an asymptotic-normality result. The proofs are given in the Appendix.

Some elementary regularity conditions are collected in Assumption 3.1.

Assumption 3.1. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0110$ is a compact set and α₀ is interior to it. m is twice continuously differentiable in α with derivatives $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0111$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0112$ . The distribution of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0113$ is absolutely continuous and the associated density function is strictly positive in a neighbourhood of zero.

Let $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0114$ denote the Euclidean and Frobenius norms. To state sufficient conditions for consistency, let

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0115$

for f the density of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0116$ .

Assumption 3.2. For all $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0117$ , $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0118$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0119$ are finite, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0120$ is bounded in v, and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0121$ is continuous in v in a neighbourhood of zero.

Assumption 3.3. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0122$ is a bounded and symmetric sth-order kernel function.

The conditions in Assumptions 3.2 and 3.3 are conventional. We refer to Li and Racine (2007) for a definition, examples and a discussion of kernel functions that satisfy Assumption 3.3.
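To make the notion of an sth-order kernel concrete, the sketch below checks the defining moment conditions numerically: the kernel integrates to one, its moments of order 1 to s − 1 vanish, and its sth moment is non-zero. The two kernels shown, the Gaussian density and a Gaussian-based fourth-order kernel, are textbook examples chosen for illustration; the function names and the integration grid are assumptions.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def k2(u):
    """Second-order kernel: the Gaussian density itself."""
    return phi(u)


def k4(u):
    """Fourth-order Gaussian-based kernel; note that it takes negative values."""
    return 0.5 * (3.0 - u * u) * phi(u)


def kernel_moment(k, j, lo=-12.0, hi=12.0, steps=48000):
    """Trapezoidal approximation to the integral of u**j * k(u)."""
    width = (hi - lo) / steps
    total = 0.5 * (lo ** j * k(lo) + hi ** j * k(hi))
    for i in range(1, steps):
        u = lo + i * width
        total += u ** j * k(u)
    return total * width


def kernel_order(k, max_order=8, tol=1e-6):
    """Smallest j >= 1 for which the j-th moment of k is non-zero."""
    for j in range(1, max_order + 1):
        if abs(kernel_moment(k, j)) > tol:
            return j
    return None


for name, k in (("k2", k2), ("k4", k4)):
    print(name, "integrates to", round(kernel_moment(k, 0), 6),
          "and has order", kernel_order(k))
```

A higher-order kernel reduces the order of the smoothing bias, at the cost of weights that are no longer everywhere non-negative.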

The consistency result is stated in Theorem 3.1.

Theorem 3.1. Let Assumptions 3.1–3.3 hold. Suppose that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0123$ for W₀ non-stochastic and positive definite. Then $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0124$ .

To derive the limit distribution of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0125$ , we need an additional set of conditions. We let

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0126$

in the following assumption.

Assumption 3.4. For all $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0127$ , $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0128$ is finite, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0129$ is bounded in v, and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0130$ is continuous in v in a neighbourhood of zero. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0131$ is bounded. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0132$ is continuous in v in a neighbourhood of zero and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0133$ is bounded. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0134$ is s-times continuously differentiable with bounded derivatives.

Let $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0135$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0136$ . Theorem 3.2 gives the asymptotic distribution of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0137$ .

Theorem 3.2. Let Assumptions 3.1–3.4 hold. Suppose that Σ has maximal column rank, that Δ is positive definite, and that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0138$ for W₀ non-stochastic and positive definite. Then

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0139$

provided $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0140$ and $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0141$ .

The matrices Σ and Δ are, respectively, estimated consistently by

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0142$
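For the just-identified linear case with a scalar regressor, plug-in inference can be sketched as a sandwich standard error for kernel-weighted least squares. The sketch assumes that the Jacobian estimate averages k(Δz/h) Δx² and that the variance estimate averages the squared kernel-weighted moment contributions; these forms, the Gaussian kernel, the bandwidth, the function names and the simulated design are assumptions for illustration and may differ in normalization from the displayed estimators.

```python
import math
import random


def kernel(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_fd_with_se(dy, dx, dz, h):
    """Local first-difference estimate and a sandwich standard error.

    Bread: sum_i k_i * dx_i**2.
    Meat:  sum_i (k_i * dx_i * residual_i)**2.
    The common (n * h) normalizations cancel in the ratio.
    """
    w = [kernel(v / h) for v in dz]
    bread = sum(wi * xi * xi for wi, xi in zip(w, dx))
    alpha_hat = sum(wi * xi * yi for wi, xi, yi in zip(w, dx, dy)) / bread
    meat = sum((wi * xi * (yi - xi * alpha_hat)) ** 2
               for wi, xi, yi in zip(w, dx, dy))
    return alpha_hat, math.sqrt(meat) / bread


# Hypothetical design: y_ij = x_ij + a_i + b_i * z_ij + eps_ij (alpha0 = 1).
rng = random.Random(3)
dy, dx, dz = [], [], []
for _ in range(4000):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(2.0, 1.0)
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]
    x = [zj + rng.gauss(0.0, 1.0) for zj in z]
    y = [xj + a + b * zj + rng.gauss(0.0, 0.5) for xj, zj in zip(x, z)]
    dy.append(y[1] - y[0])
    dx.append(x[1] - x[0])
    dz.append(z[1] - z[0])

alpha_hat, se = local_fd_with_se(dy, dx, dz, h=0.2)
print(f"estimate: {alpha_hat:.3f}, standard error: {se:.3f}")
print(f"95% interval: [{alpha_hat - 1.96 * se:.3f}, {alpha_hat + 1.96 * se:.3f}]")
```

The interval ignores smoothing bias, which is asymptotically negligible only under the bandwidth conditions of Theorem 3.2.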

ACKNOWLEDGMENTS

I am grateful to Jaap Abbring, Manuel Arellano, Stefan Hoderlein and three referees.

APPENDIX A

PROOFS OF THEOREMS

Proof of Theorem 3.1: Let $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0143$ . Given identification, the regularity conditions in Assumption 3.1, and the fact that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0144$ , we only need to verify

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0145$

to establish consistency; see Theorem 2.1 of Newey and McFadden (1994). Because m is differentiable, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0146$ is finite, and k is bounded, Lemma 2.9 in Newey and McFadden (1994) further states that it suffices to prove that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0147$ for all $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0148$ . Fix $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0149$ . By the triangle inequality,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0150$

Assumptions 3.2 and 3.3 imply that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0151$ by the law of large numbers. Dominated convergence implies that

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0152$

and so $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0153$ . Thus, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0154$ . This holds for any $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0155$ , and so consistency has been shown. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0156$

Proof of Theorem 3.2: We show (a) $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0157$ and (b) $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0158$ . The asymptotic distribution of the estimator then follows from the linearization

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0159$

by an application of the delta method. To show (a), first observe that

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0160$

The second term on the right-hand side is a bias term. By an sth-order expansion and Assumptions 3.3 and 3.4,

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0161$

Because $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0162$ , $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0163$ and the bias term is asymptotically negligible. The leading term satisfies the conditions of Lyapunov's central limit theorem. To see this, write

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0164$

Then, $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0165$ and

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0166$

by a bounded-convergence argument and Assumption 3.4. Finally, the Lyapunov condition is also satisfied, because

$urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0167$

which vanishes as $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0168$ . This establishes (a). To verify (b), we can proceed as in the proof of Theorem 3.1. In particular, Lemma 2.9 of Newey and McFadden (1994) can again be applied. By the moment conditions in Assumption 3.4, we have that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0169$ . An application of the bounded convergence theorem similarly shows that $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0170$ . Uniform convergence of the Jacobian matrix follows and the proof is complete. $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0171$

Supporting Information

REFERENCES

Abrevaya, J. (1999). Leapfrog estimation of a fixed-effects model with unknown transformation of the dependent variable. Journal of Econometrics 93, 203–28. doi:10.1016/S0304-4076(99)00009-3
Abrevaya, J. (2000). Rank estimation of a generalized fixed-effects regression model. Journal of Econometrics 95, 1–23. doi:10.1016/S0304-4076(99)00027-5
Ai, C., J. You and Y. Zhou (2014). Estimation of fixed effects panel data partially linear additive regression models. Econometrics Journal 17, 83–106. doi:10.1111/ectj.12011
Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277–97. doi:10.2307/2297968
Arellano, M. and S. Bonhomme (2012). Identifying distributional characteristics in random coefficients panel data models. Review of Economic Studies 79, 987–1020. doi:10.1093/restud/rdr045
Arellano, M. and B. E. Honoré (2001). Panel data models: some recent developments. In J. J. Heckman and E. Leamer (Eds.), Handbook of Econometrics, Volume 5, 3229–329. Amsterdam: North-Holland. doi:10.1016/S1573-4412(01)05006-1
Blundell, R. W., R. Griffith and F. Windmeijer (2002). Individual effects and dynamics in count data models. Journal of Econometrics 108, 113–31. doi:10.1016/S0304-4076(01)00108-7
Browning, M. and J. M. Carro (2010). Heterogeneity in dynamic discrete choice models. Econometrics Journal 13, 1–39. doi:10.1111/j.1368-423X.2009.00301.x
Browning, M. and J. M. Carro (2014). Dynamic binary outcome models with maximal heterogeneity. Journal of Econometrics 178, 805–23. doi:10.1016/j.jeconom.2013.11.005
Browning, M., M. Ejrnæs and J. Alvarez (2010). Modeling income processes with lots of heterogeneity. Review of Economic Studies 77, 1353–81. doi:10.1111/j.1467-937X.2010.00612.x
Chamberlain, G. (1980). Analysis of covariance with qualitative data. Review of Economic Studies 47, 225–38. doi:10.2307/2297110
Chamberlain, G. (1984). Panel data. In Z. Griliches and M. Intriligator (Eds.), Handbook of Econometrics, Volume 2, 1247–315. Amsterdam: North-Holland. doi:10.1016/S1573-4412(84)02014-6
Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34, 305–34. doi:10.1016/0304-4076(87)90015-7
Chamberlain, G. (1992a). Comment: sequential moment restrictions in panel data. Journal of Business and Economic Statistics 10, 20–26. doi:10.2307/1391799
Chamberlain, G. (1992b). Efficiency bounds for semiparametric regression. Econometrica 60, 567–96. doi:10.2307/2951584
Graham, B. S. and J. L. Powell (2012). Identification and estimation of average partial effects in “irregular” correlated random coefficient panel data models. Econometrica 80, 2105–52. doi:10.3982/ECTA8220
Guvenen, F. (2009). An empirical investigation of labour income processes. Review of Economic Dynamics 12, 58–79. doi:10.1016/j.red.2008.06.004
Härdle, W., P. Hall and H. Ichimura (1993). Optimal smoothing in single-index models. Annals of Statistics 21, 157–78. doi:10.1214/aos/1176349020
Hastie, T. and R. Tibshirani (1993). Varying-coefficient models. Journal of the Royal Statistical Society, Series B 55, 757–96.
Hoderlein, S. and H. White (2012). Nonparametric identification in nonseparable panel data models with generalized fixed effects. Journal of Econometrics 168, 300–14. doi:10.1016/j.jeconom.2012.01.033
Honoré, B. E. (1992). Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects. Econometrica 60, 553–65. doi:10.2307/2951583
Honoré, B. E. and E. Kyriazidou (2000). Panel data discrete choice models with lagged dependent variables. Econometrica 68, 839–74. doi:10.1111/1468-0262.00139
Kyriazidou, E. (1997). Estimation of a panel data sample selection model. Econometrica 65, 1335–64. doi:10.2307/2171739
Kyriazidou, E. (2001). Estimation of dynamic panel data sample selection models. Review of Economic Studies 68, 543–72. doi:10.1111/1467-937X.00180
Li, Q. and J. S. Racine (2007). Nonparametric Econometrics: Theory and Practice. Princeton, NJ: Princeton University Press.
Li, Q. and T. Stengos (1996). Semiparametric estimation of partially linear panel data models. Journal of Econometrics 71, 389–97. doi:10.1016/0304-4076(94)01711-5
Mundlak, Y. (1961). Empirical production function free of management bias. Journal of Farm Economics 43, 44–56. doi:10.2307/1235460
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica 46, 69–85. doi:10.2307/1913646
Newey, W. K. and D. L. McFadden (1994). Large sample estimation and hypothesis testing. In R. Engle and D. L. McFadden (Eds.), Handbook of Econometrics, Volume 4, 2111–245. Amsterdam: North-Holland. doi:10.1016/S1573-4412(05)80005-4
Robinson, P. M. (1988). Root-N-consistent semiparametric regression. Econometrica 56, 931–54. doi:10.2307/1912705
Sedláček, P. (2014). Match efficiency and firms' hiring standards. Journal of Monetary Economics 62, 123–33. doi:10.1016/j.jmoneco.2013.10.001
Swamy, P. A. V. B. (1970). Efficient inference in a random coefficient regression model. Econometrica 38, 311–23. doi:10.2307/1913012
Wooldridge, J. M. (1997). Multiplicative panel data models without the strict exogeneity assumption. Econometric Theory 13, 667–78. doi:10.1017/S0266466600006125
You, J. and X. Zhou (2014). Asymptotic theory in fixed effects panel data seemingly unrelated partially linear regression models. Econometric Theory 30, 407–35. doi:10.1017/S0266466613000352

1 As in standard non-parametric regression theory, the choice of k has a much smaller effect than does the choice of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0172$ . An automated approach to selecting the bandwidth is to estimate it jointly with α₀, as in Härdle et al. (1993). See the Online Appendix for details and simulation experiments.

2 Of course, both discrete and continuous variables can equally be accommodated by specifying kernel weights for the continuous elements of $urn:x-wiley:13684221:media:ectj12035:ectj12035-math-0173$ and indicator functions for the discrete elements.

Volume 17, Issue 3

October 2014

Pages 373-382

Filename	Size	Description
ectj12035-sup-0001-SupMat.pdf	41.8 KB	Online Appendix: Simulations
ectj12035-sup-0002-SupMat.m	2.6 KB	Replication Files

First-differencing in panel data models with incidental functions
