First-differencing in panel data models with incidental functions
Summary
This note discusses a class of models for panel data that accommodate between-group heterogeneity that is allowed to exhibit positive within-group variance. Such a set-up generalizes the traditional fixed-effect paradigm in which between-group heterogeneity is limited to univariate factors that act like constants within groups. Notable members of the class of models considered are non-linear regression models with additive heterogeneity and multiplicative-error models suitable for non-negative limited dependent variables. The heterogeneity is modelled as a non-parametric nuisance function of covariates whose functional form is fixed within groups but is allowed to vary freely across groups. A simple approach to perform inference in such situations is based on local first-differencing of observations within a given group. This leads to moment conditions that, asymptotically, are free of nuisance functions. Conventional generalized method of moments procedures can then be readily applied. In particular, under suitable regularity conditions, such estimators are consistent and asymptotically normal, and asymptotically valid inference can be performed using a plug-in estimator of the asymptotic variance.
1. INTRODUCTION








However, there are good reasons to believe that unobserved heterogeneity goes beyond what can be captured by such location parameters. In the production-function example, it seems natural that managerial ability depends on such things as experience, education and sector-specific characteristics. As such, ability itself is the outcome of a production process, and it can be difficult to justify that it remains constant over the sampling period. A more appropriate way to control for managerial ability could then be to model it through some latent function that maps drivers, such as experience and schooling, into ability. Similarly, in matching models, such a function could represent the match-efficiency parameter. In the context of the labour market, Sedláček (2014) finds empirical evidence that matching efficiency is procyclical and is, at least partially, driven by the hiring standards of firms. Moreover, the matching literature has argued that the efficiency parameter should be endogenous to the agents' optimization behaviour, rather than exogenously determined.
This note suggests a simple way to conduct inference on common parameters in panel-data models with non-parametric incidental functions. Besides the linear set-up just described, the approach can equally be used for models with multiplicative errors, such as models for count data, and for logit models. In each case, staying true to the fixed-effect paradigm, the aim is to estimate a finite-dimensional parameter while controlling for between-group heterogeneity in a non-parametric manner. The difference from the traditional fixed-effect view, however, is that the heterogeneity is allowed to vary both within and between groups. This view of unobserved heterogeneity differs from the one taken in recent work on the linear random-coefficient model (Arellano and Bonhomme, 2012, and Graham and Powell, 2012) and, as such, can serve as a useful complement.
2. LOCAL FIRST-DIFFERENCING
2.1. Incidental Functions
Consider a panel dataset consisting of two observations on n units. Restricting attention to two observations is without loss of generality. For each unit i we observe an outcome variable in both periods, together with two sets of observable covariates. The distinction between these two sets of covariates will become clear below.






















































The unit-specific functions can be seen as incidental functions, as opposed to the incidental parameters in the conventional set-up in 2.1. Furthermore, they can be seen as draws from a distribution that depends on the covariates but which is left unspecified. The approach just described does not estimate these functions but, rather, differences them out by focusing on the population of “stayers” (Chamberlain, 1984), that is, on units for which the change over time in the argument of the incidental function lies in a shrinking neighbourhood of zero. As such, this approach could be called local first-differencing. A prerequisite to identification is that the supports of this argument in the two periods are not disjoint. The leading example where this requirement would be violated is when the arguments include time dummies or time trends. Such aggregate time effects are commonly used in applied work. They can, however, easily be included in the traditional way, that is, by entering them linearly with homogeneous coefficients.
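The idea of local first-differencing can be illustrated with a minimal numerical sketch for the linear model. The notation and design here are assumptions made for illustration only and are not taken from the text: the model is taken to be y_it = x_it'beta + phi_i(z_it) + e_it with scalar z_it, the kernel is Gaussian, and the function names `local_first_difference` and `simulate` are hypothetical. The estimator is least squares on first differences, with each unit weighted by a kernel evaluated at the scaled change in z.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density: a bounded, symmetric, second-order kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def local_first_difference(y1, y2, x1, x2, z1, z2, h):
    """Kernel-weighted least squares on first differences (illustrative).

    Units whose z barely changes between the two periods receive the
    largest weights, so phi_i(z_i2) - phi_i(z_i1) is approximately zero
    for the observations that drive the estimate.
    """
    dy = y2 - y1
    dx = x2 - x1                          # shape (n, k)
    w = gaussian_kernel((z2 - z1) / h)    # shape (n,)
    xtwx = dx.T @ (w[:, None] * dx)
    xtwy = dx.T @ (w * dy)
    return np.linalg.solve(xtwx, xtwy)


def simulate(n, beta, seed=0):
    """Hypothetical design with phi_i(z) = a_i + b_i * z**2 (an assumption)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)                       # unit-specific level
    b = rng.normal(loc=1.0, scale=0.5, size=n)   # unit-specific curvature
    z1 = rng.normal(loc=1.0, size=n)
    z2 = rng.normal(loc=1.0, size=n)
    # The regressor is correlated with both z and the unit-specific level.
    x1 = (0.5 * z1 + a + rng.normal(size=n))[:, None]
    x2 = (0.5 * z2 + a + rng.normal(size=n))[:, None]
    y1 = x1[:, 0] * beta + a + b * z1**2 + rng.normal(size=n)
    y2 = x2[:, 0] * beta + a + b * z2**2 + rng.normal(size=n)
    return y1, y2, x1, x2, z1, z2


data = simulate(n=40_000, beta=1.0)
beta_local = local_first_difference(*data, h=0.1)
# A very large bandwidth gives nearly equal weights, that is,
# ordinary first-differencing, which does not remove phi_i here.
beta_plain = local_first_difference(*data, h=1e6)
print(beta_local, beta_plain)
```

In this simulated design the kernel-weighted estimate should lie close to the true coefficient, while ordinary first-differencing is contaminated by the within-unit change in the incidental function. The bandwidth value used is arbitrary and not a recommendation.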








2.2. Non-linear Specifications
The applicability of local first-differencing is not limited to the linear model. Indeed, any fixed-effect model where heterogeneous intercepts can be accommodated can be extended to allow for incidental functions. The literature on panel data models is large, and we do not attempt to give a complete overview here. A survey is provided by Arellano and Honoré (2001).












In each of the examples just mentioned, it is easy to construct a generalized method of moments (GMM) estimator in which the usual moment condition is complemented with the kernel weight as described above. We provide asymptotic theory in the next section.
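As a sketch of what such a kernel-weighted GMM estimator might look like, the display below writes out a generic sample moment and criterion. The symbols w_i (data on unit i), \Delta z_i (the scalar change in the argument of the incidental function), h (a bandwidth) and W_n (a weight matrix) are assumed here for illustration; only m, \alpha and the kernel appear in the surrounding text.

```latex
% Illustrative notation: w_i, \Delta z_i, h and W_n are assumptions.
\[
  \hat{q}_n(\alpha)
  = \frac{1}{nh} \sum_{i=1}^{n}
    K\!\left(\frac{\Delta z_i}{h}\right) m(w_i;\alpha),
  \qquad
  \hat{\alpha}
  = \arg\min_{\alpha}\;
    \hat{q}_n(\alpha)^{\prime}\, W_n\, \hat{q}_n(\alpha).
\]
% In a linear model one possible choice of moment function would be
\[
  m(w_i;\alpha) = \Delta x_i \left( \Delta y_i - \Delta x_i^{\prime}\alpha \right),
\]
% in which case the minimizer is kernel-weighted least squares on first differences.
```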
There are several other models that could be extended to allow for incidental functions. Some interesting examples are truncated and censored regression models (Honoré, 1992), as well as general transformation models and generalized regression models (Abrevaya, 1999, 2000). The resulting estimators would have similar asymptotic properties. However, they are M-estimators rather than GMM estimators, and the associated criterion functions are characterized by a certain degree of non-smoothness. As such, they will not fit exactly the generic set-up entertained below.
3. ASYMPTOTIC THEORY









Some elementary regularity conditions are collected in Assumption 3.1.
Assumption 3.1. The parameter space is a compact set and α0 is interior to it. The function m is twice continuously differentiable in α. The distribution of the first-differenced argument of the incidental function is absolutely continuous and the associated density function is strictly positive in a neighbourhood of zero.



Assumption 3.2.For all ,
and
are finite,
is bounded in v, and
is continuous in v in a neighbourhood of zero.
Assumption 3.3. The kernel function is bounded, symmetric, and of order s.
The conditions in Assumptions 3.2 and 3.3 are conventional. We refer to Li and Racine (2007) for a definition, examples and a discussion of kernel functions that satisfy Assumption 3.3.
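To make the notion of an sth-order kernel concrete, the following sketch numerically checks the defining moment conditions: the kernel integrates to one, its moments of order 1 to s-1 vanish, and its sth moment is non-zero and finite. The two kernels shown (the Gaussian density and a fourth-order kernel built from it) are standard textbook examples chosen here for illustration; the helper name `kernel_moment` is hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density: a second-order kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def gaussian_kernel_4(u):
    """Fourth-order kernel obtained by reweighting the Gaussian density."""
    return (1.5 - 0.5 * u**2) * gaussian_kernel(u)


def kernel_moment(kernel, j, lo=-12.0, hi=12.0, points=240_001):
    """Approximate the integral of u**j * K(u) by a Riemann sum."""
    grid = np.linspace(lo, hi, points)
    du = grid[1] - grid[0]
    return float(np.sum(grid**j * kernel(grid)) * du)


for name, k in [("second-order", gaussian_kernel),
                ("fourth-order", gaussian_kernel_4)]:
    moments = [round(kernel_moment(k, j), 6) for j in range(5)]
    print(name, moments)
```

Both kernels are bounded and symmetric, so their odd moments vanish. The fourth-order kernel additionally has a zero second moment, at the cost of taking negative values in the tails.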
The consistency result is stated in Theorem 3.1.
Theorem 3.1. Let Assumptions 3.1–3.3 hold. Suppose that the weight matrix converges in probability to W0, with W0 non-stochastic and positive definite. Then the estimator is consistent for α0.


Assumption 3.4.For all ,
is finite,
is bounded in v, and
is continuous in v in a neighbourhood of zero.
is bounded.
is continuous in v in a neighbourhood of zero and
is bounded.
is s-times continuously differentiable with bounded derivatives.
Let and
. Theorem 3.2 gives the asymptotic distribution of
.
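Theorem 3.2. Let Assumptions 3.1–3.4 hold. Suppose that Σ has maximal column rank, that Δ is positive definite, and that the weight matrix converges in probability to W0, with W0 non-stochastic and positive definite. Then the estimator, suitably centred and scaled, is asymptotically normally distributed.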




ACKNOWLEDGMENTS
I am grateful to Jaap Abbring, Manuel Arellano, Stefan Hoderlein and three referees.
APPENDIX A
PROOFS OF THEOREMS





























REFERENCES

