Volume 18, Issue 1 pp. 1-39

Non-parametric inference on the number of equilibria

Maximilian Kasy

Department of Economics, Harvard University, Littauer Center 200, 1805 Cambridge Street, Cambridge, MA 02138 USA

Institute for Advanced Studies, Stumpergasse 56, 1060 Vienna, Austria

First published: 07 January 2015

Summary

This paper proposes an estimator and develops an inference procedure for the number of roots of functions that are non-parametrically identified by conditional moment restrictions. It is shown that a smoothed plug-in estimator of the number of roots is superconsistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics. The smoothed estimator is furthermore asymptotically efficient relative to a simple plug-in estimator. The procedure proposed is used to construct confidence sets for the number of equilibria of static games of incomplete information and of stochastic difference equations. In an application to panel data on neighbourhood composition in the United States, no evidence of multiple equilibria is found.

1. INTRODUCTION

Some economic systems show large and persistent differences in outcomes even though the observable exogenous factors influencing these systems differ little. One explanation for such persistent differences in outcomes is multiplicity of equilibria. If a system does have multiple equilibria, then temporary, large interventions might have a permanent effect, by shifting the equilibrium attained, while long-lasting, small interventions might not have a permanent effect.

Knowing the number of equilibria, and in particular whether there are multiple equilibria, is of interest in many economic contexts. Multiple equilibria and poverty traps are discussed by Dasgupta and Ray (1986), Azariadis and Stachurski (2005) and Bowles et al. (2006). Poverty traps can arise, for instance, if an individual's productivity is a function of their income and if wage income reflects productivity, as in models of efficiency wages. Productivity might depend on wages because nutrition and health improve with income. If this feedback mechanism is strong enough, there might be multiple equilibria, and extreme poverty might be self-perpetuating. In that case, public investments in nutrition and health can permanently lift families out of poverty. Multiple equilibria and urban segregation are discussed by Becker and Murphy (2000) and Card et al. (2008). Urban segregation, along ethnic or sociodemographic dimensions, might arise because households' location choices reflect a preference over neighbourhood composition. If this preference is strong enough, different compositions of a neighbourhood can be stable, given constant exogenous neighbourhood properties. Transition between different stable compositions might lead to rapid composition change, or ‘tipping’, as in the case of gentrification of a neighbourhood. Interest in such tipping behaviour motivated Card et al. (2008), and is the focus of the application discussed in Section 4 of this paper. Multiple equilibria and the market entry of firms are discussed by Bresnahan and Reiss (1991) and Berry (1992). Entering a market might only be profitable for a firm if its competitors do not enter that same market. As a consequence, different configurations of which firms serve which markets might be stable. In sociology, finally, multiple equilibria are of interest in the context of social norms. If the incentives to conform to prevailing behaviours are strong enough, different behavioural patterns might be stable norms (i.e. equilibria); see Young (2008). Transitions between such stable norms correspond to social change. One instance where this has been discussed is the assimilation of immigrant communities into the mainstream culture of a country.

This paper develops an estimator and an inference procedure for the number of equilibria of economic systems. It will be assumed that the equilibria of a system can be represented as solutions to the equation $g(x) = 0$. It will furthermore be assumed that g can be identified by some conditional moment restriction. The procedure proposed here provides confidence sets for the number $Z$ of solutions to the equation $g(x) = 0$.

This procedure can be summarized as follows. In a first stage, g and its derivative $g'$ are non-parametrically estimated. These first-stage estimates $\hat g$ and $\hat g'$ are then plugged into a smooth functional $Z^*$, as defined in (2.4), which yields $\widehat Z^*$. We show that under standard i.i.d. asymptotics, and for the bandwidth parameter ρ small enough, the continuously distributed $\widehat Z^*$ is equal to $Z$ with probability converging to 1. A superconsistent estimator of $Z$ can thus be formed by projecting $\widehat Z^*$ on the closest integer.

We then show that a rescaled version of $\widehat Z^*$ converges to a normal distribution under a non-standard sequence of experiments. This non-standard sequence of experiments is constructed using increasing levels of noise and shrinking bandwidth as sample size increases. Under this same sequence of experiments, the bootstrap provides consistent estimates of the bias and standard deviation of $\widehat Z^*$ relative to $Z$. We can thus construct confidence sets for $Z$ using t-tests. These confidence sets are sets of integers containing the true number of roots with a pre-specified asymptotic probability of $1 - \alpha$. An alternative to the procedure proposed here would be to use the simple plug-in estimator $Z(\hat g)$, which just counts the roots of the first-stage estimate of g. We show, however, that the simple plug-in estimator is asymptotically inefficient relative to the smoothed estimator $\widehat Z^*$ under the non-standard sequence of experiments considered.

Sections 3.4 and 3.5 discuss two general set-ups that allow us to translate the hypothesis of multiple equilibria into a hypothesis on the number of roots of some identifiable function g; these set-ups are static games of incomplete information and stochastic difference equations. Section 3.4 discusses a non-parametric model of static games of incomplete information, similar to the one analysed by Bajari et al. (2010). Under the assumptions detailed there, we can non-parametrically identify the average best response functions (averaging over private information) of the players in a static incomplete information game. This allows us to represent the Bayesian Nash equilibria of this game as roots of an estimable function, and Section 3.4 discusses how to perform inference on the number of such Bayesian Nash equilibria.

Section 3.5 considers panel data on some variable X, where X is generated by a general non-linear stochastic difference equation. This is motivated by the study of neighbourhood composition dynamics in Card et al. (2008). Section 3.5 argues that we can construct tests for the null hypothesis of equilibrium multiplicity of such non-linear difference equations by testing whether non-parametric quantile regressions of the change $\Delta X$ on X have multiple roots.

The rest of this paper is structured as follows. Section 2 presents the inference procedure and its asymptotic justification for the baseline case. Section 3 discusses generalizations, as well as identification and inference in static games of incomplete information and in stochastic difference equations. Section 4 applies the inference procedure to the data on neighbourhood composition studied by Card et al. (2008). In contrast to their results, no evidence of ‘tipping’ (equilibrium multiplicity) is found here. Section 5 concludes. Appendix A presents some Monte Carlo evidence. All proofs are relegated to Appendix B. Additional figures and tables are given in the online Appendix, which also contains a second application of the inference procedure to data on economic growth, similar to those discussed by Azariadis and Stachurski (2005, Section 4.1) and by Quah (1996).

2. INFERENCE IN THE BASELINE CASE

2.1. Set-up

Throughout this paper, the parameter of interest is the number of roots Z of some function g on a subset $\mathcal X$ of its support:
$$Z = Z(g) := \big|\{x \in \mathcal X : g(x) = 0\}\big|. \qquad (2.1)$$
Interest in this parameter is motivated by economic models in which the equilibria can be represented as roots of such a function g. Identification of the parameter $Z$ follows from identification of g on $\mathcal X$. In this section, inference on $Z$ is discussed for functions g with one-dimensional and compact domain and range. Throughout, the following assumption will be maintained.

Assumption 2.1. (a) The observable data are i.i.d. draws of $(Y_i, X_i)$, $i = 1, \dots, n$, where each draw has the same distribution as $(Y, X)$; (b) the set $\mathcal X$ is compact, and the density of X is bounded away from 0 on $\mathcal X$; (c) the function g is identified by a conditional moment restriction of the form

$$E\big[m\big(Y - g(X)\big) \mid X = x\big] = 0 \quad \text{for all } x \in \mathcal X; \qquad (2.2)$$
(d) the function g is continuously differentiable and generic in the sense of Definition 2.1.

Examples of functions characterized by conditional moment restrictions as in (2.2) are conditional mean regressions, for which $m(e) = e$, and conditional qth quantile regressions, for which $m(e) = \mathbf{1}(e < 0) - q$.
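As a small sketch (not taken from the paper), the two criterion functions can be written in Python with NumPy. The function names `m_mean` and `m_quantile` are hypothetical, the quantile criterion uses the sign convention $m(e) = \mathbf{1}(e < 0) - q$ (the opposite sign identifies the same g), and the check uses unconditional moments on a simulated sample for brevity, so it only illustrates that each sample moment is approximately zero at the corresponding sample mean or quantile.

```python
import numpy as np

def m_mean(e):
    """Criterion for conditional mean regression: E[Y - g(X) | X] = 0."""
    return np.asarray(e, dtype=float)

def m_quantile(e, q):
    """Criterion for conditional q-th quantile regression (assumed sign
    convention): E[1(Y - g(X) < 0) - q | X] = 0, i.e. P(Y < g(X) | X) = q."""
    return (np.asarray(e) < 0).astype(float) - q

# Simulated skewed sample; no covariate, so these are unconditional moments.
rng = np.random.default_rng(0)
y = rng.exponential(size=100_000)

print(np.mean(m_mean(y - y.mean())))                        # close to 0
print(np.mean(m_quantile(y - np.quantile(y, 0.25), 0.25)))  # close to 0
```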

Definition 2.1. (Genericity) A continuously differentiable function g is called generic if $g'(x) \neq 0$ for every $x \in \mathcal X$ with $g(x) = 0$, and if all roots of g are in the interior of $\mathcal X$.

Genericity of g implies that g has only a finite number of roots, since each root is isolated ($g' \neq 0$ there) and $\mathcal X$ is compact. Genericity in the sense of Definition 2.1 is commonly assumed in microeconomic theory; see the discussion in Mas-Colell et al. (1995, p. 593ff).
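For a known generic function, $Z(g)$ in (2.1) can be approximated numerically by counting strict sign changes on a fine grid, because g crosses zero at every root where $g' \neq 0$. The following is a minimal sketch under that assumption; the helper name `count_roots_on_grid` and the grid size are illustrative, and a grid that is too coarse can miss pairs of roots lying closer together than the grid spacing.

```python
import numpy as np

def count_roots_on_grid(g, lower, upper, num_points=100_000):
    """Approximate Z(g) = #{x in [lower, upper] : g(x) = 0} by counting strict
    sign changes of g between adjacent grid points. Each generic root
    (g' != 0) is a sign change; a grid that is too coarse can miss pairs of
    roots that lie closer together than the grid spacing."""
    x = np.linspace(lower, upper, num_points)
    values = g(x)
    return int(np.sum(values[:-1] * values[1:] < 0))

# g(x) = x^3 - x has generic roots at -1, 0 and 1, all interior to [-1.5, 1.5].
print(count_roots_on_grid(lambda x: x**3 - x, -1.5, 1.5))  # 3
```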

We propose the following inference procedure for the number of roots of g, $Z(g)$. First, estimate $g$ and $g'$ using local linear m-regression:
$$\frac{1}{n}\sum_{i=1}^{n} K_\tau(X_i - x)\, m\big(Y_i - \hat g(x) - \hat g'(x)\,(X_i - x)\big) \begin{pmatrix} 1 \\ X_i - x \end{pmatrix} = 0. \qquad (2.3)$$
Here, $K_\tau(u) := \frac{1}{\tau} K(u/\tau)$ for some (symmetric, positive) kernel function K integrating to one, and τ is the bandwidth. Equation (2.3) is a sample analogue of (2.2), in which a kernel-weighted local average replaces the conditional expectation. Next, calculate $\widehat Z^* := Z^*(\hat g)$, where $Z^*$ is defined as
$$Z^*(g) := \int_{\mathcal X} L_\rho\big(g(x)\big)\,\big|g'(x)\big|\,dx, \qquad (2.4)$$
with the first-stage estimate $\hat g'$ taking the place of the derivative when (2.4) is evaluated at $\hat g$. In this expression, $L_\rho(y) := \frac{1}{\rho} L(y/\rho)$ for a Lipschitz continuous, positive, symmetric kernel L that integrates to one and has bandwidth 1 and support $[-1, 1]$. The intuition for this expression will be discussed in detail below. Estimate the variance and bias of $\widehat Z^*$ relative to Z using the bootstrap. Finally, construct integer-valued confidence sets for Z using t-statistics based on $\widehat Z^*$ and the bootstrapped variance and bias.
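The first two steps of this procedure can be sketched for the special case of mean regression, $m(e) = e$, where the local linear m-regression (2.3) reduces to kernel-weighted least squares. The Epanechnikov kernel for K, the triangular kernel $L(u) = \max(1 - |u|, 0)$ (Lipschitz, positive, symmetric, supported on $[-1, 1]$ and integrating to one), the evaluation grid, the trapezoidal rule for the integral in (2.4), and the simulated data are all illustrative assumptions rather than choices taken from the paper; the bootstrap step is omitted. For comparison the sketch also computes the simple plug-in count of sign changes of $\hat g$.

```python
import numpy as np

def epanechnikov(u):
    """First-stage kernel K: symmetric, positive, integrates to one."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def triangular(u):
    """Second-stage kernel L: Lipschitz, positive, symmetric, support [-1, 1]."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def local_linear_mean(x_grid, X, Y, tau):
    """Local linear m-regression with m(e) = e. At each x, (2.3) becomes the
    weighted normal equations
        sum_i K_tau(X_i - x) (Y_i - a - b (X_i - x)) (1, X_i - x)' = 0,
    solved for a (estimate of g(x)) and b (estimate of g'(x))."""
    est = np.empty((len(x_grid), 2))
    for j, x in enumerate(x_grid):
        d = X - x
        w = epanechnikov(d / tau) / tau                 # K_tau(X_i - x)
        D = np.column_stack([np.ones_like(d), d])       # regressors (1, X_i - x)
        est[j] = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Y))
    return est[:, 0], est[:, 1]

def smoothed_count(x_grid, g_vals, dg_vals, rho):
    """Trapezoidal approximation of Z* = integral of L_rho(g(x)) |g'(x)| dx,
    with L_rho(y) = L(y / rho) / rho."""
    f = triangular(g_vals / rho) / rho * np.abs(dg_vals)
    return np.sum((f[1:] + f[:-1]) * np.diff(x_grid)) / 2

# Hypothetical data: g(x) = x^2 - 0.25 has generic roots at -0.5 and 0.5.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-1.0, 1.0, n)
Y = X**2 - 0.25 + 0.1 * rng.standard_normal(n)

x_grid = np.linspace(-1.0, 1.0, 401)
g_hat, dg_hat = local_linear_mean(x_grid, X, Y, tau=0.2)
z_star_hat = smoothed_count(x_grid, g_hat, dg_hat, rho=0.1)
naive = int(np.sum(g_hat[:-1] * g_hat[1:] < 0))   # simple plug-in Z(g_hat)
print(z_star_hat, round(z_star_hat), naive)
```

Rounding `z_star_hat` gives the point estimator described in the text; the continuous value itself is what enters the t-statistics.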

2.2. Basic properties and consistency

The rest of this section will motivate and justify this procedure. First, we show that $\widehat Z^*$ is a superconsistent estimator of Z, in the sense that $r_n(\widehat Z^* - Z) \overset{p}{\to} 0$ for any diverging sequence $r_n$, under i.i.d. sampling and conditions to be stated. Then, we present the central result of this paper, which establishes asymptotic normality of $\widehat Z^*$ under a non-standard sequence of experiments. From this result, it follows that inference based on t-statistics, using bootstrapped standard errors and bias corrections, provides asymptotically valid confidence sets for Z. We also show that $\widehat Z^*$ is efficient relative to the simple plug-in estimator $Z(\hat g)$ under the non-standard asymptotic sequence.

We are mainly concerned with constructing confidence sets for Z, rather than with point estimation. A point estimator could be formed by projecting $\widehat Z^*$ on the closest integer. While $\widehat Z^*$ will be called an estimator of Z, it should be kept in mind that its primary role is as an intermediate statistic in the construction of confidence sets.

The following proposition states that $Z^*(g) = Z(g)$ for generic g and ρ small enough. The two functionals only differ around non-generic g, or ‘bifurcation points’ (i.e. g where Z jumps). The functional $Z^*$ is a smooth approximation of Z which varies continuously around such jumps.

Proposition 2.1. For g continuously differentiable and generic, if $\rho > 0$ is small enough, then

$$Z^*(g) = Z(g).$$

The intuition underlying Proposition 2.1 is as follows. Given a generic function g, consider the subset of $\mathcal X$ where $L_\rho(g(x))$ is not zero. If ρ is small enough, this subset is partitioned into disjoint neighbourhoods of the roots of g, and g is monotonic in each of these neighbourhoods. A change of variables, setting $y = g(x)$, shows that the integral over each of these neighbourhoods equals $\int_{-\rho}^{\rho} L_\rho(y)\,dy = 1$. Figure 1 illustrates the relationship between Z and $Z^*$ for the functions g depicted there. The two functionals are equal if g does not peak within the range $[-\rho, \rho]$, but if g does peak within the range $[-\rho, \rho]$, they are different and $Z^*(g)$ is not integer valued.

Figure 1. $Z$ and $Z^*$.
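Proposition 2.1 and Figure 1 can be checked numerically with the hypothetical example $g(x) = x^2 - c$ on $\mathcal X = [-1, 1]$ and the triangular kernel $L(u) = \max(1 - |u|, 0)$, an assumed choice of L. The two roots $\pm\sqrt{c}$ are generic for $c > 0$. If $c \geq \rho$, the minimum $-c$ of g lies outside $(-\rho, \rho)$ and $Z^*(g) = 2 = Z(g)$; if $c < \rho$, g peaks inside $[-\rho, \rho]$, and the change of variables $y = g(x)$ on each monotone branch gives $Z^*(g) = 2 - (1 - c/\rho)^2$ for this kernel, which is not an integer (both formulas assume $1 - c \geq \rho$).

```python
import numpy as np

def z_star(g, dg, lower, upper, rho, num_points=400_001):
    """Trapezoidal approximation of Z*(g) = integral of L_rho(g(x)) |g'(x)| dx,
    using the (assumed) triangular kernel L(u) = max(1 - |u|, 0)."""
    x = np.linspace(lower, upper, num_points)
    f = np.clip(1.0 - np.abs(g(x) / rho), 0.0, None) / rho * np.abs(dg(x))
    return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2

rho = 0.1
for c in (0.25, 0.01):
    value = z_star(lambda x: x**2 - c, lambda x: 2 * x, -1.0, 1.0, rho)
    # Closed form for this kernel, valid when g(+-1) = 1 - c >= rho.
    closed_form = 2.0 if c >= rho else 2.0 - (1.0 - c / rho) ** 2
    print(f"c={c}: Z*={value:.4f}, closed form={closed_form:.4f}, Z=2")
```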
It is useful to equip the space of continuously differentiable functions on the compact set $\mathcal X$, $C^1(\mathcal X)$, with the following norm:
$$\|g\|_{C^1} := \max\Big\{\sup_{x \in \mathcal X} |g(x)|,\ \sup_{x \in \mathcal X} |g'(x)|\Big\}. \qquad (2.5)$$
This is the uniform first-order Sobolev norm on $C^1(\mathcal X)$. Given this norm, we have the following proposition.

Proposition 2.2. (Local constancy) $Z(\cdot)$ is constant in a neighbourhood, with respect to the norm $\|\cdot\|_{C^1}$, of any generic function $g \in C^1(\mathcal X)$, and so is $Z^*(\cdot)$ if ρ is small enough.

Using a neighbourhood of g with respect to the sup norm in levels only, instead of $\|\cdot\|_{C^1}$, is not enough for the assertion of Proposition 2.2 to hold. For any function $g_1$ that has at least one root, we can find a function $g_2$ arbitrarily close to $g_1$ in the uniform sense which has more roots than $g_1$, by adding a ‘wiggle’ around a root of $g_1$. Figure 2 illustrates this: it shows two functions that are uniformly close in levels but not in derivatives, and which have different numbers of roots. However, if one additionally restricts the first derivative of $g_2$ to be uniformly close to the derivative of $g_1$, additional wiggles are precluded around generic roots, because $g_1$ has a non-zero derivative around these. Because derivatives are ‘harder’ to estimate than levels, variation in the estimated derivatives dominates the asymptotic distribution of estimators for Z, as will be shown. Proposition 2.2 immediately implies the following theorem as a corollary. This theorem states that the plug-in estimator $Z(\hat g)$ converges to a degenerate limiting distribution at an ‘infinite’ rate if $\hat g$ converges with respect to the norm $\|\cdot\|_{C^1}$ (i.e. $Z(\hat g)$ is equal to the true number of roots with probability converging to 1).

Theorem 2.1. (Superconsistency) If $(\hat g, \hat g')$ converges uniformly in probability to $(g, g')$, if g is generic and if $r_n$ is some arbitrary diverging sequence, then

$$r_n\big(Z(\hat g) - Z(g)\big) \overset{p}{\to} 0.$$
Furthermore, if ρ is small enough so that $Z^*(g) = Z(g)$ holds, then
$$r_n\big(\widehat Z^* - Z(g)\big) \overset{p}{\to} 0.$$

Figure 2. On the importance of wiggles.

This result implies that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0090 if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0091 as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0092.
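The ‘wiggle’ argument behind Proposition 2.2 and Figure 2 can be reproduced with a hypothetical pair of functions on $[-1, 1]$: $g_1(x) = x$ and $g_2(x) = x + 0.01\sin(1000x)$. They are within 0.01 of each other in the sup norm on levels, but their derivatives differ by up to 10, so they are far apart in the first-order ($C^1$) distance, computed here as the larger of the sup distances in levels and in first derivatives. $g_1$ has one root, while $g_2$ has five (0 and two on each side). Roots are counted via sign changes on a grid, which is adequate here because the grid spacing is much finer than the wiggle period.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200_000)  # even number of points, so x = 0 is not on the grid

eps, freq = 0.01, 1000.0
g1, dg1 = x, np.ones_like(x)                    # one generic root, at 0
g2 = x + eps * np.sin(freq * x)                  # g1 plus a small, fast 'wiggle'
dg2 = 1.0 + eps * freq * np.cos(freq * x)

def sign_changes(values):
    """Number of strict sign changes between adjacent grid points."""
    return int(np.sum(values[:-1] * values[1:] < 0))

dist_levels = np.max(np.abs(g2 - g1))                  # at most eps = 0.01
dist_c1 = max(dist_levels, np.max(np.abs(dg2 - dg1)))  # about eps * freq = 10
print(sign_changes(g1), sign_changes(g2))              # 1 5
print(dist_levels, dist_c1)
```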

2.3. Asymptotic normality and relative efficiency

We have shown our first claim, superconsistency of $\widehat Z^*$ given uniform convergence of $(\hat g, \hat g')$. Next, we show our second claim, asymptotic normality of $\widehat Z^*$ under a non-standard sequence of experiments. The section then concludes by formally stating the efficiency of $\widehat Z^*$ relative to the simple plug-in estimator $Z(\hat g)$. To further characterize the asymptotic distribution of $\widehat Z^*$, we need a suitable approximation for the distribution of the first-stage estimator $(\hat g, \hat g')$. Kong et al. (2010) provide uniform Bahadur representations for local polynomial estimators of m-regressions. We state their result, for the special case of local linear m-regression, as an assumption.

Assumption 2.2. (Bahadur expansion) The estimation error of the estimator $(\hat g(x), \hat g'(x))$ defined by (2.3) can be approximated by a local average as follows:

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0102(2.6)
Here, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0103 is the density of X, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0104, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0105 (in a piecewise derivative sense; m is assumed to be piecewise differentiable), urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0106, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0107 is a non-random matrix converging uniformly to the identity matrix, and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0108 uniformly in x.

The crucial part of Assumption 2.2 is the assumption that the remainder R is asymptotically negligible relative to the linear (sample mean) component of the estimation error of $(\hat g, \hat g')$. This assumption is only well defined in the context of a specific sequence of experiments. In Theorem 2.2, this assumption will be understood to hold relative to the sequence of experiments defined in Assumption 2.3. In the case of qth quantile regression, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0110 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0111. In the case of mean regression, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0112 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0113.

The asymptotic results in the remainder of this section depend on the availability of an expansion of the form (2.6) and on the relative negligibility of its remainder, but not on any other specifics of local linear m-regression. This will allow for fairly straightforward generalizations of the baseline case considered here to the cases discussed in Section 3, as well as to other cases that are beyond the scope of this paper, once we have appropriate expansions for the first-stage estimators.

By Proposition 2.2, consistency of any plug-in estimator follows from uniform convergence of $(\hat g, \hat g')$. Such uniform convergence follows from Assumption 2.2, combined with a Glivenko–Cantelli theorem on uniform convergence of averages, assuming i.i.d. draws from the joint distribution of $(Y, X)$ as $n \to \infty$; see van der Vaart (1998), Chapter 19. Superconsistency of $\widehat Z^*$ therefore follows, which implies that standard i.i.d. asymptotics with rescaling of the estimator yield only degenerate distributional approximations. This is because $Z^*$ and Z are constant in a $C^1$ neighbourhood of any generic g, even though Z jumps (and $Z^*$ varies) at bifurcation points (i.e. non-generic g). As a consequence, all terms in a functional Taylor expansion of $Z^*$, as a function of g, vanish, except for the remainder. The application of ‘delta method’ type arguments, as in Newey (1994), therefore gives only the degenerate limit distribution.

In finite samples, however, the sampling variation of $\widehat Z^*$ is, in general, not negligible, as the simulations in Appendix A confirm; this makes the degenerate limit useless as a distributional approximation for inference. Asymptotic statistical theory approximates the finite sample distribution of interest by the limiting distribution along a sequence of experiments, of which our actual experiment is an element. The choice of sequence is to some extent arbitrary; the standard sequence, in which observations are i.i.d. draws from a distribution that does not change as n increases, is just one possibility. In econometrics, non-standard asymptotics are used, for instance, in the literature on weak instruments; see, e.g. Staiger and Stock (1997), Imbens and Wooldridge (2007) and Andrews and Cheng (2012). In the present set-up, a non-degenerate distributional limit of $\widehat Z^*$ can only be obtained under a sequence of experiments that yields a non-degenerate limiting distribution of the first-stage estimator $(\hat g, \hat g')$. We now consider asymptotics under such a sequence of experiments. The sequence we consider has increasing amounts of noise relative to signal as sample size increases.

Assumption 2.3. Experiments are indexed by n, and for the nth experiment we observe urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0123 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0124. The observations urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0125 are i.i.d. given n, and

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0126(2.7)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0127(2.8)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0128(2.9)
where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0129 is a real-valued sequence and
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0130

The last equality requires the criterion function m to be scale neutral; this holds, in particular, for quantiles and the mean. For a given sample size n, this is the same model as before. As n changes, the function g identified by (2.2) is held constant. If urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0131 grows in n, the estimation problem in this sequence of models becomes increasingly difficult relative to i.i.d. sampling. Note that (2.9) does not describe an additive structural model, which would allow us to predict counterfactual outcomes. Instead, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0132 is simply the statistical residual, given by the difference of Y and $g(X)$, which is also well defined for non-additive structural models.

Our next result, Theorem 2.2, assumes that the approximation of Assumption 2.2 holds under the non-standard sequence of experiments described by Assumption 2.3. Theorem 1 in Kong et al. (2010) implies that Assumption 2.2 holds under standard asymptotics and weak regularity conditions. Their result extends to our setting in a fairly straightforward way, however. This is most easily seen in the case of mean regression. We can write Y as a sum of two terms: (a) $g(X)$; (b) the residual $Y - g(X)$. We can then apply the result of Kong et al. (2010) to the local linear regression on X of each of these terms separately. Both the Bahadur expansion and the local linear mean regression estimator are linear in Y. As a consequence, the remainder R for a regression of Y on X is given by the sum of the two remainders corresponding to the regressions of terms (a) and (b) on X. Whichever of the two Bahadur expansions corresponding to (a) and (b) dominates the asymptotic distribution is thereby guaranteed to be of larger order than the sum of the two remainder terms. A similar logic applies more generally, for instance to the case of local linear quantile regression; a complete proof is beyond the scope of the present paper.
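The linearity step of this argument can be checked directly: local linear mean regression is kernel-weighted least squares, so regressing Y on X gives exactly the sum of the fits obtained by regressing $g(X)$ and $Y - g(X)$ separately. The sketch below assumes an Epanechnikov kernel, a hypothetical noise scale of 0.5 and simulated data. It also prints how far each component's derivative fit is from its own target, which illustrates that, in this example, the residual term dominates the error in the estimated derivative.

```python
import numpy as np

def local_linear_mean(x_grid, X, Y, tau):
    """Local linear mean regression (Epanechnikov weights); returns an array
    whose columns are the estimates of g(x) and g'(x) on x_grid."""
    est = np.empty((len(x_grid), 2))
    for j, x in enumerate(x_grid):
        d = X - x
        w = 0.75 * np.clip(1.0 - (d / tau) ** 2, 0.0, None)
        D = np.column_stack([np.ones_like(d), d])
        est[j] = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Y))
    return est

rng = np.random.default_rng(1)
n = 1000
X = rng.uniform(-1.0, 1.0, n)
term_a = X**2 - 0.25                      # (a): g(X)
term_b = 0.5 * rng.standard_normal(n)     # (b): Y - g(X), hypothetical noise scale 0.5
Y = term_a + term_b

grid = np.linspace(-0.8, 0.8, 41)
fit_y = local_linear_mean(grid, X, Y, tau=0.2)
fit_a = local_linear_mean(grid, X, term_a, tau=0.2)
fit_b = local_linear_mean(grid, X, term_b, tau=0.2)

print(np.allclose(fit_y, fit_a + fit_b))  # True: the estimator is linear in Y
# Derivative errors of each component fit relative to its own target (2x and 0).
print(np.max(np.abs(fit_a[:, 1] - 2 * grid)), np.max(np.abs(fit_b[:, 1])))
```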

By Corollary 2.1, a necessary condition for a non-degenerate limit of $\widehat Z^*$ is that $(\hat g, \hat g')$ converges to a non-degenerate limiting distribution. As is well known, and as also follows from Assumption 2.2, $\hat g'$ converges at a slower rate than $\hat g$, so that asymptotically variation in $\hat g'$ will dominate, namely by adding ‘wiggles’ around the actual roots. If urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0146 in the sequence of experiments defined in Assumption 2.3, $\hat g$ converges uniformly in probability to g, whereas $\hat g'$ converges pointwise to a non-degenerate limit. This is the basis for the following theorem.

Theorem 2.2. (Asymptotic normality) Under Assumptions 2.1, 2.2 and 2.3, and if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0149, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0150, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0151 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0152, then there exist μ and V such that

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0154
for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0155. Both μ and V depend on the data-generating process only via the asymptotic mean and variance of $\hat g'$ at the roots of g, which in turn depend upon urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0157, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0158, s and Varurn:x-wiley:13684221:media:ectj12043:ectj12043-math-0159 evaluated at the roots of g.

This theorem justifies the use of t-tests based on $\widehat Z^*$ for null hypotheses of the form $Z = z_0$. The construction of a t-statistic requires a consistent estimator of V and an estimator of μ converging at a rate faster than urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0162. Based on the last part of Theorem 2.2, we can construct such estimators as follows. Any plug-in estimator that consistently estimates the (co)variances of $(\hat g, \hat g')$ under the given sequence of experiments consistently estimates μ and V. One such plug-in estimator is the standard bootstrap (i.e. resampling from the empirical distribution function). The Bahadur expansion in Assumption 2.2, which approximates the estimation error of $(\hat g, \hat g')$ by sample averages, implies that the bootstrap gives a resampling distribution with the asymptotically correct covariance structure for $(\hat g, \hat g')$. From this and Theorem 2.2, it then follows that the bootstrap gives consistent variance and bias estimates for $\widehat Z^*$, where the bias is estimated from the difference of the resampling estimates relative to $\widehat Z^*$. If sample size grows fast enough relative to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0168 and τ, the asymptotic validity of a standard normal approximation for the pivot follows.
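A minimal sketch of the bootstrap step, under assumptions not taken from the paper: mean regression with an Epanechnikov first-stage kernel, a triangular L, a fixed evaluation grid with the trapezoidal rule, a pairs bootstrap, and a two-sided critical value of 1.96. The bias is estimated as the average bootstrap estimate minus the full-sample estimate, and the confidence set collects the non-negative integers z (up to a hypothetical cap `z_max`) at which the bias-corrected t-statistic is not rejected. The block is self-contained and therefore includes its own compact first stage; the function names are hypothetical. The set can be empty in a given finite sample.

```python
import numpy as np

def z_star_hat(x_grid, X, Y, tau, rho):
    """Smoothed plug-in estimate: local linear mean regression for (g, g'),
    then the trapezoidal integral of L_rho(g_hat) |dg_hat| with L triangular."""
    est = np.empty((len(x_grid), 2))
    for j, x in enumerate(x_grid):
        d = X - x
        w = 0.75 * np.clip(1.0 - (d / tau) ** 2, 0.0, None)   # Epanechnikov K
        D = np.column_stack([np.ones_like(d), d])
        est[j] = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Y))
    f = np.clip(1.0 - np.abs(est[:, 0] / rho), 0.0, None) / rho * np.abs(est[:, 1])
    return np.sum((f[1:] + f[:-1]) * np.diff(x_grid)) / 2

def bootstrap_confidence_set(X, Y, x_grid, tau, rho, n_boot=199, z_max=10,
                             crit=1.96, seed=0):
    """Integer confidence set {z : |t(z)| <= crit}, where
    t(z) = (Z*_hat - bias_hat - z) / sd_hat, with bias and sd from the bootstrap."""
    rng = np.random.default_rng(seed)
    z_hat = z_star_hat(x_grid, X, Y, tau, rho)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))   # resample (X_i, Y_i) pairs
        boot[b] = z_star_hat(x_grid, X[idx], Y[idx], tau, rho)
    bias = boot.mean() - z_hat        # resampling estimates relative to Z*_hat
    sd = boot.std(ddof=1)
    conf_set = [z for z in range(z_max + 1) if abs(z_hat - bias - z) <= crit * sd]
    return z_hat, bias, sd, conf_set

# Hypothetical data with two generic roots, at -0.5 and 0.5.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, 400)
Y = X**2 - 0.25 + 0.1 * rng.standard_normal(400)
result = bootstrap_confidence_set(X, Y, np.linspace(-1.0, 1.0, 201),
                                  tau=0.25, rho=0.1, n_boot=99)
print(result)
```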

It would be interesting to develop distributional refinements for this statistic using higher-order bootstrapping, along the lines discussed by Horowitz (2001). However, higher-order bootstrapping might be very computationally demanding in the present case, in particular if criteria such as quantile regression are used to identify g.

Theorem 2.2 also implies that increasing the bandwidth parameter ρ reduces the variance without affecting the bias in the limiting normal distribution. Asymptotically, the difficulty in estimating Z is driven entirely by fluctuations in $\hat g'$. These fluctuations lead both to upward bias and to variance in plug-in estimators. When ρ is larger, these fluctuations are averaged over a larger range of X, thereby reducing variance. Theorem 2.2 therefore implies that $\widehat Z^*$ computed with bandwidth $\rho_1$ is asymptotically inefficient relative to $\widehat Z^*$ computed with bandwidth $\rho_2$ whenever $\rho_1 < \rho_2$. Furthermore, by Proposition 2.1, $Z^*(g) \to Z(g)$ as $\rho \to 0$ for all generic g. If the relative inefficiency carries over to the limit as $\rho \to 0$, it follows that the simple plug-in estimator $Z(\hat g)$ is asymptotically inefficient relative to $\widehat Z^*$. Note, however, that this is only a heuristic argument: we cannot exchange the limits with respect to ρ and with respect to n to obtain the limit distribution of $Z(\hat g)$. The following theorem, which is fairly easy to show, states a formally correct version of this argument.

Theorem 2.3. (Asymptotic inefficiency of the naive plug-in estimator) Consider the set-up of Theorem 2.2, and assume urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0178. Then, as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0179,

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0180
and
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0181

From this theorem, it follows in particular that tests based on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0182 will, in general, not be consistent under the sequence of experiments considered (i.e. the probability of false acceptances does not go to zero). This stands in contrast to tests based on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0183.
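The upward bias of the naive plug-in estimator can be illustrated with a deterministic stand-in for estimation noise. In this sketch, which is our own illustration rather than the paper's, g(x) = cos(4πx) on [0, 1] is an assumed true function with four roots, a small high-frequency term mimics a wiggly first-stage estimate, and counting sign changes on a grid plays the role of the naive root count.

```python
import numpy as np


def naive_root_count(g_values):
    """Naive plug-in count: number of strict sign changes of g along a grid."""
    return int(np.count_nonzero(g_values[:-1] * g_values[1:] < 0))


x = np.linspace(0.0, 1.0, 10_000)            # no grid point falls exactly on a root
g = np.cos(4 * np.pi * x)                     # assumed true g: roots at 1/8, 3/8, 5/8, 7/8
g_hat = g + 0.2 * np.sin(200 * np.pi * x)     # wiggly stand-in for a noisy first-stage estimate

print(naive_root_count(g), naive_root_count(g_hat))
```

Near each true root the perturbation makes the estimated function cross zero several times, so the naive count exceeds the true number of roots.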

2.4. Alternative approaches

The reader might rightly wonder whether there are alternative estimators that, like our urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0184, avoid the issues of the naive estimator (in particular, overestimation of the number of roots) and that possibly beat urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0185 in terms of some notion of relative efficiency. One possible estimator that comes to mind is the ϱ-packing number of the set of roots of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0186, where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0187 slowly. The packing number is the largest integer z such that there are z disjoint balls of radius ϱ centred at roots of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0188.

The packing number is in fact closely related to our estimator urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0189. For an appropriate scaling of ϱ, we can think of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0190 as smoothly interpolating the packing number. The following numerical illustration makes the point. Consider urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0191, and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0192. This function has four roots at a distance of 1/4 from each other and a maximum absolute value of 1. For this function g, consider both urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0193 and the packing number of the set of roots of g, as functions of ρ (or ϱ). Figure 3 plots both as functions of the bandwidth; for comparability, we have scaled urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0196. As can be seen from this figure, both estimators behave similarly, with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0197 interpolating the jumps of the packing number. To the extent that smoother estimators are preferable in many contexts (see the literature on model selection versus shrinkage), urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0198 might be better behaved. A formalization of this heuristic argument, and a full development of the asymptotic theory of packing numbers, are beyond the scope of the present paper. One advantage of considering urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0199, which motivates our focus on this estimator rather than, for instance, the packing number, is that it allows for an easier development of asymptotic theory and of corresponding inference procedures, which are the main object of the present paper.
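A rough numerical analogue of this comparison is sketched below. It rests on several assumptions of ours: g(x) = cos(4πx) on [0, 1], which has four roots a distance 1/4 apart and maximum absolute value 1 but is not necessarily the function behind Figure 3; a uniform kernel; and the Kac-type form ∫ K_ρ(g(x)) |g'(x)| dx with K_ρ(y) = K(y/ρ)/ρ for the smoothed count. We do not attempt to reproduce the scaling between ρ and ϱ used in the figure; the same numbers are used for both purely for convenience.

```python
import numpy as np


def packing_number(roots, radius):
    """Largest number of disjoint open balls of the given radius centred at roots.

    On the real line, greedy left-to-right selection is optimal: keep a root
    whenever it is at least 2 * radius away from the last root kept.
    """
    count, last = 0, -np.inf
    for r in np.sort(roots):
        if r - last >= 2 * radius:
            count, last = count + 1, r
    return count


def smoothed_root_count(g_values, dg_values, x, rho):
    """Trapezoid approximation of the integral of K_rho(g(x)) |g'(x)| dx.

    Uses the uniform kernel K(u) = 1/2 on [-1, 1], so K_rho(y) = 1/(2 rho) on |y| <= rho.
    """
    integrand = (np.abs(g_values) <= rho) / (2.0 * rho) * np.abs(dg_values)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))


x = np.linspace(0.0, 1.0, 200_001)
g = np.cos(4 * np.pi * x)                     # four roots, spaced 1/4 apart
dg = -4 * np.pi * np.sin(4 * np.pi * x)
roots = np.array([1.0, 3.0, 5.0, 7.0]) / 8.0

for width in [0.05, 0.2, 0.5, 2.0]:
    print(width, packing_number(roots, width), smoothed_root_count(g, dg, x, width))
```

Under these assumptions the smoothed count equals 4 for every ρ ≤ 1 and decays like 4/ρ beyond, whereas the packing number drops in discrete steps from 4 to 2 to 1 as ϱ grows.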

Figure 3. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0200 and the packing number.

The reader might further wonder, rightly again, whether the sequence of experiments we chose in Assumption 2.3 is peculiar, and whether another sequence might give different answers. The problem of estimating urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0201 might be made more difficult not only by increasing the variance of the regression residuals, but also by letting the roots of g move closer to each other. Formally, we might consider urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0202 where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0203 are i.i.d. and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0204 for a diverging sequence urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0205. Such a sequence of experiments, however, effectively reduces to the setting of standard asymptotics once we replace the bandwidth ρ by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0206 and account for the fact that the effective sample size grows only at rate urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0207. In particular, the superconsistency result of Theorem 2.1 also applies to this alternative sequence of experiments, which makes the latter unsuitable as a basis for inference.

3. EXTENSIONS AND APPLICATIONS

In this section, several extensions and applications of the results of Section 2 are presented. Sections 3.1 to 3.3 discuss, respectively, inference on Z if g is identified by more general moment conditions, inference on Z if the domain and range of g are multidimensional, and inference on the number of stable and unstable roots. Sections 3.4 and 3.5 discuss identification and inference for the two applications mentioned in the introduction: static games of incomplete information and stochastic difference equations.

3.1. Conditioning on covariates

In the previous section, inference on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0208 was discussed for functions g identified by the moment condition
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0209
This subsection generalizes to functions g identified by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0210 (3.1)
where the parameter of interest now is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0211, the number of roots of g in x given w1. The conditional moment restriction (3.1) can be rationalized by a structural model of the form urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0212, where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0213 and g is defined by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0214
We assume that the joint density of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0215 is bounded away from zero on the set urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0216 urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0217, where supp denotes the compact support of either random vector.

The vector W2 serves as a vector of control variables. The conditional independence assumption urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0218 is also known as ‘selection on observables’. The function g is equal to the average structural function if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0219, and equal to a quantile structural function if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0220. The average structural function will be of importance in the context of games of incomplete information, as discussed in Section 3.4; quantile structural functions will be used to characterize stochastic difference equations in Section 3.5. When games of incomplete information are discussed in Section 3.4, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0221 will correspond to the component of public information that is excluded from neither player's response function.
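How the choice of m selects the target functional can be seen in the unconditional case, without covariates or kernel weights. The sketch assumes the standard choices m(u) = u (mean) and m(u) = 1{u < 0} − q (q-th quantile); we cannot confirm from the extracted text that these are exactly the forms used in the paper. The helper `solve_moment` is hypothetical and solves the sample moment condition by bisection.

```python
import numpy as np


def m_mean(u):
    """m(u) = u: the moment condition identifies a mean."""
    return u


def m_quantile(q):
    """m(u) = 1{u < 0} - q: the moment condition identifies the q-th quantile."""
    return lambda u: (u < 0).astype(float) - q


def solve_moment(m, y, lo, hi, iters=200):
    """Solve (1/n) sum_i m(y_i - g) = 0 for g by bisection on [lo, hi].

    Assumes the sample moment is monotone in g and changes sign on [lo, hi].
    """
    f_lo = np.mean(m(y - lo))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = np.mean(m(y - mid))
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y = np.random.default_rng(2).normal(size=1001)
lo, hi = y.min() - 1.0, y.max() + 1.0
print(solve_moment(m_mean, y, lo, hi), y.mean())
print(solve_moment(m_quantile(0.5), y, lo, hi), np.median(y))
```

In local linear m-regression the same moment condition is solved with kernel weights around each evaluation point instead of equal weights.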

The inference procedure proposed in the previous section consists of two steps. First, the function g and its derivative are estimated using local linear m-regression. Second, the estimator urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0222 is plugged into the functional urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0223, which is a smooth approximation of the functional urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0224. We can generalize this approach by keeping the same second step while using more general first-stage estimators urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0225. Equation (3.1) suggests estimating g by a non-parametric sample analogue, replacing the conditional expectation with a local linear kernel estimator of it, and the expectation over W2 with a sample average. Formally, let urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0226, where
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0227 (3.2)
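A minimal sketch of this sample-analogue estimator for the mean case is given below. The Gaussian product kernel, the common bandwidth `h` and the helper names are illustrative assumptions, and the exact weighting in (3.2) may differ: the conditional expectation given (x, w1, W2) is estimated by local linear regression, and the result is averaged over the observed values of W2.

```python
import numpy as np


def local_linear_fit(z0, Z, y, h):
    """Local linear estimate of E[y | Z = z0] with a Gaussian product kernel.

    Returns the fitted value at z0 (intercept) and the estimated gradient (slopes).
    """
    D = Z - z0                                             # regressors centred at z0
    w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))        # kernel weights
    A = np.column_stack([np.ones(len(y)), D])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


def partial_mean(x, w1, X, W1, W2, y, h):
    """Sample analogue of E_{W2}[ E[Y | X = x, W1 = w1, W2] ] (mean case).

    The inner conditional expectation is estimated by local linear regression at
    (x, w1, W2_j); the outer expectation is replaced by an average over j.
    """
    Z = np.column_stack([X, W1, W2])
    fits = [local_linear_fit(np.array([x, w1, w2]), Z, y, h)[0] for w2 in W2]
    return float(np.mean(fits))


rng = np.random.default_rng(3)
X, W1, W2 = rng.uniform(size=(3, 300))
y = 2.0 * X + W1 + 3.0 * W2                                 # linear outcome, no noise
print(partial_mean(0.4, 0.6, X, W1, W2, y, h=0.3), 2.0 * 0.4 + 0.6 + 3.0 * W2.mean())
```

Because local linear regression reproduces linear functions exactly, an outcome that is linear in (X, W1, W2) gives a convenient sanity check: both printed numbers should agree up to rounding error.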

An asymptotic normality result that generalizes Theorem 2.2 can be shown in this context. In light of the proof of Theorem 2.2, the crucial step is to obtain a sequence of experiments such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0228 converges uniformly to g while urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0229 has a non-degenerate limiting distribution. If we obtain an approximation of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0230 equivalent to the approximation in Assumption 2.2, all further steps of the proof apply immediately. Using the results of Newey (1994), this can be done for the following sequence of experiments.

Assumption 3.1. Experiments are indexed by n, and for the nth experiment we observe urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0231 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0232. The observations urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0233 are i.i.d. given n, and

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0234 (3.3)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0235 (3.4)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0236 (3.5)

Theorem 3.1. (Asymptotic normality, with control variables) Under the assumptions of Section 2, but with g identified by (3.1) and the data generated by the model given by Assumption 3.1, if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0237, where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0238, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0239, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0240 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0241, then there exist urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0242 and V such that

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0243

3.2. Higher-dimensional systems

Thus far, only one-dimensional arguments x and one-dimensional ranges for the function g have been considered, where x is the argument over which urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0244 integrates. All results of Section 2 extend readily to a higher-dimensional set-up. In particular, assume we are interested in the number of roots of a function g from urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0245 to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0246. Generalizing (2.4), we can define urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0247 as
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0248 (3.6)
where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0249 are again estimated by local linear m-regression, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0250 is a kernel with support urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0251, and the integral is taken over the set urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0252 in the support of g. As in the one-dimensional case, superconsistency follows from uniform convergence of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0253. The following theorem, generalizing Theorem 2.2, holds for arbitrary d.

Theorem 3.2. (Asymptotic normality, multidimensional systems) Under the assumptions of Section 2, but with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0254 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0255 defined by (3.6), if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0256, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0257, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0258 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0259, then there exist urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0260 such that

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0261
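The multidimensional functional can be approximated on a grid in the same way as in one dimension. The sketch assumes that (3.6) takes the Kac-type form ∫ K_ρ(g(x)) |det ∂g(x)/∂x| dx with a product uniform kernel; the example g(x) = (x1² − 1/4, x2² − 1/4) on [−1, 1]², which has four roots at (±1/2, ±1/2), is our own choice.

```python
import numpy as np


def smoothed_root_count_2d(g1, g2, jac_det, cell_area, rho):
    """Midpoint-rule approximation of the integral of K_rho(g(x)) |det Dg(x)| dx.

    Product uniform kernel on [-1, 1]^2: K_rho(y) = 1/(2 rho)^2 if |y1|, |y2| <= rho.
    """
    inside = (np.abs(g1) <= rho) & (np.abs(g2) <= rho)
    return float(np.sum(inside * np.abs(jac_det)) * cell_area / (2.0 * rho) ** 2)


N = 1200                                           # cells per axis on [-1, 1]
h = 2.0 / N
grid = -1.0 + (np.arange(N) + 0.5) * h             # cell midpoints
x1, x2 = np.meshgrid(grid, grid, indexing="ij")
g1, g2 = x1**2 - 0.25, x2**2 - 0.25                # roots at (+-1/2, +-1/2)
jac_det = (2.0 * x1) * (2.0 * x2)                  # Dg(x) = diag(2 x1, 2 x2)

print(smoothed_root_count_2d(g1, g2, jac_det, h * h, rho=0.1))
```

For this g and any ρ ≤ 1/4 the exact value of the integral is 4; the grid approximation differs from it only by discretization error at the edges of the kernel band.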

3.3. Stable and unstable roots

Instead of testing for the total number of roots, one might be interested in the number of stable and unstable roots, Zs and Zu. Stable roots are those where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0262 is negative, and unstable roots are those where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0263 is positive:
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0264 (3.7)
In the multidimensional case, we could more generally consider roots with a given number of positive and negative eigenvalues of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0265. We can define smooth approximations of the parameters Zs and Zu as follows:
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0266 (3.8)

Again, all arguments of Section 2 go through essentially unchanged for these parameters. In particular, Theorem 2.2 applies literally, replacing Z with Zs or Zu.
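For a one-dimensional example, splitting the smoothed count according to the sign of g' can be sketched as follows. As before, the uniform kernel, the Kac-type integrand K_ρ(g)|g'| and the test function g(x) = cos(4πx), whose roots alternate between stable (g' < 0) and unstable (g' > 0), are assumptions made for illustration.

```python
import numpy as np


def trapezoid(f, x):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


def smoothed_counts_by_stability(g_values, dg_values, x, rho):
    """Smoothed counts of all, stable (g' < 0) and unstable (g' > 0) roots.

    Integrand K_rho(g) |g'| with the uniform kernel; stability enters through an
    indicator on the sign of g'.
    """
    base = (np.abs(g_values) <= rho) / (2.0 * rho) * np.abs(dg_values)
    total = trapezoid(base, x)
    stable = trapezoid(base * (dg_values < 0), x)
    unstable = trapezoid(base * (dg_values > 0), x)
    return total, stable, unstable


x = np.linspace(0.0, 1.0, 200_001)
g = np.cos(4 * np.pi * x)                   # stable roots at 1/8, 5/8; unstable at 3/8, 7/8
dg = -4 * np.pi * np.sin(4 * np.pi * x)
total, stable, unstable = smoothed_counts_by_stability(g, dg, x, rho=0.2)
print(total, stable, unstable)
```

The two indicators partition the integrand away from points where g' = 0, so the stable and unstable counts add up to the total.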

More generally, functionals that are smooth approximations of the number of roots with various stability properties can be constructed in the multidimensional case by multiplying the integrand with an indicator function depending on the signs of the eigenvalues of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0267.
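A two-dimensional version, which classifies each point in the kernel band by the number of Jacobian eigenvalues with negative real part, might look as follows. Using real parts (so that complex eigenvalues are handled) and the diagonal example g(x) = (x1² − 1/4, x2² − 1/4) are our assumptions; for this g the four roots split into one with no negative eigenvalue, two saddles with one, and one with two.

```python
import numpy as np


def counts_by_negative_eigenvalues(g1, g2, jac, cell_area, rho):
    """Split the smoothed 2-D root count by the number of Jacobian eigenvalues
    with negative real part (0, 1 or 2), using a product uniform kernel."""
    inside = (np.abs(g1) <= rho) & (np.abs(g2) <= rho)
    J = jac[inside]                                       # Jacobians in the kernel band, shape (m, 2, 2)
    n_neg = np.sum(np.linalg.eigvals(J).real < 0, axis=1)
    weight = np.abs(np.linalg.det(J)) * cell_area / (2.0 * rho) ** 2
    return {k: float(weight[n_neg == k].sum()) for k in range(3)}


N = 1200
h = 2.0 / N
grid = -1.0 + (np.arange(N) + 0.5) * h
x1, x2 = np.meshgrid(grid, grid, indexing="ij")
g1, g2 = x1**2 - 0.25, x2**2 - 0.25
jac = np.zeros(x1.shape + (2, 2))
jac[..., 0, 0] = 2.0 * x1                               # Dg(x) = diag(2 x1, 2 x2)
jac[..., 1, 1] = 2.0 * x2

counts = counts_by_negative_eigenvalues(g1, g2, jac, h * h, rho=0.1)
print(counts)
```

Restricting the eigenvalue computation to points inside the kernel band keeps the cost low, since the integrand vanishes elsewhere.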

3.4. Static games of incomplete information

This section and Section 3.5 discuss how to apply the proposed inference procedure to test for equilibrium multiplicity in economic models. The discussion in this subsection builds on Bajari et al. (2010).

Consider the following static game of incomplete information. Assume there are two players urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0268, each of whom has to choose one of two actions, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0269. Player i makes her choice based on public information s, as well as private information urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0270. The public information s is observed by the econometrician, and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0271 is independent of s. It is assumed that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0272 does not enter player i's utility. Denote the probability that player i plays strategy urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0273 given the public information s by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0274. Player i's expected utility given her information, and hence her optimal action urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0275, depends on s and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0276, as well as on player urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0277's probability of choosing urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0278, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0279. Let us denote the average best response of player i, integrating over the distribution of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0280, by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0281 (3.9)
Figure 4 illustrates this by plotting the response functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0282 for a given s. The functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0283 are the (average) best response functions; Bayesian Nash equilibrium requires urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0284; and we observe one equilibrium urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0285 in the data. In this figure, there are two further equilibria that are not directly observable. In Bayesian Nash equilibrium, the probability of player i choosing urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0286, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0287, equals the average best response of player i, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0288. This implies the two equilibrium conditions
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0289
for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0290. In Figure 4, the Bayesian Nash equilibria correspond to the intersections of the graphs of the two urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0291. The condition for Bayesian Nash equilibrium in this game can be restated as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0292, where
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0293 (3.10)
The number of roots of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0294 in σ1 is the number of Bayesian Nash equilibria in this game, given s.
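Counting equilibria for given response functions can be illustrated with hypothetical logistic average best responses ψ(σ) = 1/(1 + exp(−(α + βσ))), identical for both players (ψ1, ψ2 and the parameter values are our notation and choices). The sketch assumes that the equilibrium map in (3.10) can be written as g(σ1) = ψ1(ψ2(σ1)) − σ1; the sign convention does not affect the number of roots, which is counted through sign changes on a grid.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_response(alpha, beta):
    """Hypothetical average best response: psi(sigma) = logistic(alpha + beta * sigma)."""
    return lambda sigma: logistic(alpha + beta * sigma)


def count_equilibria(psi1, psi2, grid):
    """Number of roots of g(sigma1) = psi1(psi2(sigma1)) - sigma1, via sign changes."""
    g = psi1(psi2(grid)) - grid
    return int(np.count_nonzero(g[:-1] * g[1:] < 0))


grid = np.linspace(0.0, 1.0, 10_000)          # no grid point equals 1/2 exactly
strong = make_response(-5.0, 10.0)            # strong strategic complementarity
weak = make_response(-1.0, 2.0)               # weak strategic complementarity
print(count_equilibria(strong, strong, grid))  # 3
print(count_equilibria(weak, weak, grid))      # 1
```

With strong complementarity the symmetric game has three equilibria, the middle one at σ1 = 1/2; with weak complementarity the equilibrium is unique. Because both response functions are increasing here, the roots of ψ1(ψ2(σ1)) − σ1 coincide with the fixed points of ψ when ψ1 = ψ2.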
Figure 4. Response functions and Bayesian Nash equilibria.
We now discuss identification and inference on the number of Bayesian Nash equilibria of this game, given the public information s. Assume we observe an i.i.d. sample of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0295, the players' realized actions and the public information of the game, where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0296 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0297 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0298. In this subsection, i indexes players and j indexes observations. Rational-expectations beliefs of player urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0299 about the expected action of player i are given by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0300. The following two-stage estimation procedure is a non-parametric variant of the procedure proposed by Bajari et al. (2010). In the first stage, the beliefs urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0301 are estimated by local linear mean regression:
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0302 (3.11)
Average best responses of players are given by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0303. Without further restrictions, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0304 is not identified, because by definition σ is functionally dependent on s. If, however, exclusion restrictions of the form
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0305 (3.12)
are imposed, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0306 can be identified. In particular, assume that exclusion restriction (3.12) holds, with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0307. There is one excluded component of s for each player; the remaining urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0308 components are excluded from neither response function urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0309. Assume furthermore that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0310 has full support [0, 1] given urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0311, for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0312. Under these assumptions, we can estimate the best response functions, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0313, again using local linear mean regression:
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0314 (3.13)
Note that no functional-form restrictions are needed for identification of the choice functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0315. This stands in contrast to Bajari et al. (2010), who need to impose such restrictions in order to identify the underlying preferences. Recall that the condition for Bayesian Nash equilibrium in this game is given by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0316. Inserting urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0317 into urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0318, both estimated by (3.13), yields an estimator of g, which can be written as
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0319 (3.14)
Based on this estimator, we can perform inference on the number of Bayesian Nash equilibria given s, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0320. In particular, let
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0321 (3.15)
where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0322 is given by (3.14). The term urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0323 refers to the estimated derivative of g with respect to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0324, and similarly for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0325 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0326, so that
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0327 (3.16)
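Under the assumed form g(σ1) = ψ1(ψ2(σ1)) − σ1, the derivative entering the smoothed count follows from the chain rule, g'(σ1) = ψ1'(ψ2(σ1)) ψ2'(σ1) − 1. The sketch below checks this against a central finite difference; the logistic responses are hypothetical stand-ins for the estimates in (3.13), and we do not claim that this is the exact form of (3.16).

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical smooth response functions and their derivatives
# (stand-ins for the first-stage estimates of the best responses).
def psi1(s2):
    return logistic(-5.0 + 10.0 * s2)


def dpsi1(s2):
    p = psi1(s2)
    return 10.0 * p * (1.0 - p)


def psi2(s1):
    return logistic(-3.0 + 6.0 * s1)


def dpsi2(s1):
    p = psi2(s1)
    return 6.0 * p * (1.0 - p)


def g(s1):
    """Equilibrium map g(sigma1) = psi1(psi2(sigma1)) - sigma1 (assumed form)."""
    return psi1(psi2(s1)) - s1


def dg(s1):
    """Chain rule: g'(sigma1) = psi1'(psi2(sigma1)) * psi2'(sigma1) - 1."""
    return dpsi1(psi2(s1)) * dpsi2(s1) - 1.0


s = np.linspace(0.05, 0.95, 19)
eps = 1e-6
finite_diff = (g(s + eps) - g(s - eps)) / (2 * eps)
print(np.max(np.abs(finite_diff - dg(s))))
```

In an application the two derivative factors would come from the slope coefficients of the local linear fits rather than from closed-form expressions.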
Inference on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0328 can now proceed as before, if an asymptotic normality result similar to Theorem 2.2 can be shown. In the proof of Theorem 2.2, three properties of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0329 had to be established for the statement of the theorem to follow. First, under the given sequence of experiments, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0330 converges uniformly in probability to a degenerate limit. Second, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0331 converges in distribution to a non-degenerate limit. Third, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0332 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0333 are asymptotically independent for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0334. These properties can be shown for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0335 in the present case, with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0336 replacing x, for an appropriate choice of sequence of experiments, where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0337 is a scale parameter as before.
The choice of sequence of experiments may seem more complicated here than in the baseline case, because the dependent variable a is naturally bounded between 0 and 1, so that increasing the residual variance would be inconsistent with the structural model. This is not a problem, however, once we note that the distribution of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0338 in the baseline model is invariant to a proportional rescaling of Y, g and ρ. We can therefore define a sequence of experiments that is equivalent to the one defined by (2.7) to (2.9) if we replace (2.9) by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0339
and ρ by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0340. Intuitively, shrinking the signal g is equivalent to increasing the noise urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0341. Returning to games of incomplete information, consider the following sequence of experiments.

Assumption 3.2. For urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0342, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0343 is continuously differentiable and monotonic in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0344, and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0345 denotes the inverse of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0346 with respect to the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0347 argument, given urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0348. Experiments are indexed by n, and for the nth experiment we observe urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0349 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0350. The observations urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0351 are i.i.d. given n and

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0352 (3.17)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0353 (3.18)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0354 (3.19)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0355 (3.20)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0356 (3.21)

Equations (3.17) to (3.19) are the same as in the model we have been discussing so far. Equations (3.20) and (3.21) shrink the graphs of the best response functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0357 towards the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0358 line (compare Figure 4), parallel to the σ1 axis. Denote urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0359. We obtain
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0360
By (3.21), if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0361, then urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0362, and hence
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0363 (3.22)

Using this sequence of experiments, we can now state an asymptotic normality result, similar to Theorem 2.2, for static games of incomplete information. The statement of the theorem differs in two respects from the baseline case. First, ρ is replaced by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0364 in all expressions: because this sequence of experiments shrinks g rather than expanding the error, the bandwidth ρ must shrink correspondingly. Second, the rate of growth of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0365 is smaller: because all regressions control for s1 or s2, rates of convergence are slower. In particular, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0366 converges to a non-degenerate limit if and only if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0367, where k is the dimensionality of the support of the response functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0368, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0369.

Theorem 3.3. (Asymptotic normality, static games of incomplete information) Under the sequence of experiments defined by Assumption 3.2, if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0370 uniformly in the Bahadur expansions as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0371, and if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0372, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0373, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0374 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0375, then there exist urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0376 and V such that

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0377
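The replacement of ρ by a shrinking bandwidth in Theorem 3.3 reflects the rescaling invariance noted earlier, which can be checked numerically for the smoothed functional. Assuming the Kac-type form with K_ρ(y) = K(y/ρ)/ρ, we have K_{cρ}(c g) |c g'| = K_ρ(g) |g'| for any c > 0, so shrinking g and ρ by the same factor leaves the count unchanged, while shrinking g alone does not. The function, kernel and factor c below are illustrative choices.

```python
import numpy as np


def smoothed_root_count(g_values, dg_values, x, rho):
    """Trapezoid approximation of the integral of K_rho(g) |g'| dx (uniform kernel on [-1, 1])."""
    integrand = (np.abs(g_values) <= rho) / (2.0 * rho) * np.abs(dg_values)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))


x = np.linspace(0.0, 1.0, 200_001)
g = np.cos(4 * np.pi * x)
dg = -4 * np.pi * np.sin(4 * np.pi * x)
c = 0.1                                                         # shrinkage factor for the signal

for rho in [0.5, 1.5]:
    original = smoothed_root_count(g, dg, x, rho)
    rescaled = smoothed_root_count(c * g, c * dg, x, c * rho)   # shrink g and rho together
    shrunk_only = smoothed_root_count(c * g, c * dg, x, rho)    # shrink g but not rho
    print(rho, original, rescaled, shrunk_only)
```

The first two columns agree, while keeping ρ fixed after shrinking g changes the value, which is why the bandwidth has to move with the scale of g.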

3.5. Stochastic difference equations

In this subsection, we discuss the identification and interpretation of the number of roots of g for stochastic difference equations of the form
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0378 (3.23)
Interest in such difference equations is motivated by the study of neighbourhood composition dynamics in Card et al. (2008). This discussion forms the basis of the empirical application in Section 4. The results of this subsection suggest that, if the stochastic difference equation (3.23) had multiple equilibria, then we should expect to find multiple roots in cross-sectional quantile regressions of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0379 on X. The notion of multiple equilibria here has to be generalized to that of multiple equilibrium regions.

The intuition for this claim is as follows. Holding ε constant, the number of roots of g in X is the number of equilibria of the difference equation (3.23). If ε is stochastic, then the number of roots can still serve to characterize qualitative dynamics in terms of equilibrium regions. This is illustrated in Figure 5, which depicts the characterization of dynamics derived in this section. In this figure, gU and gL are upper and lower envelopes of g for a sequence of realizations of ε. There are ranges of X in which the sign of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0380 does not depend on ε. In these ranges, X moves towards the equilibrium regions, which are the regions in which the roots of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0381 lie. Equilibrium regions correspond to the dashed segments of the X-axis; the basin of attraction of the lower equilibrium region is given by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0382, and the basin of attraction of the upper equilibrium region is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0383.
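The equilibrium-region logic can be illustrated by simulation. The sketch assumes additive, bounded shocks, X_{t+1} = X_t + g(X_t) + ε_t with |ε_t| ≤ δ, so that the envelopes are simply g ± δ; the drift g(x) = −2x(x − 1/2)(x − 1), with stable roots at 0 and 1 and an unstable root at 1/2, and δ = 0.02 are hypothetical choices. Roots of the envelopes delimit the equilibrium regions and the basins of attraction.

```python
import numpy as np

DELTA = 0.02                                   # bound on the additive shocks, |eps_t| <= DELTA


def g(x):
    """Hypothetical drift: stable roots at 0 and 1, unstable root at 1/2."""
    return -2.0 * x * (x - 0.5) * (x - 1.0)


def simulate(x0, T=200, seed=0):
    """Simulate X_{t+1} = X_t + g(X_t) + eps_t with eps_t ~ Uniform[-DELTA, DELTA]."""
    rng = np.random.default_rng(seed)
    path = [x0]
    for _ in range(T):
        path.append(path[-1] + g(path[-1]) + rng.uniform(-DELTA, DELTA))
    return np.array(path)


def approximate_roots(values, grid):
    """Midpoints of grid cells on which `values` changes sign."""
    idx = np.nonzero(values[:-1] * values[1:] < 0)[0]
    return 0.5 * (grid[idx] + grid[idx + 1])


grid = np.linspace(-0.2, 1.2, 14_001)
upper_roots = approximate_roots(g(grid) + DELTA, grid)   # roots of the upper envelope g_U
lower_roots = approximate_roots(g(grid) - DELTA, grid)   # roots of the lower envelope g_L
print("g_U roots:", upper_roots)
print("g_L roots:", lower_roots)

low_path, high_path = simulate(0.4), simulate(0.6)
print(low_path[-1], high_path[-1])
```

Starting just below and just above 1/2 leads the simulated paths into different equilibrium regions, near 0 and near 1 respectively, even though both paths face shocks from the same distribution.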

Figure 5. Qualitative dynamics of stochastic difference equations.

How is the joint distribution of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0384 related to the transition function g? Unobserved heterogeneity, which is positively related over time, leads to an upward bias in quantile regression slopes relative to the corresponding structural slopes. To show this, denote the qth conditional quantile of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0385 given X by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0386, the conditional cumulative distribution function at Q by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0387, and the conditional probability density by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0388. The following lemma shows that, if X is not exogenous, quantile regressions of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0389 on X yield slopes that are biased relative to the structural slope urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0390. The second term in (3.24) reflects the bias due to statistical dependence between X and ε.

Lemma 3.1. (Bias in quantile regression slopes) If urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0391, and if Q and F are differentiable with respect to the conditioning argument X, then

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0392 (3.24)

The following assumption of first-order stochastic dominance states that there is no negative dependence between current urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0393, evaluated at fixed urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0394, and current X.

Assumption 3.3. (First-order stochastic dominance) urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0395 is non-increasing as a function of X, holding urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0396 constant.

Violation of this assumption would require some underlying cyclical dynamics, in continuous time, with a frequency close to half the frequency of observation or, more generally, with a ratio of frequencies equal to an odd number divided by two. This could happen, for instance, if outcomes were influenced by seasonal factors and observations were semi-annual. It seems safe to discard this possibility in most applications.

We can now formally state the claim that, if there are unstable equilibria structurally, then quantile regressions should exhibit multiple roots.

Proposition 3.1. (Unstable equilibria in dynamics and quantile regressions) Assume that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0397 and that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0398, and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0399 for all ε. If Assumption 3.3 holds and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0400 has only one root X for all q, then the conditional average structural functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0401, as functions of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0402, are stable at the roots m:

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0403
for all X, where (0, X) is in the support of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0404.

This proposition assumes global stability of g (i.e. X does not diverge to infinity). Under such global stability, if g has only one root, then this root is stable. According to this proposition, if quantile regressions have only one stable root, then the same is true for the conditional average structural functions. This is not conclusive, but it suggests that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0405 themselves have only one root.

Let us now turn to the implications of the number of roots of g for the qualitative dynamics of the stochastic difference equation (3.23). Let urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0406. If g describes a structural relationship, the counterfactual time path under the ‘manipulated’ initial condition urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0407 is given by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0408 (3.25)
Given the initial condition urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0409 and shocks urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0410, (3.23) describes a time-inhomogeneous deterministic difference equation. The following argument makes statements about the qualitative behaviour of this difference equation based on properties of the function g, in particular on the number of roots in x of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0411 for given unobservables urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0412. Consider Figure 5, which shows gU and gL defined by
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0413 (3.26)
urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0414 (3.27)
The functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0415 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0416 are the upper and lower envelopes of the family of functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0417 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0418. The direction of movement of X over time does not depend on s in the ranges where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0419 or urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0420 (where the horizontal axis is drawn solid in Figure 5), because the sign of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0421 does not depend on s in these ranges. For instance, suppose we start with an initial value below x1 in Figure 5. Then urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0422 will converge monotonically toward the left-hand dashed range and then remain within that range for all urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0423. Similarly, for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0424 in the upper ‘basin of attraction’ beyond x2, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0425 will converge to the upper ‘equilibrium range’ given by the right-hand dashed range. Hence, small changes of initial conditions (from x1 to x2) can have large and persistent effects on X in this case, in contrast to the case where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0426 has only one stable root for all ε. These arguments are summarized in the following proposition.
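To make the role of the envelopes concrete, the following Python sketch computes numerical versions of gU and gL in the spirit of (3.26) and (3.27) for a hypothetical family g(x, ε): a cubic with roots at 0.2, 0.5 and 0.8, shifted by ε in [-0.006, 0.006]. Neither this functional form nor the shock range comes from the paper; they are assumptions chosen so that both envelopes have three roots. The sketch flags where the direction of movement is the same for every shock (gL > 0 or gU < 0). The names `g`, `envelopes` and `sign_change_roots` are illustrative only.

```python
import numpy as np

def g(x, eps):
    """Hypothetical g(x, eps): cubic with roots 0.2, 0.5, 0.8, shifted by the shock eps."""
    return -2.0 * (x - 0.2) * (x - 0.5) * (x - 0.8) + eps

def envelopes(x, eps_values):
    """Upper and lower envelopes over a grid of shock values, in the spirit of (3.26)-(3.27)."""
    values = np.array([g(x, e) for e in eps_values])  # shape (len(eps_values), len(x))
    return values.max(axis=0), values.min(axis=0)

def sign_change_roots(x, y):
    """Approximate roots by linear interpolation between grid points where y changes sign."""
    idx = np.where(np.sign(y[:-1]) * np.sign(y[1:]) < 0)[0]
    return x[idx] - y[idx] * (x[idx + 1] - x[idx]) / (y[idx + 1] - y[idx])

x = np.linspace(0.0, 1.0, 2001)
eps_values = np.linspace(-0.006, 0.006, 61)  # assumed support of the shocks
g_upper, g_lower = envelopes(x, eps_values)

up_everywhere = g_lower > 0    # X moves up whatever the shock (solid axis, g_L > 0)
down_everywhere = g_upper < 0  # X moves down whatever the shock (solid axis, g_U < 0)

print("roots of g_U:", np.round(sign_change_roots(x, g_upper), 3))
print("roots of g_L:", np.round(sign_change_roots(x, g_lower), 3))
```

The grid points where neither flag is set correspond to the dashed ranges in Figure 5: two stable ‘equilibrium ranges’ around 0.2 and 0.8 and an unstable range around 0.5.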

Proposition 3.2. (Characterizing dynamics of stochastic difference equations) Assume that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0427 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0428, defined by (3.26) and (3.27), are smooth and generic, positive for sufficiently small x and negative for sufficiently large x, and have the same number z of roots, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0429 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0430, and let urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0431, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0432. Define the following mutually disjoint ranges:

urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0433
Then, all urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0434 are negative on the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0435, and positive on the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0436. Furthermore, all urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0437 are negative in a neighbourhood to the right of the maximum of the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0438 and positive to the left of the minimum, and the reverse holds for the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0439. Therefore, if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0440 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0441, then urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0442 will converge monotonically toward urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0443 and then remain within urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0444. If urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0445 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0446, then urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0447 will converge monotonically toward urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0448 and then remain within urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0449.

Assuming non-emptiness of these ranges, the interval urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0450 is a basin of attraction for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0451 (i.e. X in this interval converges monotonically to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0452 and then remains there). The main difference relative to the deterministic, time-homogeneous case is the blurring of the stable equilibrium into a stable set urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0453.
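The basin-of-attraction statement can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the dynamics take the form X_{t+1} = X_t + g(X_t, ε_t) with a hypothetical cubic g (roots 0.2 and 0.8 stable, 0.5 unstable) and i.i.d. uniform shocks; (3.23) is not reproduced in this text, so this form, the scaling and the shock range are assumptions. The scaling keeps x + g(x, ε) increasing on the relevant range, so convergence toward each stable set is monotone as in Proposition 3.2. Two nearby initial conditions on either side of the unstable range, hit by the same shocks, end up in different stable sets.

```python
import numpy as np

def g(x, eps):
    """Hypothetical g(x, eps): roots 0.2 (stable), 0.5 (unstable), 0.8 (stable), shifted by eps."""
    return -2.0 * (x - 0.2) * (x - 0.5) * (x - 0.8) + eps

def simulate(x0, shocks):
    """Iterate the assumed law of motion X_{t+1} = X_t + g(X_t, eps_t)."""
    path = [x0]
    for eps in shocks:
        path.append(path[-1] + g(path[-1], eps))
    return np.array(path)

rng = np.random.default_rng(0)
shocks = rng.uniform(-0.006, 0.006, size=500)  # same shock sequence for both paths

path_low = simulate(0.45, shocks)   # starts just below the unstable range
path_high = simulate(0.55, shocks)  # starts just above it
print(round(float(path_low[-1]), 3), round(float(path_high[-1]), 3))
```

The two paths start 0.1 apart but settle near 0.2 and 0.8, respectively, and each stays inside a small stable set rather than at a single point.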

We did not make any assumptions on the joint distribution of the unobserved factors urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0454. The whole argument of the preceding proposition is conditional on these factors. However, given g, the predictions of the proposition are sharper if the serial dependence of the unobserved factors is stronger, because urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0457 is then smaller on average; this increases the number of units i to which the assertion applies and reduces the size of the intervals urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0455 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0456.

In summary, Proposition 3.1 implies that, if we do not find multiple roots in quantile regressions, then the conditional average structural functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0458 do not have multiple roots. Proposition 3.2 implies that, if upper and lower envelopes of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0459 do not have multiple roots, then the dynamics of the system are stable and initial conditions do not matter in the long run.

4. APPLICATION TO THE DYNAMICS OF NEIGHBOURHOOD COMPOSITION

This section analyses the dynamics of minority share in a neighbourhood, applying the methods developed in the previous two sections to the data used for analysis of neighbourhood composition dynamics by Card et al. (2008). They study whether preferences over neighbourhood composition lead to ‘white flight’ once the minority share in a neighbourhood exceeds a certain level. They argue that such ‘tipping’ behaviour implies discontinuities in the change of neighbourhood composition over time as a function of initial composition, and they test for the presence of such discontinuities in cross-sectional regressions over different neighbourhoods in a given city. This argument is based on the theoretical models of Becker and Murphy (2000), which do not allow for individual heterogeneity and consider infinite time horizons. The present paper argues that, if we allow for heterogeneity and finite time, and if tipping does take place, then we should expect multiple roots rather than discontinuities. Kasy (2015) discusses a search-matching model of the housing market with social externalities, which has this implication.

Card et al. (2008) provided full access to their datasets, which allows us to use identical samples and variable definitions as in their work. The dataset is an extract from the Neighbourhood Change Database (NCDB), which aggregates US census variables to the level of census tracts. Tract definitions change between census waves, but the NCDB matches observations from the same geographic area over time, thus allowing observation of the development of the universe of US neighbourhoods over several decades. In the dataset used by Card et al. (2008), all rural tracts are dropped, as well as all tracts with population below 200 and tracts that grew by more than five standard deviations above the metropolitan statistical area (MSA) mean. The definition of MSA used is the MSAPMA from the NCDB, which is equal to a ‘primary metropolitan statistical area’ if the tract lies in one of those, and equal to the MSA in which it lies otherwise. For further details on sample selection and variable definition, see Card et al. (2008).

The graphs and tables discussed below are constructed as follows. For each MSA and each decade separately, we run local linear quantile regressions of the change in minority share of a neighbourhood (tract) on its minority share at the beginning of the decade. This is done for the quantiles 0.2, 0.5 and 0.8, with a bandwidth τ of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0460, where n is the sample size. Figure 6 shows, for the three largest MSAs, local linear quantile regressions of the change in minority share (left column) and of the change in white population relative to initial population (right column) on initial minority share for the quantiles 0.2, 0.5 and 0.8; confidence bands are not shown. For each of the regressions, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0461 is calculated, where ρ is chosen as 0.04. The integral in the expression for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0462 is taken over the interval [0, 1], intersected with the support of initial minority share if the latter is smaller. Note that it is possible to find no (stable) equilibrium for an MSA (i.e. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0463) if high initial minority shares do not occur in that MSA and most neighbourhoods experienced growing minority shares. Figure 7 shows kernel density plots of the regressor, the initial minority share across neighbourhoods; these suggest that support problems are not an issue, at least for the largest MSAs. For each urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0464, bootstrap standard errors and bias are calculated, as well as the corresponding t-test statistics for the null hypothesis urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0465, which yields an integer-valued confidence set (based on tests of level 0.05) for z. By the results of Section 2, these confidence sets have an asymptotic coverage probability of 95%. By the Monte Carlo evidence of Appendix A, they are likely to be conservative (i.e. to have a larger coverage probability). If a confidence set thus obtained is empty, the interval shown consists of the two integers neighbouring urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0466. This makes inference even more conservative. Table 1 shows the resulting confidence sets for the 12 largest MSAs in the United States (by 2009 population), for all quantiles and decades under consideration.
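For readers who want to reproduce the smoothed root count numerically, the following is a minimal sketch. It assumes that the smoothed plug-in estimator of Section 2 has the form Ẑ = ∫ K_ρ(ĝ(x)) |ĝ'(x)| dx with K_ρ(y) = K(y/ρ)/ρ, evaluated on a grid over [0, 1], and it uses an Epanechnikov kernel for K; the kernel choice, the grid and the helper names are assumptions, not taken from the text. Restricting the integrand to ĝ' < 0 counts only stable roots (downward crossings). For the hypothetical function below, with three roots of which two are stable and ρ = 0.04, the two printed values should be close to 3 and 2.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; integrates to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid version-specific NumPy helpers."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def smoothed_root_count(x, g_values, rho=0.04, stable_only=False):
    """Assumed smoothed count: integral over the grid x of K_rho(g(x)) * |g'(x)|."""
    dg = np.gradient(g_values, x)  # numerical derivative g'(x)
    integrand = epanechnikov(g_values / rho) / rho * np.abs(dg)
    if stable_only:
        integrand = integrand * (dg < 0)  # keep downward crossings only (stable roots)
    return trapezoid(integrand, x)

# Hypothetical estimate of g on [0, 1]: roots at 0.2 (stable), 0.5 (unstable), 0.8 (stable).
x = np.linspace(0.0, 1.0, 4001)
g_values = -10.0 * (x - 0.2) * (x - 0.5) * (x - 0.8)

print(round(smoothed_root_count(x, g_values), 3))
print(round(smoothed_root_count(x, g_values, stable_only=True), 3))
```

Each root contributes approximately one because, near a root, the change of variables y = g(x) turns the integrand into the kernel density K_ρ(y), which integrates to one.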

Figure 6. Quantile regressions of the changes in minority share and white population.
Figure 7. Density of minority share across neighbourhoods.
Table 1. 0.95 confidence sets for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0467 by decade and quantile, change in minority share
| MSA | 1970s, q = 0.2 | 1970s, q = 0.5 | 1970s, q = 0.8 | 1980s, q = 0.2 | 1980s, q = 0.5 | 1980s, q = 0.8 | 1990s, q = 0.2 | 1990s, q = 0.5 | 1990s, q = 0.8 |
|---|---|---|---|---|---|---|---|---|---|
| New York, NY PMSA | [0,1] | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
| Los Angeles-Long Beach, CA PMSA | [1,1] | [1,1] | [0,1] | [0,1] | [0,1] | [0,1] | [1,1] | [1,1] | [0,0] |
| Chicago, IL PMSA | [0,1] | [0,1] | [0,1] | [2,2] | [0,1] | [0,1] | [1,1] | [0,1] | [0,0] |
| Dallas, TX PMSA | [1,2] | [1,1] | [0,0] | [0,1] | [0,0] | [0,0] | [0,1] | [0,1] | [0,0] |
| Philadelphia, PA-NJ PMSA | [1,2] | [0,1] | [0,1] | [1,1] | [0,1] | [0,1] | [1,1] | [0,1] | [0,0] |
| Houston, TX PMSA | [1,1] | [0,0] | [0,0] | [1,2] | [0,1] | [0,0] | [0,1] | [0,0] | [0,0] |
| Miami, FL PMSA | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
| Washington, DC-MD-VA-WV PMSA | [0,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,0] | [1,1] | [0,1] | [0,0] |
| Atlanta, GA MSA | [1,1] | [1,1] | [0,0] | [2,3] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
| Boston, MA-NH PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] |
| Detroit, MI PMSA | [1,2] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,0] |
| Phoenix-Mesa, AZ MSA | [1,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,0] | [1,1] | [0,1] | [0,0] |
| San Francisco, CA PMSA | [1,1] | [0,1] | [0,1] | [0,0] | [0,1] | [0,0] | [1,1] | [0,0] | [0,0] |

Note

  • The table shows confidence intervals in the integers for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0477 for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in minority share over a decade on the initial minority share for the quantiles 0.2, 0.5 and 0.8. Regression bandwidth urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0781 is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0478, and ρ is chosen as 0.04. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.
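The construction of these integer-valued confidence sets (bootstrap bias and standard error, a t-test at each candidate integer, and the fallback to the two neighbouring integers when no integer is accepted) can be sketched as follows. This is an illustrative reading of the procedure described in the text, not the author's code: the two-sided normal critical value, the search range `z_max`, the truncation at zero and the use of the bias-corrected estimate for the fallback are all assumptions, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def bootstrap_bias_se(z_hat, boot_draws):
    """Bootstrap bias (mean of draws minus the estimate) and standard error of the estimate."""
    boot_draws = np.asarray(boot_draws, dtype=float)
    return float(boot_draws.mean() - z_hat), float(boot_draws.std(ddof=1))

def integer_confidence_set(z_hat, bias, se, alpha=0.05, z_max=20):
    """Collect the integers z not rejected by a two-sided t-test of Z = z; report [min, max]."""
    z_corrected = z_hat - bias
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    accepted = [z for z in range(z_max + 1) if abs(z_corrected - z) / se <= crit]
    if not accepted:
        # Empty set: fall back to the two integers neighbouring the (bias-corrected) estimate.
        lower = max(0, math.floor(z_corrected))
        accepted = [lower, lower + 1]
    return min(accepted), max(accepted)

# Usage with made-up bootstrap draws for a single regression.
rng = np.random.default_rng(0)
z_hat = 1.1
draws = rng.normal(1.2, 0.2, size=200)
bias, se = bootstrap_bias_se(z_hat, draws)
print(integer_confidence_set(z_hat, bias, se))
```

Because each candidate integer is tested separately, the reported set need not be a single point; a wide bootstrap standard error yields intervals such as [0,1] or [0,2].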

As the table shows, there are very few cases with evidence of Z exceeding 1. In all cases shown except the 0.2 quantile for Atlanta in the 1980s, we can reject the null urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0479. Similar patterns hold for almost all of the 118 cities in the dataset. Rather than exhibiting multiple equilibria, the data indicate a general rise in minority share that is largest for neighbourhoods with intermediate initial share, but not to the extent of leading to tipping behaviour. Proposition 3.1 suggests that, if we do not find multiple roots in quantile regressions, we can reject multiple equilibria in the underlying structural relationship. I take these results as indicative that tipping is not a widespread phenomenon in US ethnic neighbourhood composition over the decades under consideration. This stands in contrast to the conclusion of Card et al. (2008), who do find evidence of tipping.

The approach used here differs from the main analysis in Card et al. (2008) in a number of ways. Card et al. (2008) (a) use polynomial least-squares regression with a discontinuity. They (b) use a split sample method to test for the presence of a discontinuity, and they (c) regress the change in the non-Hispanic, white population, divided by initial neighbourhood population, on initial minority share. We (a) use local linear quantile regression without a discontinuity, we (b) run the regressions on full samples for each MSA and test for the number of roots, and we (c) regress the change in minority share on initial minority share.

To check whether the differing results are due to variable choice (c) rather than to the testing procedure, the analyses underlying the left column of Figure 6 and Table 1 are repeated using the change in the non-Hispanic, white population relative to initial population as the dependent variable, as in Card et al. (2008). The right column of Figure 6 shows such quantile regressions. These figures correspond to the ones in Card et al. (2008, p. 190), using the same variables but a different regression method and the full samples. Table 2 shows confidence sets for the number of roots of these regressions for the 12 largest MSAs. In comparing Tables 1 and 2, note that the lower quantiles of the former (low increase in minority share) correspond to the upper quantiles of the latter (higher increase/lower decrease of white population). The two tables show fairly similar results. Again, no systematic evidence of multiple roots is found.

Table 2. 0.95 confidence sets for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0480 by decade and quantile, change in white population
| MSA | 1970s, q = 0.2 | 1970s, q = 0.5 | 1970s, q = 0.8 | 1980s, q = 0.2 | 1980s, q = 0.5 | 1980s, q = 0.8 | 1990s, q = 0.2 | 1990s, q = 0.5 | 1990s, q = 0.8 |
|---|---|---|---|---|---|---|---|---|---|
| New York, NY PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] |
| Los Angeles-Long Beach, CA PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] |
| Chicago, IL PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,1] | [1,1] | [0,1] | [0,1] | [0,1] |
| Dallas, TX PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,2] | [0,1] | [1,1] | [0,1] |
| Philadelphia, PA-NJ PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [1,1] |
| Houston, TX PMSA | [0,1] | [0,1] | [0,1] | [1,1] | [1,1] | [1,1] | [0,1] | [0,1] | [0,1] |
| Miami, FL PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [1,1] | [1,1] | [1,1] |
| Washington, DC-MD-VA-WV PMSA | [0,1] | [0,0] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] | [0,1] |
| Atlanta, GA MSA | [0,1] | [1,1] | [0,1] | [1,1] | [1,1] | [1,1] | [1,1] | [1,2] | [0,1] |
| Boston, MA-NH PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] |
| Detroit, MI PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,1] | [0,1] |
| Phoenix-Mesa, AZ MSA | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] | [0,1] |
| San Francisco, CA PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [1,1] | [0,0] |

Note

  • The table shows confidence intervals in the integers for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0490 for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in the non-Hispanic, white population over a decade, divided by initial total population, on the initial minority share for the quantiles 0.2, 0.5 and 0.8. Regression bandwidth urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0782 is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0491, and ρ is chosen as 0.05 times the maximal change. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.

Several factors might bias the number of equilibria estimated with the methods developed here. First, the test might be sensitive to the chosen range of integration if there are roots near the boundary: a root lying exactly on the boundary of the range of integration contributes only 1/2 to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0492. Extending the range of integration beyond the unit interval, however, might also lead to an upward bias in the estimated number of roots if extrapolated regression functions intersect the horizontal axis. Second, choosing a bandwidth parameter ρ that is too large might bias the estimated number of equilibria downwards if the function g peaks within the range urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0493. Third, there might be roots of g in the unit interval but beyond the support of the data.
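The first two sources of bias can be checked numerically. The sketch below uses an assumed form of the smoothed count, ∫ K_ρ(g(x)) |g'(x)| dx over [0, 1] with an Epanechnikov kernel (an assumption about Section 2, which is not shown in this excerpt), on two hypothetical functions: a linear function whose only root lies on the right boundary of [0, 1], and a function with two roots (0.4 and 0.6) whose peak of 0.02 lies below ρ = 0.04.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; integrates to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smoothed_root_count(x, g_values, rho):
    """Assumed smoothed count: trapezoidal integral of K_rho(g(x)) * |g'(x)| over the grid x."""
    dg = np.gradient(g_values, x)
    y = epanechnikov(g_values / rho) / rho * np.abs(dg)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(0.0, 1.0, 4001)  # range of integration: the unit interval

# (i) Only root exactly on the boundary (x = 1): counts as one half.
boundary_case = smoothed_root_count(x, 1.0 - x, rho=0.04)

# (ii) Two genuine roots at 0.4 and 0.6, but the peak 0.02 lies inside [-rho, rho].
g_peak = 0.02 - 2.0 * (x - 0.5) ** 2
too_wide = smoothed_root_count(x, g_peak, rho=0.04)  # biased downwards
narrow = smoothed_root_count(x, g_peak, rho=0.01)    # peak above rho: both roots count fully

print(round(boundary_case, 3), round(too_wide, 3), round(narrow, 3))
```

By the change of variables y = g(x), each monotone branch of `g_peak` only reaches kernel arguments up to 0.02/ρ = 0.5 when ρ = 0.04, so each root contributes ∫_{-1}^{0.5} K(u) du = 0.84375 and the total is about 1.69 instead of 2; with ρ = 0.01, both roots count fully.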

5. SUMMARY AND CONCLUSION

This paper proposes an inference procedure for the number of roots of functions non-parametrically identified using conditional moment restrictions, and develops the corresponding asymptotic theory. In particular, it is shown that a smoothed plug-in estimator of the number of roots is superconsistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics, and asymptotically efficient relative to a simple plug-in estimator. In Section 3, these results are extended to cover various more general cases, allowing for covariates as controls, higher-dimensional domain and range, and for inference on the number of equilibria with various stability properties. This section also discusses how to apply the results to static games of incomplete information and to stochastic difference equations. In an application of the methods developed here to data on neighbourhood composition dynamics in the United States, no evidence of multiple equilibria is found.

The inference procedure can also be used to test for bifurcations (i.e. (dis)appearing equilibria as a function of changing exogenous covariates). It is easy to test the hypothesis urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0494, because the corresponding estimators urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0495 are independent for W1 and W2 further apart than twice the bandwidth τ. If there are bifurcations, small exogenous shifts might have a large (discontinuous) effect on the equilibrium attained, if the ‘old’ equilibrium disappears.
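A minimal sketch of such a test, under stated assumptions: the null hypothesis is taken to be equality of the root counts at two covariate values w1 and w2 (the exact hypothesis is not legible in this text), the inputs are bias-corrected estimates with bootstrap standard errors, and the two estimators are treated as independent because w1 and w2 lie more than 2τ apart, so the variance of the difference is the sum of the variances. The function name and the normal approximation are illustrative.

```python
import math
from statistics import NormalDist

def bifurcation_test(z1, se1, z2, se2, alpha=0.05):
    """Two-sided test of Z(w1) = Z(w2) from independent, bias-corrected estimates (assumed setup)."""
    t = (z1 - z2) / math.sqrt(se1**2 + se2**2)  # independence: variances add
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p_value, p_value < alpha

# Usage with made-up numbers: two equilibria at w1, one at w2.
t, p, reject = bifurcation_test(2.0, 0.2, 1.0, 0.2)
print(round(t, 3), round(p, 4), reject)
```

If w1 and w2 are closer than 2τ, the two estimators share observations and the covariance term can no longer be ignored.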

In the dynamic set-up, one might furthermore consider applying the procedure to detrended data (e.g. by demeaning urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0496). It seems likely that regressions of detrended data have a higher number of roots. The rationale of such an approach could be found in underlying models in which the dynamics of a detrended variable are stationary. This is, in particular, the case in Solow-type growth models, in which GDP or capital stock is stationary after normalizing by a technological growth factor.

Finally, it might also be interesting to extend the results obtained here to cover further cases where g cannot be directly estimated using conditional moment restrictions. The crucial step for such extensions, as illustrated by the various cases discussed in Section 3, is to find a sequence of experiments such that the first-stage estimator urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0497 converges in probability to a degenerate limit whereas urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0498 converges in distribution to a non-degenerate limit. Furthermore, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0499 needs to be asymptotically independent of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0500 for all urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0501. There are many potential applications of the results obtained here, where it might be interesting to know whether the underlying dynamics or strategic interactions imply multiple equilibria. Examples include household level poverty traps, intergenerational mobility, efficiency wages, macro models of economic growth (as analysed in the online Appendix), financial market bubbles (herding), market entry and social norms.

ACKNOWLEDGEMENTS

I thank seminar participants at UC Berkeley, UCLA, USC, Brown, NYU, UPenn, LSE, UCL, Sciences Po, TSE, Mannheim and IHS Vienna for their helpful comments and suggestions. I particularly thank Tim Armstrong, David Card, Kiril Datchev, Victor Chernozhukov, Jinyong Hahn, Michael Jansson, Bryan Graham, Susanne Kimm, Patrick Kline, Rosa Matzkin, Enrico Moretti, Denis Nekipelov, James Powell, Alexander Rothenberg, Jesse Rothstein, James Stock and Mark van der Laan for many valuable discussions, and David Card, Alexander Mas and Jesse Rothstein for the access provided to their data. This work was supported by a DOC fellowship from the Austrian Academy of Sciences at the Department of Economics, UC Berkeley.

    Appendix A: MONTE CARLO EVIDENCE

    This section presents simulation results to check the accuracy in finite samples of the asymptotic approximations obtained in Theorem 2.2. In all simulations, X are i.i.d. draws of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0503 random variables, and the additive errors γ are either uniformly or normally distributed,
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0504 (A.1)
    where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0505 is an appropriately centred and scaled uniform or normal distribution. Two functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0506 are considered, the first with one root and the second with three roots:
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0507

    The function g is estimated by median regression, mean regression and 0.9 quantile regression, where the γ in the simulations are shifted appropriately to have median, mean or 0.9 quantile at the respective g. Figures A.1–A.3 and Table 3 show sequences of four experiments with 400, 800, 1,600 and 3,200 observations. These models are chosen to be comparable to the empirical application discussed in Section 4. The variance of γ in each experiment is chosen to yield the same variance for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0508, as implied by the asymptotic approximation of the Bahadur expansion, in all experiments for a given g. By the proof of Theorem 2.2, we should therefore obtain similar simulation results across all set-ups. Furthermore, the variance of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0509 should be constant up to a factor urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0510. The parameters of these simulations are chosen to lie in an intermediate range where variation in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0511 is present but moderate.

    Figure A.1 shows density plots for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0512 from the sequences of Monte Carlo experiments with uniform errors and g identified by median regression, as described in this appendix; in the online Appendix, similar figures are presented for the other experiments. The upper graph shows the distribution from four experiments with increasing sample size n and correspondingly growing variance of the residual γ, where the true parameter Z equals one. The same holds for the lower graph, except that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0513. As predicted by Theorem 2.2, biases are positive, and both bias and variance are decreasing in n. Figure A.2 shows the distribution of the naive plug-in estimator urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0514, from the same simulations as in Figure A.1. It was shown in Section 2 that this estimator is asymptotically inefficient relative to the smoothed plug-in estimator. This relative inefficiency is reflected in a larger dispersion in the simulations, as can be seen by comparing Figures A.1 and A.2. Figure A.3 shows density plots for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0515, normalized by its sample mean and standard deviation, from the same simulations as in Figure A.1. It also shows, as a reference, the density of a standard normal. These plots suggest that the sample distribution of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0516 is somewhat right-skewed relative to a normal distribution.
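The comparison between the smoothed and the naive plug-in estimator can be explored in a few lines. The sketch below is a simplified stand-in for the experiments described above, not a replication: the regressor is drawn from U[0, 1], the errors are normal with standard deviation 0.05, g is a hypothetical three-root cubic rather than the paper's g2, ĝ is a local linear mean regression with an Epanechnikov kernel and τ = 0.065 (the bandwidth listed for n = 400 in Table 3), and the smoothed count uses an assumed form ∫ K_ρ(ĝ(x)) |ĝ'(x)| dx with ρ = 0.04. All of these choices are assumptions; whether the dispersion ranking of Figures A.1 and A.2 appears in a given run depends on them.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear(x_obs, y_obs, grid, tau):
    """Local linear mean regression of y on x with an Epanechnikov kernel, evaluated on grid."""
    fitted = np.empty(len(grid))
    for j, x0 in enumerate(grid):
        w = epanechnikov((x_obs - x0) / tau)
        design = np.column_stack([np.ones_like(x_obs), x_obs - x0])
        weighted = design * w[:, None]
        coef = np.linalg.solve(design.T @ weighted, weighted.T @ y_obs)  # weighted least squares
        fitted[j] = coef[0]  # the intercept is the fitted value at x0
    return fitted

def smoothed_root_count(grid, g_values, rho):
    """Assumed smoothed count: trapezoidal integral of K_rho(g) * |g'| over the grid."""
    dg = np.gradient(g_values, grid)
    y = epanechnikov(g_values / rho) / rho * np.abs(dg)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

def naive_root_count(g_values):
    """Simple plug-in estimator: number of sign changes of the estimated function."""
    signs = np.sign(g_values)
    signs = signs[signs != 0]  # ignore exact zeros on the grid
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def g_true(x):
    # Hypothetical three-root function, not the g2 of the paper.
    return -10.0 * (x - 0.2) * (x - 0.5) * (x - 0.8)

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 400)
z_smooth, z_naive = [], []
for _ in range(40):
    x_obs = rng.uniform(0.0, 1.0, size=400)
    y_obs = g_true(x_obs) + rng.normal(0.0, 0.05, size=400)
    g_hat = local_linear(x_obs, y_obs, grid, tau=0.065)
    z_smooth.append(smoothed_root_count(grid, g_hat, rho=0.04))
    z_naive.append(naive_root_count(g_hat))

print("smoothed: mean %.3f, sd %.3f" % (np.mean(z_smooth), np.std(z_smooth)))
print("naive:    mean %.3f, sd %.3f" % (np.mean(z_naive), np.std(z_naive)))
```

Raising the error standard deviation while holding n fixed is a simple way to move these toy experiments into the range where both estimators vary noticeably.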

    Figure A.1. Density of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0517 in Monte Carlo experiments.
    Figure A.2. Distribution of simple plug-in estimator urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0518 in Monte Carlo experiments.
    Figure A.3. Density of normalized urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0519 in Monte Carlo experiments.

    Table 3 shows the results of simulations using bootstrapped standard deviations and biases, for mean regression with uniform errors. The results show, for the range of experiments considered, that rejection frequencies are lower than the 0.05 value implied by asymptotic theory. If this pattern generalizes, inference based upon the t-statistic proposed in this paper is conservative in finite samples. In particular, it seems that bootstrapped standard errors are too large.

    Table 3. Monte Carlo rejection probabilities
    n τ r urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0520 urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0521
    400 0.065 0.179 0.05 0.01
    800 0.059 0.194 0.03 0.02
    1,600 0.055 0.231 0.02 0.01
    3,200 0.052 0.290 0.02 0.01
    400 0.065 0.268 0.03 0.02
    800 0.059 0.292 0.01 0.02
    1,600 0.055 0.347 0.01 0.01
    3,200 0.052 0.434 0.01 0.02

    Note

    • This table shows the frequency of rejection of the null under a test of asymptotic level 5%, for the sequences of Monte Carlo experiments described in Appendix A. The g are estimated by mean regression, the errors are uniformly distributed, and the first four experiments are generated using g1 with one root, the next four using g2 with three roots. The columns show sample size, regression bandwidth, error standard deviation and the rejection probabilities of one-sided tests, respectively.

    Appendix B: PROOFS

    Proof of Proposition 2.1. By continuity of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0522 as well as genericity of g, we can choose ρ small enough such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0523 is constantly equal to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0524 in each of the neighbourhoods of the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0525 roots of g, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0526, defined by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0527. Hence, we can write the integral urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0528 as a sum of integrals over these neighbourhoods, in each of which there is exactly one root. Assume w.l.o.g. that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0529 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0530 is constant in the range of x where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0531. Then, by a change of variables setting urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0532,

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0533
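The change of variables in this proof can be written out explicitly. The derivation below is a sketch under the assumption that the smoothed count has the form ∫ K_ρ(g(x)) |g'(x)| dx with K_ρ(y) = K(y/ρ)/ρ, where K is supported on [-1, 1] and integrates to one, and that the neighbourhoods U_1, …, U_Z are the sets around the Z roots on which |g| ≤ ρ and g is strictly monotone, so that g(U_i) = [-ρ, ρ].

```latex
\begin{align*}
\int K_\rho\bigl(g(x)\bigr)\,\lvert g'(x)\rvert\,dx
  &= \sum_{i=1}^{Z} \int_{U_i} K_\rho\bigl(g(x)\bigr)\,\lvert g'(x)\rvert\,dx
     && \text{($K_\rho(g(x)) = 0$ outside $\textstyle\bigcup_i U_i$)}\\
  &= \sum_{i=1}^{Z} \int_{-\rho}^{\rho} K_\rho(y)\,dy
     && \text{(change of variables $y = g(x)$ on each $U_i$)}\\
  &= \sum_{i=1}^{Z} \int_{-1}^{1} K(u)\,du \;=\; Z .
\end{align*}
```

The absolute value |g'(x)| is what makes increasing and decreasing crossings both contribute +1.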

    Proof of Proposition 2.2. We need to find ε such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0534 implies urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0535. By genericity of g, each root urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0536 of g is such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0537. By continuity of the first derivatives, we can then find δ such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0538 is constant in the neighbourhood urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0539 of each of the finitely many roots urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0540 and the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0541 are mutually disjoint. By continuity of g,

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0542 (B.1)
    and
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0543 (B.2)
    where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0544 is the closure of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0545. Choosing urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0546 serves our purpose. To see this, choose urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0547 such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0548. For urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0549, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0550 is bounded away from zero by (B.1). In urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0551, there must be exactly one x such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0552: the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0553 are mutually disjoint, so urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0554; by (B.1) again, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0555 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0556; and finally, the sign of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0557 is constantly equal to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0558 in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0559 by (B.2).

    The assertion for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0560 now follows from the first part of this proof, combined with Proposition 2.1, if we can choose ρ independent of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0561 such that Proposition 2.1 applies. For this, it suffices that ρ separates the roots. Choosing urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0562 accomplishes this. By (B.1), urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0563 will separate the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0564, and by the previous argument each of the urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0565 will contain exactly one root of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0566.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0567

    Proof of Theorem 2.2. We use urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0568 to denote a sequence of approximations to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0569. Write urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0570 if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0571 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0572 have the same non-degenerate distributional limit for some non-random sequences urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0573 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0574. In particular, as long as sequences guaranteeing convergence to a non-degenerate limit exist, this relation is implied by equality up to a remainder that is asymptotically negligible under the given sequence of experiments (i.e. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0575 if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0576).

    1. Approximation of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0577 with g

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0578
    The remainder of this approximation is given by
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0579
    Negligibility of this remainder follows if we can show uniform convergence of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0580 at a rate faster than ρ under our sequence of experiments. Under the given sequence of experiments, the variance of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0581 is of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0582 – this follows from the Bahadur expansion. Because we have assumed urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0583, we obtain urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0584, which implies pointwise convergence. Pointwise convergence at rate τ implies uniform convergence at the slightly slower rate urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0585, which is faster than ρ by our assumptions on τ and ρ; for background on uniform convergence of kernel estimators, see, e.g. Appendix A.1 of Armstrong (2014).

    The fact that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0586 implies that the remainder urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0587 is of the same order. To see this, note that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0588 is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0589, so that the remainder is of the same order as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0590. The integrand of this expression is non-zero only in a neighbourhood of size of order ρ of the roots of g; the difference urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0591 is of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0592 because urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0593 is Lipschitz with constant urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0594, so that the claim follows.

    Thus, we have shown that the remainder urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0595 is of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0596; this is smaller than the order of the leading term of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0597, which we show to be urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0598. The remainder is thus asymptotically negligible.

    From the approximation urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0599 we immediately obtain urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0600 if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0601, because in that case urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0602 for ρ small enough. The claim of Theorem 2.2 is thus trivially satisfied for the case urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0603, and we assume urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0604 for the rest of this proof.
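Step 1 relies on the fact that, for ρ small enough to separate the roots, the smoothed functional counts the roots of g exactly. The following Python sketch illustrates this under an explicit assumption: it takes the smoothed count to be the integral of K_ρ(g(x)) |g'(x)| over [0, 1], with K_ρ(u) = K(u/ρ)/ρ and K the Epanechnikov kernel. This form, the kernel, the function `g` and the midpoint-rule quadrature are choices made for illustration only.

```python
def epanechnikov(u):
    # Kernel supported on [-1, 1] that integrates to one.
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_root_count(g, dg, a, b, rho, num_points=100_000):
    """Midpoint-rule approximation of int_a^b K_rho(g(x)) |g'(x)| dx,
    where K_rho(u) = K(u / rho) / rho and K is the Epanechnikov kernel."""
    h = (b - a) / num_points
    total = 0.0
    for i in range(num_points):
        x = a + (i + 0.5) * h
        total += epanechnikov(g(x) / rho) / rho * abs(dg(x))
    return total * h


def g(x):
    # Hypothetical g with simple roots at 0.2, 0.5 and 0.8.
    return (x - 0.2) * (x - 0.5) * (x - 0.8)


def dg(x):
    # Derivative of g: 3x^2 - 3x + 0.66.
    return 3.0 * x * x - 3.0 * x + 0.66


print(f"{smoothed_root_count(g, dg, 0.0, 1.0, rho=0.005):.3f}")  # 3.000
```

By the change of variables u = g(x) on each monotone piece around a root, each root contributes the integral of K, which equals one. This requires ρ to be smaller than the local extrema of |g|, about 0.0104 for this g; with a larger ρ the pieces around neighbouring roots no longer cover the whole support of K_ρ and the count is no longer exact.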

    2. Approximation of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0605 by the Bahadur expansion

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0606
    The absolute value of the remainder of this approximation is less than or equal to
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0607
    where R is the remainder of the Bahadur expansion. Negligibility of the remainder of the approximation is a consequence of the assumption that the remainder of the Bahadur expansion is negligible (i.e. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0608 uniformly in x).

    3. Restriction to one root at 0 and Taylor approximations

    Assume that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0609 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0610 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0611 (i.e. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0612). This is without loss of generality, because the integral for the general case is simply a sum of the independent integrals in a neighbourhood of each root.

    Now define urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0613, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0614, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0615 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0616.

    By replacing g with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0617 in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0618 and replacing urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0619 with w, both justified by smoothness and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0620, as well as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0621 uniformly, we obtain
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0622
    where we use urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0623 to denote the sample average. The absolute value of the remainder of this approximation is less than or equal to
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0624
    Here, we use ∑ and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0625 as shorthand for the sum and sample average of the previous display. Both terms in this expression go to 0 as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0626. We can assume furthermore that
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0627
    conditional on falling in this interval, and that
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0628
    These assumptions are justified by another Taylor approximation, this time of the distribution functions urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0629 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0630, assuming both distribution functions to be differentiable at 0. To see that this approximation is justified, note that distributional convergence to the same limit is equivalent to convergence of the expectations of any Lipschitz continuous bounded function of the statistics to the same limit. The difference in expectations between a function h of Z2 and of its approximation using conditionally uniform X and i.i.d. ϕ is given by
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0631
    This integral goes to 0 because the support of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0632 in X is a neighbourhood of 0 shrinking to 0.

    4. Partitioning the range of integration

    Partition urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0633 into subintervals urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0634, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0635 with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0636. Then
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0637
    with
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0638
    The remainder of this approximation is given by
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0639
    This approximation is warranted by Lipschitz continuity of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0640 with a Lipschitz constant of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0641, and by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0642.

    5. Poisson approximation

    The following argument essentially replaces the number of X falling into the interval urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0643, which is approximately distributed urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0644, with a Poisson random variable with parameter urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0645; the distribution of everything else conditional on this number remains the same.

    Let urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0646 be distributed i.i.d. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0647 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0648. This is an approximation to the number of X falling into the bin urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0649. Draw urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0650 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0651 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0652 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0653. Now define
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0654
    Then
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0655
    where urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0656 are identically distributed and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0657 is independent of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0658 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0659.
    Conditional on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0660, the equality is exact. The exact distribution of the number of observations falling in the interval urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0661, corresponding to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0662, would be given by
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0663
    The Poisson approximation sets the latter part of this expression to a constant in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0664. This is justified by the usual arguments deriving the Poisson distribution as a limit of Binomial distributions. The approximation of Z3 follows by an argument similar to the second part of step 3 in this proof, once we note that the multinomial probability mass function converges uniformly.
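The Poisson step can be checked numerically. A minimal Python sketch, assuming a bin that contains each of n observations independently with probability p = λ/n, so that the bin count is Binomial(n, λ/n); it computes the total variation distance to a Poisson(λ) distribution and compares it with Le Cam's bound λ²/n. The value λ = 5 and the sample sizes are arbitrary illustrations.

```python
import math


def binomial_log_pmf(k, n, p):
    # log of C(n, k) p^k (1 - p)^(n - k), computed with lgamma to avoid overflow.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_log_pmf(k, lam):
    # log of exp(-lam) lam^k / k!.
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def tv_binomial_poisson(n, lam):
    """Total variation distance between Binomial(n, lam / n) and Poisson(lam)."""
    p = lam / n
    abs_diff = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        b = math.exp(binomial_log_pmf(k, n, p))
        q = math.exp(poisson_log_pmf(k, lam))
        abs_diff += abs(b - q)
        poisson_mass += q
    # Poisson mass above n, where the binomial puts no mass.
    abs_diff += max(0.0, 1.0 - poisson_mass)
    return 0.5 * abs_diff


lam = 5.0
for n in (50, 500, 5000):
    # Columns: sample size, total variation distance, Le Cam bound lam^2 / n.
    print(n, tv_binomial_poisson(n, lam), lam**2 / n)
```

The distance shrinks as n grows, which is the sense in which the multinomial bin counts of step 5 can be replaced by i.i.d. Poisson counts.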

    6. Moments of the integrals over the subintervals

    • (a) urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0665.
    • (b) urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0666.
    • (c) urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0667.
    • (d) urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0668.

    These equations follow, first, from pointwise convergence to normality of
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0669
    under our sequence of experiments. This is the point where the rate urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0670 matters:
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0671
    Here, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0672 are i.i.d. urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0673. Now asymptotic normality follows by noting
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0674
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0675 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0676. Similarly
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0677
    Second, interchanging integration and the limit in n delivers the claims; the interchange is justified by the dominated convergence theorem. For instance,
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0678

    7. Central limit theorem applied to the sum of integrals over the subintervals

    Now apply a central limit theorem for m-dependent sequences to the sum of integrals. For a definition of m-dependence, see Hoeffding and Robbins (1994). Note that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0679 is an m-dependent sequence with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0680. We have
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0681
    Asymptotic normality for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0682 follows, and by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0683, the same holds for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0684. Furthermore, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0685, and hence so is urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0686.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0687
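Step 7 uses that the variance of a sum of m-dependent terms is governed by the variances plus the covariances between terms at most m apart. A minimal Python sketch with a hypothetical m-dependent sequence, the moving sums Y_i = e_i + ... + e_{i+m} of i.i.d. unit-variance errors, used only as a stand-in for the integrals over the subintervals. It computes Var(Σ Y_i) exactly and compares it with the autocovariance formula n γ_0 + 2 Σ_{k=1}^{m} (n − k) γ_k, whose ratio to n approaches the long-run variance (m + 1)².

```python
def moving_sum_coefficients(n, m):
    """Coefficients c_j in S_n = sum_i Y_i = sum_j c_j e_j, where
    Y_i = e_i + e_{i+1} + ... + e_{i+m} for i = 0, ..., n-1."""
    coeffs = [0] * (n + m)
    for i in range(n):
        for j in range(i, i + m + 1):
            coeffs[j] += 1
    return coeffs


def exact_variance_of_sum(n, m):
    # With e_j i.i.d. with unit variance, Var(S_n) = sum_j c_j^2.
    return sum(c * c for c in moving_sum_coefficients(n, m))


def variance_from_autocovariances(n, m):
    # gamma_k = Cov(Y_i, Y_{i+k}) = m + 1 - k for k <= m and 0 beyond m, so
    # Var(S_n) = n * gamma_0 + 2 * sum_{k=1}^{m} (n - k) * gamma_k  (needs n > m).
    return n * (m + 1) + 2 * sum((n - k) * (m + 1 - k) for k in range(1, m + 1))


n, m = 10_000, 3
print(exact_variance_of_sum(n, m), variance_from_autocovariances(n, m))  # 159980 159980
print(exact_variance_of_sum(n, m) / n, (m + 1) ** 2)  # 15.998 16
```

Only covariances between terms at most m apart enter, so the normalized variance of the sum converges, which is the condition a central limit theorem for m-dependent sequences needs.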

    Proof of Theorem 2.3. Fix one of the roots x0 of g. By the arguments of the proof of Theorem 2.2, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0688 (not to be confused with urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0689) converges to a non-degenerate normal distribution for all x. In particular,

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0690
    By uniform convergence in levels of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0691 and the intermediate value theorem (compare also Figure 2),
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0692
    This proves the first claim. The second claim now follows immediately from urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0693.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0694

    Proof of Theorem 3.1 (Sketch): We approximate urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0695 by a criterion function that has the form of 2.3 (i.e. a local weighted average over the empirical distribution of some objective function). Based on this approximation, we can then again apply the results of Kong et al. (2010). Newey (1994) provides a set of results that facilitate such approximations of partial means. In particular, Lemma 5.4 in Newey (1994) allows derivation of the required approximation by replacing the outer sum over j in 3.2 with an expectation, and by linearizing the fraction inside. The first replacement is asymptotically warranted because the variation created by averaging over the empirical distribution is of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0696 and is hence dominated by the variation in the non-parametric component. The second replacement follows from differentiability and requires, in particular, that the denominator of the fraction be asymptotically bounded away from zero. This is guaranteed by the requirement that W2 has full conditional support given urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0697. Formally, Lemma 5.4 in Newey (1994) gives
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0698
    where
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0699(B.3)
    This approximation of the objective function has the general form assumed in Kong et al. (2010) if we set
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0700(B.4)
    providing us with the desired Bahadur expansion. With the appropriate sequence of experiments, the proof and the result of Theorem 2.2 go through unchanged from here on. If urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0701 const, the rates have to be adapted as follows. The number of observations within each rectangle of size urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0702 goes to ∞ if urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0703. Finally, the variance of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0704 converges iff urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0705.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0706

    Proof of Theorem 3.2. The proof requires the following modifications relative to the one-dimensional case. Assumption 2.2 is still applicable; the only difference in the d-dimensional case is that 2.6 has to be multiplied by urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0707. For urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0708 to have a pointwise non-degenerate distributional limit, we have to choose the rate urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0709 to equal urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0710, which is slower for higher d. To see this, note that Varurn:x-wiley:13684221:media:ectj12043:ectj12043-math-0711. Here, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0712 is Lipschitz continuous with a Lipschitz constant of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0713, so that we require urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0714 for step 4 of the proof of Theorem 2.2. The range of integration has to be partitioned into rectangular subranges of area urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0715 instead of intervals of length τ. There will be approximately const urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0716 such subintegrals. The variance of the integral of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0717 over each of these subranges will be of order urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0718, and similarly for expectations and covariances. This yields a variance of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0719 of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0720; see step 7 of the proof of Theorem 2.2.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0721

    Proof of Theorem 3.3. By 3.14 and 3.16, it is sufficient to show that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0722 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0723 converge jointly in distribution, while urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0724, as well as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0725, converge in probability. These claims follow as before if we combine the convergence of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0726 from display 3.22 with Bahadur expansion 2.6 for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0727 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0728, where the latter are evaluated at urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0729, which is not constant but converges.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0730

    Proof of Lemma 3.1. By definition of conditional quantiles, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0731. Differentiating this with respect to X gives

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0732(B.5)

    The differential in the numerator has two components, one due to the structural relation between urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0733 and X (i.e. the derivative with respect to the argument X of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0734), and one due to the stochastic dependence of X and ε:

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0735
    This can be seen as follows. We can decompose the derivative according to
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0736
    To simplify the first derivative, note that by iterated expectations
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0737
    Differentiating this with respect to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0738 gives
    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0739
    The claim is now immediate.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0740

    Proof of Proposition 3.1. Because X and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0741 have their support in the interval [0, 1], urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0742 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0743. Therefore, the unique root X of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0744 must be stable, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0745. By Lemma 3.1 and Assumption 3.3, this implies that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0746.

    Finally, note that for all X where (0, X) is in the support of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0747, there exists a q such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0748.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0749
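The boundary argument in the proof of Proposition 3.1 can be illustrated by classifying the sign changes of a function that is positive at the left end of [0, 1] and negative at the right end: down-crossings then outnumber up-crossings by exactly one, so a unique root must be a down-crossing. The Python sketch below assumes, for illustration, that a root is stable when g crosses zero from above (g' < 0 there); the functions `g_three` and `g_one` are hypothetical examples.

```python
def classify_crossings(f, a, b, num_points=10_000):
    """Return (down, up): numbers of sign changes of f from + to - and from - to +
    on an equally spaced grid over [a, b]."""
    step = (b - a) / (num_points - 1)
    values = [f(a + i * step) for i in range(num_points)]
    pairs = list(zip(values, values[1:]))
    down = sum(1 for left, right in pairs if left > 0 > right)
    up = sum(1 for left, right in pairs if left < 0 < right)
    return down, up


def g_three(x):
    # Hypothetical g with g(0) > 0 > g(1) and simple roots at 0.2, 0.5 and 0.8.
    return -(x - 0.2) * (x - 0.5) * (x - 0.8)


def g_one(x):
    # Hypothetical g with g(0) > 0 > g(1) and a single root at 0.5.
    return 0.5 - x


print(classify_crossings(g_three, 0.0, 1.0))  # (2, 1)
print(classify_crossings(g_one, 0.0, 1.0))    # (1, 0)
```

With generic roots, stable and unstable roots alternate, which is why the number of roots is odd under these boundary conditions and a unique root is necessarily stable.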

    Proof of Proposition 3.2. The claims are immediate, noting that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0750 and similarly for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0751. Furthermore, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0752 for all s, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0753 and urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0754 for all s, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0755. Next, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0756 on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0757, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0758, from which negativity on urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0759 follows; the argument for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0760 is similar.

    Finally, under monotonicity of potential outcomes, assuming for simplicity differentiability of g,

    urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0761

    The numerator is always positive by assumption; the denominator is negative for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0762 and positive for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0763 because we assumed g to be positive for sufficiently small x. Hence, urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0764 is positive for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0765 and negative for urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0766.urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0767

  1. ‘System’ might refer to households, firms, urban neighbourhoods, national economies, etc.
  2. An estimator is called superconsistent if it converges at a rate faster than the usual parametric rate, the square root of the sample size.
  3. Other related papers from the recent literature include Aradillas-Lopez (2010), Lewbel and Tang (2011) and de Paula and Tang (2012). In contrast to these, we do not assume additively separable heterogeneity in latent payoffs. We can do this because we are only interested in response functions, not in latent utility. Note that our paper does not contribute to the literature discussing identification and estimation problems in games of complete information with multiple equilibria.
  4. Suppose that g has an infinite number of roots in the compact set urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0768. Then the set of x such that urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0769 has an accumulation point in urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0770. At this accumulation point, genericity is violated.
  5. The following theorem requires uniform convergence in probability of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0771 to urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0772. Note that this is a slightly different condition from convergence of urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0773 w.r.t. the norm urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0774 because urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0775 need not equal urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0776.
  6. Kong et al. (2010) provide regularity conditions under which urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0777 uniformly in X, for some urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0778 as urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0779 for stationary mixing processes.
  7. The approach of this paper, using local asymptotics, contrasts with the approach taken by most of the literature discussing inference on discrete valued parameters, testing and model selection. As argued by Choirat and Seri (2012), this literature has mostly focused on the use of large deviations asymptotics. The reason is that consistent estimators for discrete objects tend to converge at an exponential rate. Which type of asymptotics provides a more accurate approximation of finite sample distributions ultimately depends on the specific data-generating process; see Andrews and Cheng (2012). We should also mention the literature on testing for multimodality of densities (which is also based on i.i.d. asymptotics); see, e.g. Fischer et al. (1994).
  8. We could also define an equivalent sequence of experiments holding the amount of noise constant and shrinking the signal.
  9. The proof of Theorem 2.2 uses arguments somewhat similar to those of Horváth (1991) and Giné et al. (2003), who discuss the asymptotic distribution of the L1 norm (urn:x-wiley:13684221:media:ectj12043:ectj12043-math-0780 norm) of kernel density estimators.
  10. I thank an anonymous referee for the suggestions discussed in this subsection.
  11. This is an important restriction. It precludes, in particular, application of this set-up to correlated value auctions.
  12. The implementation of local linear quantile regression uses code downloaded from http://www.econ.uiuc.edu/~roger/research/rq/rq.html.
  13. The full set of results for all 115 MSAs in the dataset can be found in the online Appendix.