Non-parametric inference on the number of equilibria
Summary
This paper proposes an estimator and develops an inference procedure for the number of roots of functions that are non-parametrically identified by conditional moment restrictions. It is shown that a smoothed plug-in estimator of the number of roots is superconsistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics. The smoothed estimator is furthermore asymptotically efficient relative to a simple plug-in estimator. The procedure proposed is used to construct confidence sets for the number of equilibria of static games of incomplete information and of stochastic difference equations. In an application to panel data on neighbourhood composition in the United States, no evidence of multiple equilibria is found.
1. INTRODUCTION
Some economic systems show large and persistent differences in outcomes even though the observable exogenous factors influencing these systems differ little.1 One explanation for such persistent differences in outcomes is multiplicity of equilibria. If a system does have multiple equilibria, then temporary, large interventions might have a permanent effect, by shifting the equilibrium attained, while long-lasting, small interventions might not have a permanent effect.
Knowing the number of equilibria, and in particular whether there are multiple equilibria, is of interest in many economic contexts. Multiple equilibria and poverty traps are discussed by Dasgupta and Ray (1986), Azariadis and Stachurski (2005) and Bowles et al. (2006). Poverty traps can arise, for instance, if an individual's productivity is a function of their income and if wage income reflects productivity, as in models of efficiency wages. Productivity might depend on wages because nutrition and health improve with income. If this feedback mechanism is strong enough, there might be multiple equilibria, and extreme poverty might be self-perpetuating. In that case, public investments in nutrition and health can permanently lift families out of poverty. Multiple equilibria and urban segregation are discussed by Becker and Murphy (2000) and Card et al. (2008). Urban segregation, along ethnic or sociodemographic dimensions, might arise because households' location choices reflect a preference over neighbourhood composition. If this preference is strong enough, different compositions of a neighbourhood can be stable, given constant exogenous neighbourhood properties. Transition between different stable compositions might lead to rapid composition change, or ‘tipping’, as in the case of gentrification of a neighbourhood. Interest in such tipping behaviour motivated Card et al. (2008), and is the focus of the application discussed in Section 4. of this paper. Multiple equilibria and the market entry of firms are discussed by Bresnahan and Reiss (1991) and Berry (1992). Entering a market might only be profitable for a firm if its competitors do not enter that same market. As a consequence, different configurations of which firms serve which markets might be stable. In sociology, finally, multiple equilibria are of interest in the context of social norms.
If the incentives to conform to prevailing behaviours are strong enough, different behavioural patterns might be stable norms (i.e. equilibria); see Young (2008). Transitions between such stable norms correspond to social change. One instance where this has been discussed is the assimilation of immigrant communities into the mainstream culture of a country.
This paper develops an estimator and an inference procedure for the number of equilibria of economic systems. It will be assumed that the equilibria of a system can be represented as solutions to the equation g(x) = 0. It will furthermore be assumed that g can be identified by some conditional moment restriction. The procedure proposed here provides confidence sets for the number of solutions to this equation.
This procedure can be summarized as follows. In a first stage, g and its derivative are non-parametrically estimated. These first-stage estimates of g and its derivative are then plugged into a smooth functional, as defined in 2.4, yielding a smoothed plug-in estimate of the number of roots. We show that under standard i.i.d. asymptotics, and for the bandwidth parameter ρ small enough, this continuously distributed statistic is equal to the true number of roots with probability converging to 1. A superconsistent estimator of the number of roots can thus be formed by projecting the statistic on the closest integer.2
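The smoothed functional itself is not reproduced here, but a common construction of this type counts roots by integrating |g′(x)| against a kernel in g(x) with bandwidth ρ; near each simple root, a change of variables makes the contribution integrate to one. The following sketch is our own illustration of that idea, not the paper's exact definition in 2.4:

```python
import numpy as np

def smoothed_root_count(g, g_prime, grid, rho):
    """Smoothed count of the roots of g: integrate |g'(x)| * K(g(x)/rho)/rho
    over the grid, where K is the Epanechnikov kernel. For small rho, each
    simple (transversal) root of g contributes approximately one."""
    x = np.asarray(grid, dtype=float)
    u = g(x) / rho
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    y = np.abs(g_prime(x)) * K / rho
    # trapezoidal rule over the grid
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# g(x) = cos(2*pi*x) has two interior roots on [0, 1], at 1/4 and 3/4.
grid = np.linspace(0.0, 1.0, 20001)
z_hat = smoothed_root_count(
    lambda x: np.cos(2 * np.pi * x),
    lambda x: -2 * np.pi * np.sin(2 * np.pi * x),
    grid, rho=0.05)
# z_hat is close to 2; projecting on the closest integer recovers Z = 2.
```

Projecting the continuous statistic on the nearest integer gives the point estimator described above.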
We then show that a rescaled version of the smoothed estimator converges to a normal distribution under a non-standard sequence of experiments. This non-standard sequence of experiments is constructed using increasing levels of noise and shrinking bandwidth as sample size increases. Under this same sequence of experiments, the bootstrap provides consistent estimates of the bias and standard deviation of the smoothed estimator relative to the true number of roots. We can thus construct confidence sets for the number of roots using t-tests. These confidence sets are sets of integers containing the true number of roots with a pre-specified asymptotic probability. An alternative to the procedure proposed here would be to use the simple plug-in estimator, which just counts the roots of the first-stage estimate of g. We show, however, that the simple plug-in estimator is asymptotically inefficient relative to the smoothed estimator under the non-standard sequence of experiments considered.
Sections 3.4. and 3.5. discuss two general set-ups that allow us to translate the hypothesis of multiple equilibria into a hypothesis on the number of roots of some identifiable function g; these set-ups are static games of incomplete information and stochastic difference equations. Section 3.4. discusses a non-parametric model of static games of incomplete information, similar to the one analysed by Bajari et al. (2010).3 Under the assumptions detailed in Section 3.4., we can non-parametrically identify the average best response functions (averaging over private information) of the players in a static incomplete information game. This allows us to represent the Bayesian Nash equilibria of this game as roots of an estimable function. Section 3.4. discusses how to perform inference on the number of such Bayesian Nash equilibria.
Section 3.5. considers panel data of observations of some variable X, where X is generated by a general non-linear stochastic difference equation. This is motivated by the study of neighbourhood composition dynamics in Card et al. (2008). Section 3.5. argues that we can construct tests for the null hypothesis of equilibrium multiplicity of such non-linear difference equations by testing whether non-parametric quantile regressions of the change in X on X have multiple roots.
The rest of this paper is structured as follows. Section 2. presents the inference procedure and its asymptotic justification for the baseline case. Section 3. discusses generalizations, as well as identification and inference in static games of incomplete information and in stochastic difference equations. Section 4. applies the inference procedure to the data on neighbourhood composition studied by Card et al. (2008). In contrast to their results, no evidence of ‘tipping’ (equilibrium multiplicity) is found here. Section 5. concludes. Appendix A presents some Monte Carlo evidence. All proofs are relegated to Appendix B. Additional figures and tables are given in the online Appendix, which also contains a second application of the inference procedure to data on economic growth, similar to those discussed by Azariadis and Stachurski (2005), in their Section 4.1, and by Quah (1996).
2. INFERENCE IN THE BASELINE CASE
2.1. Set-up





Assumption 2.1. (a) The observable data are i.i.d. draws of pairs (Xi, Yi), where each draw has the same distribution as (X, Y); (b) the support 𝒳 of X is compact, and the density of X is bounded away from 0 on 𝒳; (c) the function g is identified by a conditional moment restriction of the form

E[m(Y − g(X)) | X = x] = 0 for all x ∈ 𝒳. (2.2)
Examples of functions characterized by conditional moment restrictions as in 2.2 are conditional mean regressions, for which m(e) = e, and conditional qth quantile regressions, for which m(e) = q − 1(e < 0).
Definition 2.1. (Genericity) A continuously differentiable function g is called generic if g′(x) ≠ 0 whenever g(x) = 0, and if all roots of g are in the interior of 𝒳.
Genericity of g implies that g has only a finite number of roots.4 Genericity in the sense of Definition 2.1 is commonly assumed in microeconomic theory; see the discussion in Mas-Colell et al. (1995, p. 593ff).
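Genericity can also be checked numerically for a function given on a grid. The helper below is our own illustration (the tolerance `margin` is a user-chosen numerical cutoff, not part of the definition): it tests that |g| + |g′| is bounded away from zero and that g does not vanish at the boundary of the grid.

```python
import numpy as np

def is_generic(g, g_prime, grid, margin=1e-6):
    """Grid-based check of genericity: |g| + |g'| stays above `margin`
    everywhere, and g is non-zero (numerically) at the grid endpoints,
    so that no root sits on the boundary."""
    x = np.asarray(grid, dtype=float)
    if min(abs(g(x[0])), abs(g(x[-1]))) <= margin:
        return False  # root at (or numerically near) the boundary
    return bool(np.min(np.abs(g(x)) + np.abs(g_prime(x))) > margin)

grid = np.linspace(0.0, 1.0, 1001)
# cos(2*pi*x): all roots transversal and interior -> generic
generic_ok = is_generic(lambda x: np.cos(2 * np.pi * x),
                        lambda x: -2 * np.pi * np.sin(2 * np.pi * x), grid)
# (x - 0.5)**2: tangent root at 0.5 where g = g' = 0 -> not generic
generic_bad = is_generic(lambda x: (x - 0.5) ** 2,
                         lambda x: 2 * (x - 0.5), grid)
```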











2.2. Basic properties and consistency
The rest of this section will motivate and justify this procedure. First, we show that the smoothed plug-in estimator is superconsistent for Z, in the sense that its estimation error, rescaled by any diverging sequence, still converges to zero in probability, under i.i.d. sampling and conditions to be stated. Then, we present the central result of this paper, which establishes asymptotic normality of the smoothed estimator under a non-standard sequence of experiments. From this result, it follows that inference based on t-statistics, using bootstrapped standard errors and bias corrections, provides asymptotically valid confidence sets for Z. We also show that the smoothed estimator is efficient relative to the simple plug-in estimator under the non-standard asymptotic sequence.

We are mainly concerned with constructing confidence sets for Z, rather than a point estimator. A point estimator could be formed by projecting the smoothed estimator on the closest integer. While the smoothed statistic will be called an estimator of Z, it should be kept in mind that its primary role is as an intermediate statistic in the construction of confidence sets.
The following proposition states that the smoothed functional coincides with Z for generic g and ρ small enough. The two functionals only differ around non-generic g, or ‘bifurcation points’ (i.e. g where Z jumps). The smoothed functional is a smooth approximation of Z which varies continuously around such jumps.
Proposition 2.1. For g continuously differentiable and generic, if ρ is small enough, then the smoothed functional and Z coincide.

The intuition underlying Proposition 2.1 is as follows. Given a generic function g, consider the subset of 𝒳 on which the integrand is not zero, i.e. where |g| ≤ ρ. If ρ is small enough, this subset is partitioned into disjoint neighbourhoods of the roots of g, and g is monotonic in each of these neighbourhoods. A change of variables, setting y = g(x), shows that the integral over each of these neighbourhoods equals one. Figure 1 illustrates the relationship between Z and its smoothed counterpart for several functions g. The two functionals are equal if g does not peak within the range [−ρ, ρ]; but if g does peak within this range, they differ and the smoothed functional is not integer valued.
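The non-integer behaviour near such a peak is easy to reproduce numerically. Using a uniform kernel in a smoothed count of the form ∫ |g′(x)| · 1(|g(x)| ≤ ρ) / (2ρ) dx — our own stand-in for the functional defined in 2.4 — a function whose minimum lies inside the band [−ρ, ρ] without crossing zero yields a strictly positive, non-integer value even though Z = 0:

```python
import numpy as np

# g(x) = (x - 0.5)**2 + 0.01 has no roots (Z = 0), but its minimum 0.01
# lies inside the band [-rho, rho] for rho = 0.05, so the smoothed count
# picks up a non-integer contribution from the near-tangency.
grid = np.linspace(0.0, 1.0, 20001)
rho = 0.05
g = (grid - 0.5) ** 2 + 0.01
g_prime = 2 * (grid - 0.5)
integrand = np.abs(g_prime) * (np.abs(g) <= rho) / (2 * rho)
# trapezoidal rule over the grid
z_rho = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))
# z_rho is about 0.8: strictly between the values 0 and 2 that the exact
# count Z would take as the parabola is shifted through zero.
```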






Proposition 2.2. (Local constancy) Z is constant in a neighbourhood, with respect to the C1 norm (the sup norm in levels and first derivatives), of any generic function g, and so is the smoothed functional if ρ is small enough.
Using a neighbourhood of g with respect to the sup norm in levels only, instead of the C1 norm, is not enough for the assertion of Proposition 2.2 to hold. For any function g1 that has at least one root, we can find a function g2 arbitrarily close to g1 in the uniform sense which has more roots than g1, by adding a ‘wiggle’ around a root of g1. Figure 2 illustrates: it shows two functions that are uniformly close in levels but not in derivatives, and which have different numbers of roots. However, if one additionally restricts the first derivative of g2 to be uniformly close to the derivative of g1, additional wiggles are precluded around generic roots, because around these g1 has a non-zero derivative. Because derivatives are ‘harder’ to estimate than levels, variation in the estimated derivatives dominates the asymptotic distribution of estimators of Z, as will be shown. Proposition 2.2 immediately implies the following theorem as a corollary. This theorem states that the plug-in estimator converges to a degenerate limiting distribution at an ‘infinite’ rate if the first-stage estimate converges to g in the C1 norm (i.e. the estimator is equal to the true number of roots with probability converging to 1).5
Theorem 2.1. (Superconsistency) If the first-stage estimate converges uniformly in probability to g in levels and first derivatives, if g is generic and if τn is some arbitrary diverging sequence, then the estimation error of the smoothed plug-in estimator, scaled by τn, converges to zero in probability.



This result implies, in particular, that the estimator projected on the closest integer equals Z with probability converging to 1 as n → ∞.
2.3. Asymptotic normality and relative efficiency
We have shown our first claim, superconsistency of the smoothed plug-in estimator given uniform convergence of the first-stage estimator. Next, we show our second claim, asymptotic normality of the smoothed estimator under a non-standard sequence of experiments. This section then concludes by formally stating the efficiency of the smoothed estimator relative to the simple plug-in estimator. To further characterize the asymptotic distribution, we need a suitable approximation for the distribution of the first-stage estimator of g. Kong et al. (2010) provide uniform Bahadur representations for local polynomial estimators of m-regressions. We state their result, for the special case of local linear m-regression, as an assumption.
Assumption 2.2. (Bahadur expansion) The estimation error of the estimator defined by 2.3 can be approximated by a local average as follows:







The crucial part of Assumption 2.2 is the assumption that the remainder R is asymptotically negligible relative to the linear (sample mean) component of the expansion. This assumption is only well defined in the context of a specific sequence of experiments.6 In Theorem 2.2, this assumption will be understood to hold relative to the sequence of experiments defined in Assumption 2.3. In the case of qth quantile regression, the influence function entering the expansion is the check score q − 1(e < 0); in the case of mean regression, it is the residual e itself.
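As a concrete first-stage example, a local linear mean regression delivers joint estimates of g and g′ at a point, since the slope coefficient of the local fit estimates the derivative. The sketch below is our own illustration of the generic technique with a Gaussian kernel, not the paper's exact estimator 2.3:

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear mean regression at x0 with a Gaussian kernel.
    Returns (ghat(x0), ghat'(x0)): the intercept estimates the level,
    the slope on (X - x0) estimates the derivative."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    Z = np.column_stack([np.ones_like(X), X - x0])
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)
    return beta[0], beta[1]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 5000)
Y = X**2 + 0.1 * rng.standard_normal(5000)   # g(x) = x**2
level, slope = local_linear(0.5, X, Y, h=0.1)
# level is close to g(0.5) = 0.25; slope is close to g'(0.5) = 1, but the
# slope estimate is noisier: derivatives converge at a slower rate.
```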
The asymptotic results in the remainder of this section depend on the availability of an expansion in the form of expansion 2.6 and the relative negligibility of the remainder, but not on any other specifics of local linear m-regression. This will allow for fairly straightforward generalizations of the baseline case considered here to the cases discussed in Section 3., as well as to other cases that are beyond the scope of this paper, once we have appropriate expansions for the first-stage estimators.
By Proposition 2.2, consistency of any plug-in estimator follows from uniform convergence of the first-stage estimator. Such uniform convergence follows from Assumption 2.2, combined with a Glivenko–Cantelli theorem on uniform convergence of averages, assuming i.i.d. draws from the joint distribution of (X, Y) as n → ∞; see van der Vaart (1998), Chapter 19. Superconsistency of the smoothed plug-in estimator therefore follows, which implies that standard i.i.d. asymptotics with rescaling of the estimator yield only degenerate distributional approximations. This is because the smoothed functional and Z are constant in a C1 neighbourhood of any generic g, even though they jump at bifurcation points (i.e. non-generic g). As a consequence, all terms in a functional Taylor expansion of the smoothed functional, as a function of g, vanish, except for the remainder. The application of ‘delta method’ type arguments, as in Newey (1994), gives only the degenerate limit distribution.
In finite samples, however, the sampling variation of the plug-in estimator is, in general, not negligible, as the simulations of Appendix A confirm, which makes the distributional approximation of the degenerate limit useless for inference. Asymptotic statistical theory approximates the finite sample distribution of interest by a limiting distribution of a sequence of experiments, of which our actual experiment is an element. The choice of sequence is to some extent arbitrary; the standard sequence, in which observations are i.i.d. draws from a distribution that does not change as n increases, is just one possibility. In econometrics, non-standard asymptotics are used, for instance, in the literature on weak instruments; see, e.g. Staiger and Stock (1997), Imbens and Wooldridge (2007) and Andrews and Cheng (2012). In the present set-up, a non-degenerate distributional limit of the smoothed estimator can only be obtained under a sequence of experiments that yields a non-degenerate limiting distribution of the first-stage estimator.7 We now consider asymptotics under such a sequence of experiments. The sequence we consider has increasing amounts of noise relative to signal as sample size increases.8
Assumption 2.3. Experiments are indexed by n, and for the nth experiment we observe pairs (Xi, Yi) for i = 1, …, n. The observations are i.i.d. given n, and





The last equality requires the criterion function m to be scale neutral, which holds in particular for quantiles and the mean. For a given sample size n, this is the same model as before. As n changes, the function g identified by 2.2 is held constant. If the noise scale grows in n, the estimation problem in this sequence of models becomes increasingly difficult relative to i.i.d. sampling. Note that 2.9 does not describe an additive structural model, which would allow us to predict counterfactual outcomes. Instead, the residual is simply the statistical difference between Y and g(X), which is also well defined for non-additive structural models.
Our next result, Theorem 2.2, assumes that the approximation of Assumption 2.2 holds under the non-standard sequence of experiments described by Assumption 2.3. Theorem 1 in Kong et al. (2010) implies that Assumption 2.2 holds under standard asymptotics and weak regularity conditions. Nevertheless, their result extends to our setting in a fairly straightforward way. This is most easily seen in the case of mean regression. We can write Y as a sum of two terms: (a) the signal g(X); (b) the rescaled noise. We can then apply the result of Kong et al. (2010) to local linear regression on X of each of these terms separately. Both the Bahadur expansion and the local linear mean regression estimator are linear in Y. As a consequence, the remainder R for a regression of Y on X is given by the sum of the two remainders corresponding to regressions of terms (a) and (b) on X. Whichever of the two Bahadur expansions corresponding to (a) and (b) dominates the asymptotic distribution is thereby guaranteed to be of larger order than the sum of the two remainder terms. A similar logic applies more generally, for instance to the case of local linear quantile regression; a complete proof is beyond the scope of the present paper.
By Corollary 2.1, a necessary condition for a non-degenerate limit of the smoothed plug-in estimator is that the first-stage estimator converges to a non-degenerate limiting distribution. As is well known, and as also follows from Assumption 2.2, the estimate of the derivative g′ converges at a slower rate than the estimate of g itself, so that asymptotically variation in the estimated derivative will dominate, namely by adding ‘wiggles’ around the actual roots. If the noise scale grows at an appropriate rate in the sequence of experiments defined in Assumption 2.3, the estimate of g converges uniformly in probability to g, whereas the estimate of g′ converges pointwise to a non-degenerate limit. This is the basis for the following theorem.9
Theorem 2.2. (Asymptotic normality) Under Assumptions 2.1, 2.2 and 2.3, and under appropriate rate conditions on the bandwidths and the noise scale, there exist μ and V such that






This theorem justifies the use of t-tests based on the smoothed estimator for null hypotheses of the form Z = z, for candidate integers z. The construction of a t-statistic requires a consistent estimator of V and an estimator of μ converging at a rate faster than that of the estimator itself. Based on the last part of Theorem 2.2, we can construct such estimators as follows. Any plug-in estimator that consistently estimates the (co)variances of the first-stage estimator under the given sequence of experiments consistently estimates μ and V. One such plug-in estimator is the standard bootstrap (i.e. resampling from the empirical distribution function). The Bahadur expansion in Assumption 2.2, which approximates the first-stage estimation error by sample averages, implies that the bootstrap gives a resampling distribution with the asymptotically correct covariance structure. From this and Theorem 2.2, it then follows that the bootstrap gives consistent variance and bias estimates for the smoothed estimator, where the bias is estimated from the difference of the resampling estimates relative to the point estimate. If sample size grows fast enough relative to the noise scale and τ, the asymptotic validity of a standard normal approximation for the pivot follows.
It would be interesting to develop distributional refinements for this statistic using higher-order bootstrapping, along the lines discussed by Horowitz (2001). However, higher-order bootstrapping might be very computationally demanding in the present case, in particular if criteria such as quantile regression are used to identify g.
Theorem 2.2 also implies that increasing the bandwidth parameter ρ reduces the variance without affecting the bias in the limiting normal distribution. Asymptotically, the difficulty in estimating Z is driven entirely by fluctuations in the estimated derivative. These fluctuations lead both to upward bias and to variance in plug-in estimators. When ρ is larger, these fluctuations are averaged over a larger range of X, thereby reducing variance. Theorem 2.2 thus implies that the smoothed estimator with a smaller bandwidth is asymptotically inefficient relative to the same estimator with a larger bandwidth. Furthermore, by Proposition 2.1, the smoothed functional equals Z for all generic g. If the relative inefficiency carries over to the limit as ρ → 0, it follows that the simple plug-in estimator is asymptotically inefficient relative to the smoothed estimator. Note, however, that this is only a heuristic argument: we cannot exchange the limits with respect to ρ and with respect to n to obtain the limit distribution of the simple plug-in estimator. The following theorem, which is fairly easy to show, states a formally correct version of this argument.
Theorem 2.3. (Asymptotic inefficiency of the naive plug-in estimator) Consider the set-up of Theorem 2.2, under the assumptions stated there. Then, as n → ∞,


From this theorem, it follows in particular that tests based on the simple plug-in estimator will, in general, not be consistent under the sequence of experiments considered (i.e. the probability of false acceptance does not go to zero). This stands in contrast to tests based on the smoothed estimator.
2.4. Alternative approaches
The reader might rightly wonder whether there are alternative estimators that, like our smoothed estimator, avoid the issues of the naive estimator (overestimating the number of roots, in particular), and that possibly beat it in terms of some notion of relative efficiency.10 One possible estimator that comes to mind is the ϱ-packing number of the set of roots of the first-stage estimate, where ϱ goes to zero slowly. The packing number is the largest integer z such that there are z disjoint balls of radius ϱ centred at roots of the estimated function.
The packing number is in fact closely related to our estimator. For an appropriate scaling of ϱ, we can think of the smoothed estimator as smoothly interpolating the packing number. The following numerical illustration helps to make the point. Consider a function g on the unit interval that has four roots at a distance of 1/4 from each other, and has a maximum absolute value of 1. For this function g, consider both the smoothed functional and the packing number of the set of roots of g as functions of ρ (or ϱ). The result is plotted in Figure 3, which shows both as a function of bandwidth, with the two bandwidths put on a common scale for comparability. As can be seen from this figure, both estimators behave similarly, with the smoothed functional interpolating the jumps of the packing number. To the extent that smoother estimators are preferable in many contexts (see the literature on model selection versus shrinkage), it might be that the smoothed estimator is better behaved. A formalization of this heuristic argument, and a full development of the asymptotic theory of packing numbers, is beyond the scope of the present paper. One advantage of the smoothed estimator, which motivates our focus on it rather than, for instance, the packing number, is that it allows for an easier development of asymptotic theory and of corresponding inference procedures, which are the main object of the present paper.


The reader might further wonder, rightly again, whether the sequence of experiments we chose in Assumption 2.3 is peculiar, and whether another sequence might give different answers. The problem of estimating Z might be made more difficult not only by increasing the variance of the regression residuals, but also by letting the roots of g move closer to each other. Formally, we might consider a sequence of models with i.i.d. residuals in which g is compressed horizontally at a rate given by some diverging sequence, so that its roots move closer together as n grows. Such a sequence of experiments, however, effectively reduces to the setting of standard asymptotics once we substitute the bandwidth ρ by a correspondingly rescaled bandwidth and account for the fact that the effective sample size grows at a slower rate. This implies, in particular, that the superconsistency result of Theorem 2.1 also applies to this alternative sequence of experiments, which makes it unsuitable for inference.
3. EXTENSIONS AND APPLICATIONS
In this section, several extensions and applications of the results of Section 2. are presented. Sections 3.1.–3.3. discuss, respectively, inference on Z if g is identified by more general moment conditions, inference on Z if the domain and range of g are multidimensional and inference on the number of stable and unstable roots. Sections 3.4. and 3.5. discuss identification and inference for the two applications mentioned in the introduction: static games of incomplete information and stochastic difference equations.
3.1. Conditioning on covariates










The vector W2 serves as a vector of control variables. The conditional independence assumption is also known as ‘selection on observables’. The function g is equal to the average structural function if the criterion m corresponds to the mean, and equal to a quantile structural function if m corresponds to a quantile. The average structural function will be of importance in the context of games of incomplete information, as discussed in Section 3.4.; quantile structural functions will be used to characterize stochastic difference equations in Section 3.5. When games of incomplete information are discussed in Section 3.4., W2 will correspond to the component of public information that is not excluded from either player's response function.






An asymptotic normality result can be shown in this context, which generalizes Theorem 2.2. In light of the proof of Theorem 2.2, the crucial step is to obtain a sequence of experiments such that the first-stage estimate converges uniformly to g while the estimate of its derivative has a non-degenerate limiting distribution. If we obtain an approximation of the first-stage estimator equivalent to the approximation in Assumption 2.2, all further steps of the proof apply immediately. This can be done, using the results of Newey (1994), for the following sequence of experiments.
Assumption 3.1. Experiments are indexed by n, and for the nth experiment we observe draws of outcomes and covariates for i = 1, …, n. The observations are i.i.d. given n, and



3.2. Higher-dimensional systems










3.3. Stable and unstable roots





Again, all arguments of Section 2. go through essentially unchanged for these parameters. In particular, Theorem 2.2 applies literally, replacing Z with Zs or Zu.
More generally, functionals that are smooth approximations of the number of roots with various stability properties can be constructed in the multidimensional case by multiplying the integrand with an indicator function depending on the signs of the eigenvalues of the Jacobian of g.
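In the one-dimensional case, the classification reduces to the sign of g′ at each root. Under the convention (an assumption of this illustration, matching the difference-equation setting of Section 3.5.) that X drifts upward where g > 0, a down-crossing of zero is stable and an up-crossing is unstable:

```python
import numpy as np

def count_stable_unstable(g, grid):
    """Count sign changes of g on a grid: a down-crossing (g' < 0 at the
    root) is classified as stable, an up-crossing (g' > 0) as unstable,
    under the convention that X drifts upward where g > 0."""
    v = g(np.asarray(grid, dtype=float))
    stable = unstable = 0
    for a, b in zip(v[:-1], v[1:]):
        if a > 0 >= b:
            stable += 1
        elif a < 0 <= b:
            unstable += 1
    return stable, unstable

# Cubic with roots at 0.2, 0.5, 0.8: stable, unstable, stable.
g = lambda x: -(x - 0.2) * (x - 0.5) * (x - 0.8)
zs, zu = count_stable_unstable(g, np.linspace(0.0, 1.0, 1000))
```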
3.4. Static games of incomplete information
This section and Section 3.5. discuss how to apply the inference procedure proposed to test for equilibrium multiplicity in economic models. The discussion in this subsection builds on Bajari et al. (2010).











































































Assumption 3.2. For each player, the relevant response function is continuously differentiable and monotonic in its last argument, and its inverse with respect to that argument, given the remaining arguments, is well defined. Experiments are indexed by n, and for the nth experiment we observe draws for i = 1, …, n. The observations are i.i.d. given n and












Using this sequence of experiments, we can now state an asymptotic normality result, similar to Theorem 2.2, for static games of incomplete information. The statement of the theorem differs in two respects from the baseline case. First, a shrinking bandwidth sequence replaces ρ in all expressions: because this sequence of experiments shrinks g rather than inflating the error, the bandwidth must also shrink correspondingly. Second, the rate of growth of the noise scale is smaller: because all regressions control for s1 or s2, rates of convergence are slower. In particular, the estimated derivative converges to a non-degenerate limit iff the bandwidth shrinks at a rate determined by k, the dimensionality of the support of the response functions.
Theorem 3.3. (Asymptotic normality, static games of incomplete information) Under the sequence of experiments defined by Assumption 3.2, if the remainders in the Bahadur expansions are uniformly negligible as n → ∞, and under appropriate rate conditions, there exist μ and V such that

3.5. Stochastic difference equations


The intuition for this claim is as follows. Holding ε constant, the number of roots of g in X is the number of equilibria of the difference equation 3.23. If ε is stochastic, then the number of roots can still serve to characterize qualitative dynamics in terms of equilibrium regions. This is shown in Figure 5, which illustrates the characterization of dynamics derived in this section. In this figure, gU and gL are upper and lower envelopes of g for a sequence of realizations of ε. There are ranges of X in which the sign of the change in X does not depend on ε. This implies that in these ranges X moves towards the equilibrium regions, which are the regions in which the roots of gU and gL lie. Equilibrium regions correspond to the dashed segments of the X-axis; the remaining segments form the basins of attraction of the lower and upper equilibrium regions.
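A small simulation illustrates such basins of attraction. Under an assumed cubic transition rule with stable equilibria near 0.2 and 0.8 and an unstable one at 0.5 (our own stand-in for g, not an estimated function), trajectories started on either side of the unstable root settle in different equilibrium regions:

```python
import numpy as np

def simulate(x0, T=500, sigma=0.02, seed=0):
    """Iterate X_{t+1} = X_t + g(X_t) + sigma * eps_t for a cubic g with
    stable roots near 0.2 and 0.8 and an unstable root at 0.5."""
    g = lambda x: -(x - 0.2) * (x - 0.5) * (x - 0.8)
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(T):
        x += g(x) + sigma * rng.standard_normal()
    return x

x_low = simulate(0.35)    # starts below the unstable root at 0.5
x_high = simulate(0.65)   # starts above it
# x_low ends near the lower equilibrium region (around 0.2) and x_high
# near the upper one (around 0.8): initial conditions matter here.
```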

How is the joint distribution of the observed variables related to the transition function g? Unobserved heterogeneity that is positively related over time leads to an upward bias in quantile regression slopes relative to the corresponding structural slopes. To show this, denote the qth conditional quantile of the outcome given X by Q, the conditional cumulative distribution function evaluated at Q by F, and the corresponding conditional probability density by f. The following lemma shows that quantile regressions of the outcome on X yield biased slopes relative to the structural slope, if X is not exogenous. The second term in 3.24 reflects the bias due to statistical dependence between X and ε.
Lemma 3.1. (Bias in quantile regression slopes) If Q and F are differentiable with respect to the conditioning argument X, then

The following assumption of first-order stochastic dominance states that there is no negative dependence between current shocks and current X.

Assumption 3.3. (First-order stochastic dominance) The conditional distribution function F is non-increasing as a function of X, holding its point of evaluation constant.
Violation of this assumption would require some underlying cyclical dynamics, in continuous time, with a frequency close enough to half the frequency of observation or, more generally, with a ratio of frequencies that is an odd number divided by two. This assumption might not hold, for instance, if outcomes were influenced by seasonal factors and observations were semi-annual. It seems safe to discard this possibility in most applications.
We can now formally state the claim that, if there are unstable equilibria structurally, then quantile regressions should exhibit multiple roots.
Proposition 3.1. (Unstable equilibria in dynamics and quantile regressions) Assume that g is smooth and, for all ε, positive for sufficiently small X and negative for sufficiently large X. If Assumption 3.3 holds and the conditional quantile function has only one root in X for all q, then the conditional average structural functions, as functions of X, are stable at their roots:


This proposition assumes global stability of g (i.e. X does not diverge to infinity). Under such global stability, if there is only one root of g, then this root is stable. According to this proposition, if quantile regressions only have one stable root, then the same is true for the conditional average structural functions. This is not conclusive, but it is suggestive that the structural functions themselves have only one root.





















Proposition 3.2. (Characterizing dynamics of stochastic difference equations) Assume that the envelopes gU and gL, defined by (3.26) and (3.27), are smooth and generic, positive for sufficiently small x and negative for sufficiently large x, and that they have the same number z of roots. Define the following mutually disjoint ranges:

















Assuming non-emptiness of these ranges, each such interval is a basin of attraction for the corresponding equilibrium region (i.e. X in this interval converges monotonically to that region and then remains there). The main difference relative to the deterministic, time-homogeneous case is the blurring of each stable equilibrium point to a stable set.
We did not make any assumptions on the joint distribution of the unobserved factors; the whole argument of the preceding proposition is conditional on these factors. However, the predictions will be sharper (given g) if the serial dependence of the unobserved factors is stronger, increasing the number of units i to which the assertion is applicable and reducing the size of the equilibrium regions, because the gap between the envelopes gU and gL is smaller on average.
In summary, Proposition 3.1 implies that, if we do not find multiple roots in quantile regressions, then the conditional average structural functions do not have multiple roots. Proposition 3.2 implies that, if upper and lower envelopes of
do not have multiple roots, then the dynamics of the system are stable and initial conditions do not matter in the long run.
4. APPLICATION TO THE DYNAMICS OF NEIGHBOURHOOD COMPOSITION
This section analyses the dynamics of the minority share in a neighbourhood, applying the methods developed in the preceding two sections to the data used by Card et al. (2008) in their analysis of neighbourhood composition dynamics. They study whether preferences over neighbourhood composition lead to ‘white flight’ once the minority share in a neighbourhood exceeds a certain level. They argue that such ‘tipping’ behaviour implies discontinuities in the change of neighbourhood composition over time as a function of initial composition, and they test for the presence of such discontinuities in cross-sectional regressions over different neighbourhoods in a given city. This argument is based on the theoretical models of Becker and Murphy (2000), which do not allow for individual heterogeneity and consider infinite time horizons. The present paper argues that, if we allow for heterogeneity and finite time, and if tipping does take place, then we should expect multiple roots rather than discontinuities. Kasy (2015) discusses a search-matching model of the housing market with social externalities, which has this implication.
Card et al. (2008) provided full access to their datasets, which allows us to use samples and variable definitions identical to those in their work. The dataset is an extract from the Neighbourhood Change Database (NCDB), which aggregates US census variables to the level of census tracts. Tract definitions change between census waves, but the NCDB matches observations from the same geographic area over time, making it possible to observe the development of the universe of US neighbourhoods over several decades. In the dataset used by Card et al. (2008), all rural tracts are dropped, as well as all tracts with population below 200 and tracts that grew by more than five standard deviations above the metropolitan statistical area (MSA) mean. The definition of MSA used is the MSAPMA from the NCDB, which is equal to a ‘primary metropolitan statistical area’ if the tract lies in one of those, and equal to the MSA in which it lies otherwise. For further details on sample selection and variable definition, see Card et al. (2008).
The graphs and tables to be discussed are constructed as follows. For each of the MSAs and each of the decades separately, we run local linear quantile regressions of the change in minority share of a neighbourhood (tract) on minority share at the beginning of the decade. This is done for the quantiles 0.2, 0.5 and 0.8, with a bandwidth τ of , where n is the sample size.12 Figure 6 shows local linear quantile regressions of the change in minority share (left column) and of the change in white population relative to initial population (right column) on initial minority share for the quantiles 0.2, 0.5 and 0.8. The figures do not show confidence bands. The figure plots these quantile regressions for the three largest MSAs. For each of the regressions,
is calculated, where ρ is chosen as 0.04. The integral in the expression for
is taken over the interval [0, 1], intersected with the support of initial minority share if the latter is smaller. Note that it is possible to find no (stable) equilibrium for an MSA (i.e.
), if high initial minority shares do not occur in that MSA and most neighbourhoods experienced growing minority shares. Figure 7 shows kernel density plots of the regressor, the initial minority share across neighbourhoods, which suggest that support problems are not an issue, at least for the largest MSAs. For each
, bootstrap standard errors and bias are calculated, as well as the corresponding t-test statistics for the null hypothesis
, implying an integer-valued confidence set for z (obtained by inverting tests of level 0.05). By the results of Section 2., these confidence sets have an asymptotic coverage probability of 95%. By the Monte Carlo evidence of Appendix A, they are likely to be conservative (i.e. to have a larger coverage probability). If the confidence sets thus obtained are empty, the two neighbouring integers of
are included in the intervals shown. This makes inference even more conservative. Table 1 shows the resulting confidence sets for the 12 largest MSAs in the United States (by 2009 population), for all quantiles and decades under consideration.13
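The computation behind each table entry can be sketched as follows. This is an illustration rather than the paper's code: `smoothed_root_count` implements the generic smoothed plug-in formula, integrating a kernel in the estimated regression function times the absolute value of its derivative over a grid, and the bias and standard error passed to the confidence-set step are placeholders for the bootstrapped values.

```python
import numpy as np

def smoothed_root_count(g_vals, x_grid, rho):
    """Smoothed plug-in estimate of the number of roots of g on x_grid:
    the integral of K_rho(g(x)) * |g'(x)| dx, with a Gaussian kernel K_rho."""
    kernel = np.exp(-0.5 * (g_vals / rho) ** 2) / (rho * np.sqrt(2.0 * np.pi))
    g_prime = np.gradient(g_vals, x_grid)              # numerical derivative of g
    integrand = kernel * np.abs(g_prime)
    # trapezoidal rule over the grid
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x_grid)))

def integer_confidence_set(z_hat, bias, se, crit=1.96, z_max=10):
    """Integers z not rejected by the t-test based on (Z_hat - bias - z) / se."""
    return [z for z in range(z_max + 1) if abs((z_hat - bias - z) / se) <= crit]

# Toy example: a cubic with three roots in [0, 1], standing in for an estimated g
x = np.linspace(0.0, 1.0, 20001)
g = (x - 0.2) * (x - 0.5) * (x - 0.8)
z_hat = smoothed_root_count(g, x, rho=0.003)           # close to 3
print(round(z_hat, 2))
print(integer_confidence_set(z_hat, bias=0.05, se=0.2))
```

With the paper's data, `g_vals` would be the fitted local linear quantile regression evaluated on a grid of initial minority shares, and `bias` and `se` would come from the bootstrap.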
1970s | 1980s | 1990s | |||||||
---|---|---|---|---|---|---|---|---|---|
MSA | q=0.2 | q=0.5 | q=0.8 | q=0.2 | q=0.5 | q=0.8 | q=0.2 | q=0.5 | q=0.8 |
New York, NY PMSA | [0,1] | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
Los Angeles-Long Beach, CA PMSA | [1,1] | [1,1] | [0,1] | [0,1] | [0,1] | [0,1] | [1,1] | [1,1] | [0,0] |
Chicago, IL PMSA | [0,1] | [0,1] | [0,1] | [2,2] | [0,1] | [0,1] | [1,1] | [0,1] | [0,0] |
Dallas, TX PMSA | [1,2] | [1,1] | [0,0] | [0,1] | [0,0] | [0,0] | [0,1] | [0,1] | [0,0] |
Philadelphia, PA-NJ PMSA | [1,2] | [0,1] | [0,1] | [1,1] | [0,1] | [0,1] | [1,1] | [0,1] | [0,0] |
Houston, TX PMSA | [1,1] | [0,0] | [0,0] | [1,2] | [0,1] | [0,0] | [0,1] | [0,0] | [0,0] |
Miami, FL PMSA | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
Washington, DC-MD-VA-WV PMSA | [0,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,0] | [1,1] | [0,1] | [0,0] |
Atlanta, GA MSA | [1,1] | [1,1] | [0,0] | [2,3] | [0,0] | [0,0] | [0,0] | [0,0] | [0,0] |
Boston, MA-NH PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] |
Detroit, MI PMSA | [1,2] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,0] |
Phoenix-Mesa, AZ MSA | [1,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,0] | [1,1] | [0,1] | [0,0] |
San Francisco, CA PMSA | [1,1] | [0,1] | [0,1] | [0,0] | [0,1] | [0,0] | [1,1] | [0,0] | [0,0] |
Note
- The table shows confidence intervals in the integers for z for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in minority share over a decade on the initial minority share for the quantiles 0.2, 0.5 and 0.8. The regression bandwidth is , and σ is chosen as 0.04. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.
As can be seen from the table, in very few cases is there evidence of Z exceeding 1. In all cases shown, except for the 0.2 quantile for Atlanta in the 1980s, we can reject the null . Similar patterns hold for almost all of the 118 cities in the dataset. Rather than exhibiting multiple equilibria, the data indicate a general rise in minority share that is largest for neighbourhoods with intermediate initial share, but not to the extent of leading to tipping behaviour. Proposition 3.1 suggests that, if we do not find multiple roots in quantile regressions, we can reject multiple equilibria in the underlying structural relationship. I take these results as indicating that tipping is not a widespread phenomenon in US ethnic neighbourhood composition over the decades under consideration. This stands in contrast to the conclusion of Card et al. (2008), who do find evidence of tipping.
The approach used here differs from the main analysis in Card et al. (2008) in a number of ways. Card et al. (2008) (a) use polynomial least-squares regression with a discontinuity. They (b) use a split sample method to test for the presence of a discontinuity, and they (c) regress the change in the non-Hispanic, white population, divided by initial neighbourhood population, on initial minority share. We (a) use local linear quantile regression without a discontinuity, we (b) run the regressions on full samples for each MSA and test for the number of roots, and we (c) regress the change in minority share on initial minority share.
To check whether the differing results are due to variable choice (c) rather than the testing procedure, the left column of Figure 6 and Table 1 are replicated using the change in the non-Hispanic, white population relative to initial population as the dependent variable, as in Card et al. (2008). The right column of Figure 6 shows such quantile regressions. These figures correspond to the ones in Card et al. (2008, p. 190), using the same variables but a different regression method and the full samples. Table 2 shows confidence sets for the number of roots of these regressions for the 12 largest MSAs. In comparing Tables 1 and 2, note that there is a correspondence between the lower quantiles of the former (low increase in minority share) and the upper quantiles of the latter (higher increase/lower decrease of white population). The two tables show fairly similar results. Again, no systematic evidence of multiple roots is found.

1970s | 1980s | 1990s | |||||||
---|---|---|---|---|---|---|---|---|---|
MSA | q=0.2 | q=0.5 | q=0.8 | q=0.2 | q=0.5 | q=0.8 | q=0.2 | q=0.5 | q=0.8 |
New York, NY PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] |
Los Angeles-Long Beach, CA PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] |
Chicago, IL PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,1] | [1,1] | [0,1] | [0,1] | [0,1] |
Dallas, TX PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,2] | [0,1] | [1,1] | [0,1] |
Philadelphia, PA-NJ PMSA | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [0,1] | [1,1] |
Houston, TX PMSA | [0,1] | [0,1] | [0,1] | [1,1] | [1,1] | [1,1] | [0,1] | [0,1] | [0,1] |
Miami, FL PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [1,1] | [1,1] | [1,1] |
Washington, DC-MD-VA-WV PMSA | [0,1] | [0,0] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] | [0,1] |
Atlanta, GA MSA | [0,1] | [1,1] | [0,1] | [1,1] | [1,1] | [1,1] | [1,1] | [1,2] | [0,1] |
Boston, MA-NH PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] |
Detroit, MI PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [1,1] | [0,1] | [0,1] | [0,1] |
Phoenix-Mesa, AZ MSA | [0,1] | [0,1] | [0,1] | [0,0] | [1,1] | [0,0] | [0,1] | [0,1] | [0,1] |
San Francisco, CA PMSA | [0,1] | [0,1] | [0,1] | [0,0] | [0,0] | [0,0] | [0,0] | [1,1] | [0,0] |
Note
- The table shows confidence intervals in the integers for z for the 12 largest MSAs of the United States, ordered by population, where g is estimated by quantile regression of the change in the non-Hispanic, white population over a decade, divided by initial total population, on the initial minority share for the quantiles 0.2, 0.5 and 0.8. The regression bandwidth is , and σ is chosen as 0.05 times the maximal change. Confidence sets are based on t-statistics using bootstrapped bias and standard errors.
Some factors might bias the estimated number of equilibria obtained with the methods developed here. First, the test might be sensitive to the chosen range of integration if there are roots near its boundary. If a root lies exactly on the boundary of the chosen range of integration, it enters the estimated count only with weight 1/2. Extending the range of integration beyond the unit interval, however, might lead to an upward bias in the estimated number of roots if the extrapolated regression functions intersect the horizontal axis. Second, choosing a bandwidth parameter ρ that is too large might bias the estimated number of equilibria downwards, if the function g peaks within the range
. Third, there might be roots of g in the unit interval but beyond the support of the data.
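The boundary point is easy to verify numerically: for g(x) = x on [0, 1], the single root sits exactly at the left endpoint, so only half of the kernel mass around it falls inside the range of integration. This is a toy check using the same smoothed-count formula; all constants are illustrative.

```python
import numpy as np

# Smoothed root count: integral over [0, 1] of K_rho(g(x)) * |g'(x)| dx
x = np.linspace(0.0, 1.0, 20001)
rho = 0.01
g = x                                     # unique root exactly at the boundary x = 0
kernel = np.exp(-0.5 * (g / rho) ** 2) / (rho * np.sqrt(2.0 * np.pi))
integrand = kernel * np.abs(np.gradient(g, x))
z_hat = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))
print(round(z_hat, 2))                    # the boundary root enters as roughly 1/2
```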
5. SUMMARY AND CONCLUSION
This paper proposes an inference procedure for the number of roots of functions that are non-parametrically identified by conditional moment restrictions, and develops the corresponding asymptotic theory. In particular, it is shown that a smoothed plug-in estimator of the number of roots is superconsistent under i.i.d. asymptotics, but asymptotically normal under non-standard asymptotics, and asymptotically efficient relative to a simple plug-in estimator. In Section 3., these results are extended to cover various more general cases, allowing for covariates as controls, for higher-dimensional domain and range, and for inference on the number of equilibria with various stability properties. This section also discusses how to apply the results to static games of incomplete information and to stochastic difference equations. In an application of the methods developed here to data on neighbourhood composition dynamics in the United States, no evidence of multiple equilibria is found.
The inference procedure can also be used to test for bifurcations (i.e. equilibria appearing or disappearing as a function of exogenous covariates). It is easy to test the hypothesis , because the corresponding estimators
are independent for W1 and W2 further apart than twice the bandwidth τ. If there are bifurcations, small exogenous shifts might have a large (discontinuous) effect on the equilibrium attained, if the ‘old’ equilibrium disappears.
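Given two estimates at covariate values more than two bandwidths apart, the bifurcation test reduces to a standard two-sample t-statistic; a minimal sketch, in which all numerical inputs are placeholders:

```python
import math

def bifurcation_t_stat(z_hat_1, se_1, z_hat_2, se_2):
    """t-statistic for H0: Z(w1) = Z(w2), using independence of the two
    estimators when w1 and w2 are further apart than twice the bandwidth."""
    return (z_hat_1 - z_hat_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)

# Placeholder estimates and bootstrap standard errors at two covariate values
t = bifurcation_t_stat(3.1, 0.3, 1.2, 0.4)
print(abs(t) > 1.96)   # True: reject equality of the number of equilibria at the 5% level
```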
In the dynamic set-up, one might furthermore consider applying the procedure to detrended data (e.g. by demeaning ). It seems likely that regressions of detrended data have a higher number of roots. The rationale for such an approach can be found in underlying models in which the dynamics of a detrended variable are stationary. This is, in particular, the case in Solow-type growth models, in which GDP or capital stock is stationary after normalization by a technological growth factor.
Finally, it might also be interesting to extend the results obtained here to cover further cases where g cannot be directly estimated using conditional moment restrictions. The crucial step for such extensions, as illustrated by the various cases discussed in Section 3., is to find a sequence of experiments such that the first-stage estimator converges in probability to a degenerate limit whereas
converges in distribution to a non-degenerate limit. Furthermore,
needs to be asymptotically independent of
for all
. There are many potential applications of the results obtained here, where it might be interesting to know whether the underlying dynamics or strategic interactions imply multiple equilibria. Examples include household level poverty traps, intergenerational mobility, efficiency wages, macro models of economic growth (as analysed in the online Appendix), financial market bubbles (herding), market entry and social norms.
ACKNOWLEDGEMENTS
I thank seminar participants at UC Berkeley, UCLA, USC, Brown, NYU, UPenn, LSE, UCL, Sciences Po, TSE, Mannheim and IHS Vienna for their helpful comments and suggestions. I particularly thank Tim Armstrong, David Card, Kiril Datchev, Victor Chernozhukov, Jinyong Hahn, Michael Jansson, Bryan Graham, Susanne Kimm, Patrick Kline, Rosa Matzkin, Enrico Moretti, Denis Nekipelov, James Powell, Alexander Rothenberg, Jesse Rothstein, James Stock and Mark van der Laan for many valuable discussions, and David Card, Alexander Mas and Jesse Rothstein for the access provided to their data. This work was supported by a DOC fellowship from the Austrian Academy of Sciences at the Department of Economics, UC Berkeley.
Appendix A: MONTE CARLO EVIDENCE
The function g is estimated by median regression, mean regression and 0.9 quantile regression, where the γ in the simulations are shifted appropriately to have median, mean or 0.9 quantile at the respective g. Figures A.1–A.3 and Table A.1 show sequences of four experiments with 400, 800, 1,600 and 3,200 observations. These models are chosen to be comparable to the empirical application discussed in Section 4.. The variance of γ in each experiment is chosen to yield the same variance for , as implied by the asymptotic approximation of the Bahadur expansion, in all experiments for a given g. By the proof of Theorem 2.2, we should therefore obtain similar simulation results across all set-ups. Furthermore, the variance of
should be constant up to a factor
. The parameters of these simulations are chosen to lie in an intermediate range where variation in
is present but moderate.
Figure A.1 shows density plots for from the sequences of Monte Carlo experiments with uniform errors and g identified by median regression, as described in this appendix; in the online Appendix, similar figures are presented for the other experiments. The upper graph shows the distribution from four experiments with increasing sample size n and correspondingly growing variance of the residual γ, where the true parameter Z equals one. The same holds for the lower graph, except that
. As predicted by Theorem 2.2, biases are positive, and both bias and variance are decreasing in n. Figure A.2 shows the distribution of the naive plug-in estimator
, from the same simulations as in Figure A.1. It was shown in Section 2. that this estimator is asymptotically inefficient relative to the smoothed plug-in estimator. This relative inefficiency is reflected in a larger dispersion in the simulations, as can be seen by comparing Figures A.1 and A.2. Figure A.3 shows density plots for
, normalized by its sample mean and standard deviation, from the same simulations as in Figure A.1. It also shows, as a reference, the density of a standard normal. These plots suggest that the sample distribution of
is somewhat right-skewed relative to a normal distribution.
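A stripped-down version of such a Monte Carlo experiment can be run as follows. This is a toy analogue rather than the paper's design: it uses Nadaraya-Watson mean regression, a fixed error variance, and an illustrative g with a single root, whereas the paper's experiments scale the error variance with n. It mirrors the structure of the simulations, though: simulate data, estimate g, and evaluate the smoothed root count across replications.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_true(x):
    return 0.5 - x                      # illustrative g with a single root at 0.5

def nw_regression(x_obs, y_obs, x_grid, tau):
    """Nadaraya-Watson kernel regression (a stand-in for local linear regression)."""
    w = np.exp(-0.5 * ((x_grid[:, None] - x_obs[None, :]) / tau) ** 2)
    return (w @ y_obs) / w.sum(axis=1)

def smoothed_count(g_hat, x_grid, rho):
    """Smoothed plug-in root count: integral of K_rho(g_hat) * |g_hat'| dx."""
    k = np.exp(-0.5 * (g_hat / rho) ** 2) / (rho * np.sqrt(2.0 * np.pi))
    f = k * np.abs(np.gradient(g_hat, x_grid))
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x_grid)))

n, tau, rho, sigma, reps = 400, 0.06, 0.04, 0.3, 100
x_grid = np.linspace(0.05, 0.95, 501)   # trimmed to reduce boundary effects
draws = []
for _ in range(reps):
    x_obs = rng.uniform(0.0, 1.0, n)
    y_obs = g_true(x_obs) + rng.normal(0.0, sigma, n)
    draws.append(smoothed_count(nw_regression(x_obs, y_obs, x_grid, tau), x_grid, rho))
print(np.round(np.mean(draws), 3), np.round(np.std(draws), 3))
```

In this low-noise toy the estimates concentrate near the true value of one; making the bias and skewness of Theorem 2.2 visible requires the larger, n-dependent error variances of the paper's experiments.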
Table A.1 shows the results of simulations using bootstrapped standard deviations and biases, for mean regression with uniform errors. For the range of experiments considered, rejection frequencies are lower than the 0.05 value implied by asymptotic theory. If this pattern generalizes, inference based upon the t-statistic proposed in this paper is conservative in finite samples. In particular, it seems that the bootstrapped standard errors are too large.
n | τ | r | rejection rate (lower one-sided) | rejection rate (upper one-sided) |
---|---|---|---|---|
400 | 0.065 | 0.179 | 0.05 | 0.01 |
800 | 0.059 | 0.194 | 0.03 | 0.02 |
1,600 | 0.055 | 0.231 | 0.02 | 0.01 |
3,200 | 0.052 | 0.290 | 0.02 | 0.01 |
400 | 0.065 | 0.268 | 0.03 | 0.02 |
800 | 0.059 | 0.292 | 0.01 | 0.02 |
1,600 | 0.055 | 0.347 | 0.01 | 0.01 |
3,200 | 0.052 | 0.434 | 0.01 | 0.02 |
Note
- This table shows the frequency of rejection of the null under a test of asymptotic level 5%, for the sequences of Monte Carlo experiments described in Appendix A. The g are estimated by mean regression, the errors are uniformly distributed, and the first four experiments are generated using g1 with one root, the next four using g2 with three roots. The columns show sample size, regression bandwidth, error standard deviation and the rejection probabilities of one-sided tests, respectively.
Appendix B: PROOFS
Proof of Proposition 2.1. By continuity of as well as genericity of g, we can choose ρ small enough such that
is constantly equal to
in each of the neighbourhoods of the
roots of g,
, defined by
. Hence, we can write the integral
as a sum of integrals over these neighbourhoods, in each of which there is exactly one root. Assume w.l.o.g. that
and
is constant in the range of x where
. Then, by a change of variables setting
,

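The displays of this proof are missing from the extraction; under standard notation for smoothed root counting (an assumption: φ_ρ a kernel density integrating to one, x_1, …, x_Z the isolated roots of g, each in a δ-neighbourhood on which g is monotone), the change-of-variables step presumably runs:

```latex
\int \varphi_\rho\bigl(g(x)\bigr)\,\lvert g'(x)\rvert\,dx
  \;=\; \sum_{j=1}^{Z} \int_{x_j-\delta}^{x_j+\delta}
        \varphi_\rho\bigl(g(x)\bigr)\,\lvert g'(x)\rvert\,dx
  \;=\; \sum_{j=1}^{Z} \int \varphi_\rho(u)\,du
  \;=\; Z , \qquad u = g(x),
```

since ρ is chosen small enough that the support of φ_ρ ∘ g is contained in the union of these neighbourhoods, on each of which g is invertible.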
Proof of Proposition 2.2. We need to find ε such that implies
. By genericity of g, each root
of g is such that
. By continuous first derivatives, we can then find δ such that
is constant in the neighbourhood
of each of the finitely many roots
and the
are mutually disjoint. By continuity of g,
The assertion for follows now from the first part of this proof, combined with Proposition 2.1, if we can choose ρ independent of
such that Proposition 2.1 applies. For this, it suffices that ρ separates the roots. Choosing
accomplishes this. By B.1,
will separate the
, and by the previous argument each of the
will contain exactly one root of
.
Proof of Theorem 2.2. We use to denote a sequence of approximations to
. Write
, if
and
have the same non-degenerate distributional limit for some non-random sequences
and
. In particular, as long as such sequences exist that guarantee convergence to a non-degenerate limit, this is implied by equality up to a remainder, which is asymptotically negligible under the given sequence of experiments (i.e.
if
).
1. Approximation of with g
The fact that implies that the remainder
is of the same order. To see this, note that
is
, so that the remainder is of the same order as
. The integrand of this expression is non-zero only in a neighbourhood of size of order ρ of the roots of g; the difference
is of order
because
is Lipschitz with constant
, so that the claim follows.
Thus, we have shown that the remainder is of order
; this is smaller than the order of the leading term of
, which we show to be
. The remainder is thus asymptotically negligible.
From the approximation we immediately obtain
if
, because in that case
for ρ small enough. The claim of Theorem 2.2 is thus trivially satisfied for the case
, and we assume
for the rest of this proof.
2. Approximation of by the Bahadur expansion.
3. Restriction to one root at 0 and Taylor approximations
Assume that and
for
(i.e.
). This is without loss of generality, because the integral for the general case is simply a sum of the independent integrals in a neighbourhood of each root.
Now define ,
,
and
.
4. Partitioning the range of integration
5. Poisson approximation
The following argument essentially replaces the number of X falling into the interval , which is approximately distributed
, with a Poisson random variable with parameter
; the distribution of everything else conditional on this number remains the same.
6. Moments of the integrals over the subintervals
- (a)
.
- (b)
.
- (c)
.
- (d)
.
7. Central limit theorem applied to the sum of integrals over the subintervals
Proof of Theorem 2.3.Fix one of the roots x0 of g. By the arguments of the proof of Theorem 2.2, (not to be confused with
) converges to a non-degenerate normal distribution for all x. In particular,
Proof of Theorem 3.2. The proof requires the following modifications relative to the one-dimensional case. Assumption 2.2 is still applicable, where the only difference in the d-dimensional case is that 2.6 has to be multiplied by . For
to have a pointwise non-degenerate distributional limit, we have to choose the rate
to equal
, which is slower for higher d. To see this, note that Var
. Here,
is Lipschitz continuous of order
, so that we require
for step 4 of the proof of Theorem 2.2. The range of integration has to be partitioned into rectangular subranges of area
instead of intervals of length τ. There will be approximately const
such subintegrals. The variance of the integral of
over each of these subranges will be of order
, similarly for expectations and covariances. This yields a variance of
of
; see step 7 of the proof of Theorem 2.2.
Proof of Theorem 3.3. By 3.14 and 3.16, it is sufficient to show that and
converge jointly in distribution, while
, as well as
, converge in probability. These claims follow as before if we combine the convergence of
from display 3.22 with Bahadur expansion 2.6 for
and
, where the latter are evaluated at
, which is not constant but converges.
Proof of Lemma 3.1. By definition of conditional quantiles, . Differentiating this with respect to X gives

The differential in the numerator has two components, one due to the structural relation between and X (i.e. the derivative with respect to the argument X of
), and one due to the stochastic dependence of X and ε:
Proof of Proposition 3.1. Because X and have their support in the interval [0, 1],
and
. Therefore, the unique root X of
must be stable,
. By Lemma 3.1 and Assumption 3.3, this implies that
.
Finally, note that for all X where (0, X) is in the support of , there exists a q such that
.
Proof of Proposition 3.2. The claims are immediate, noting that and similarly for
. Furthermore,
for all s,
and
for all s,
. Next,
on
,
from which negativity on
follows, similarly for
.
Finally, under monotonicity of potential outcomes, assuming for simplicity differentiability of g,

The numerator is always positive by assumption; the denominator is negative for and positive for
because we had assumed g positive for sufficiently small x. Hence,
is positive for
and negative for
.
REFERENCES