Volume 87, Issue 4 pp. 1397-1432
Notes and Comments

Confidence Intervals for Projections of Partially Identified Parameters

Hiroaki Kaido

Department of Economics, Boston University

Francesca Molinari

Department of Economics, Cornell University

Jörg Stoye

Department of Economics, Cornell University

We are grateful to three anonymous reviewers for very useful suggestions that substantially improved the paper. We thank for their comments Ivan Canay and seminar and conference participants at Amsterdam, Bonn, BC/BU joint workshop, Brown, Cambridge, Chicago, Cologne, Columbia, Cornell, CREST, Duke, ECARES, Harvard/MIT, Kiel, Kobe, Luxembourg, Mannheim, Maryland, Michigan, Michigan State, NUS, NYU, Penn, Penn State, Rochester, Royal Holloway, SMU, Syracuse, Toronto, Toulouse, UCL, UCLA, UCSD, Vanderbilt, Vienna, Yale, Western, and Wisconsin as well as CEME, Cornell-Penn State IO/Econometrics 2015 Conference, ES Asia Meeting 2016, ES European Summer Meeting 2017, ES North American Winter Meeting 2015, ES World Congress 2015, Frontiers of Theoretical Econometrics Conference (Konstanz), KEA-KAEA International Conference, Notre Dame Second Econometrics Workshop, Verein für Socialpolitik Ausschuss für Ökonometrie 2017. We are grateful to Undral Byambadalai, Zhonghao Fu, Debi Mohapatra, Sida Peng, Talal Rahim, Matthew Thirkettle, and Yi Zhang for excellent research assistance. A MATLAB package implementing the method proposed in this paper, Kaido, Molinari, Stoye, and Thirkettle (2017), is available at https://github.com/MatthewThirkettle/calibrated-projection-MATLAB. We are especially grateful to Matthew Thirkettle for his contributions to this package. We gratefully acknowledge financial support through NSF Grants SES-1230071 and SES-1824344 (Kaido), SES-0922330 and SES-1824375 (Molinari), and SES-1260980 and SES-1824375 (Stoye).
First published: 25 July 2019

Abstract

We propose a bootstrap-based calibrated projection procedure to build confidence intervals for single components and for smooth functions of a partially identified parameter vector in moment (in)equality models. The method controls asymptotic coverage uniformly over a large class of data generating processes. The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the function of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the function of θ, not θ itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive.

Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately, and we establish its rate of convergence. The algorithm is of independent interest for optimization problems with simple objectives and complicated constraints. An empirical application estimating an entry game illustrates the usefulness of the method. Monte Carlo simulations confirm the accuracy of the solution algorithm, the good statistical as well as computational performance of calibrated projection (including in comparison to other methods), and the algorithm's potential to greatly accelerate computation of other confidence intervals.

1 Introduction

This paper provides novel confidence intervals for projections and smooth functions of a parameter vector urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0001, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0002, that is partially or point identified through a finite number of moment (in)equalities. In addition, we develop a new algorithm for computing these confidence intervals and, more generally, for solving optimization problems with “black box” constraints, and obtain its rate of convergence.

Until recently, the rich literature on inference for moment (in)equalities focused on confidence sets for the entire vector θ, usually obtained by test inversion as
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0003
where the test statistic urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0004 aggregates violations of the sample analog of the moment (in)equalities and the critical value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0005 controls asymptotic coverage, often uniformly over a large class of data generating processes (DGPs). However, applied researchers are frequently interested in a specific component (or function) of θ, for example, the returns to education. Even if not, they may simply want to report separate confidence intervals for components of a vector, as is standard practice in other contexts. Thus, consider inference on the projection urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0006, where p is a known unit vector. To date, it is common to report as confidence set the corresponding projection of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0007 or the interval
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0008(1.1)
which will miss any “gaps” in a disconnected projection but is much easier to compute. This approach yields asymptotically valid but typically conservative and therefore needlessly large confidence intervals. The potential severity of this effect is easily appreciated in a point identified example. Given a urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0009-consistent estimator urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0010 with limiting covariance matrix equal to the identity matrix, the usual 95% confidence interval for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0011 equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0012. Yet the analogy to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0013 would be projection of a 95% confidence ellipsoid, which with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0014 yields urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0015 and a true coverage of essentially 1.
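The size of this effect can be checked numerically. The sketch below assumes d = 10 purely for illustration (the dimension used in the text is not rendered here) and works in units of 1/sqrt(n). It projects a 95% confidence ellipsoid for a standard normal limit onto one coordinate and evaluates the coverage of the resulting interval for that coordinate. The helper names are hypothetical, and the chi-square closed form used is valid only for even degrees of freedom.

```python
import math

def chi2_cdf_even(x, d):
    """CDF of a chi-square variable with even degrees of freedom d (closed form)."""
    k = d // 2
    half = x / 2.0
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return 1.0 - tail

def chi2_quantile_even(prob, d, lo=0.0, hi=200.0):
    """Invert the CDF by bisection; the CDF is increasing in x."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, d) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normal_coverage(half_width):
    """P(|Z| <= half_width) for a standard normal Z."""
    return math.erf(half_width / math.sqrt(2.0))

d = 10  # assumed dimension, chosen for illustration only
# Projecting the ellipsoid {t : ||t||^2 <= q} onto one coordinate gives [-sqrt(q), sqrt(q)].
projected_half_width = math.sqrt(chi2_quantile_even(0.95, d))
coverage = normal_coverage(projected_half_width)
print(projected_half_width, coverage)
```

Under these assumptions the projected half-width is roughly 4.28 rather than 1.96, which is why the projected interval covers the single component with probability close to 1.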
Our first contribution is to provide a bootstrap-based calibrated projection method to largely anticipate and correct for the conservative effect of projection. The method uses an estimated critical level urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0016 calibrated so that the projection of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0017 covers urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0018 (but not necessarily θ) with probability at least urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0019. As a confidence region for the true urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0020, one may report this projection, that is,
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0021(1.2)
or, for computational simplicity and presentational convenience, the interval
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0022(1.3)
We prove uniform asymptotic validity of both over a large class of DGPs.

Computationally, calibration of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0023 is relatively attractive: We linearize all constraints around θ, so that coverage of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0024 can be calibrated by analyzing many linear programs. Nonetheless, computing the above objects is challenging in moderately high dimension. This brings us to our second contribution, namely, a general method to accurately and rapidly compute confidence intervals whose construction resembles (1.3). Additional applications within partial identification include projection of confidence regions defined in Chernozhukov, Hong, and Tamer (2007), Andrews and Soares (2010), or Andrews and Shi (2013), as well as (with minor tweaking; see Appendix B) the confidence interval proposed in Bugni, Canay, and Shi (2017, BCS henceforth) and further discussed later. In an application to a point identified setting, Freyberger and Reeves (2017, Supplement Section S.3) used our method to construct uniform confidence bands for an unknown function of interest under (nonparametric) shape restrictions. They benchmarked it against gridding and found it to be accurate at considerably improved speed. More generally, the method can be broadly used to compute confidence intervals for optimal values of optimization problems with estimated constraints.

Our algorithm (henceforth called E-A-M for Evaluation-Approximation-Maximization) is based on the response surface method; thus, it belongs to the family of expected improvement algorithms (see, e.g., Jones (2001), Jones, Schonlau, and Welch (1998), and references therein). Bull (2011) established convergence of an expected improvement algorithm for unconstrained optimization problems where the objective is a “black box” function. The rate of convergence that he derived depends on the smoothness of the black box objective function. We substantially extend his results to show convergence, at a slightly slower rate, of our similar algorithm for constrained optimization problems in which the constraints are sufficiently smooth “black box” functions. Extensive Monte Carlo experiments (see Appendix C and Section 5 of Kaido, Molinari, and Stoye (2017)) confirm that the E-A-M algorithm is fast and accurate.

Relation to Existing Literature. The main alternative inference procedure for projections—introduced in Romano and Shaikh (2008) and significantly advanced in BCS—is based on profiling out a test statistic. The classes of DGPs for which calibrated projection and the profiling-based method of BCS (BCS-profiling henceforth) can be shown to be uniformly valid are non-nested.

Computationally, calibrated projection has the advantage that the bootstrap iterates over linear as opposed to nonlinear programming problems. While the “outer” optimization problems in (1.3) are potentially intricate, our algorithm is geared toward them. Monte Carlo simulations suggest that these two factors give calibrated projection a considerable computational edge over profiling, though profiling can also benefit from the E-A-M algorithm. Indeed, in Appendix C, we replicate the Monte Carlo experiment of BCS and find that adapting E-A-M to their method improves computation time by a factor of about 4, while switching to calibrated projection improves it by a further factor of about 17.

In an influential paper, Pakes, Porter, Ho, and Ishii (2011, PPHI henceforth) also used linearization but, subject to this approximation, directly bootstrapped the sample projection. This is valid only under stringent conditions. Other related articles that explicitly consider inference on projections include Beresteanu and Molinari (2008), Bontemps, Magnac, and Maurin (2012), Kaido (2016), and Kline and Tamer (2016). None of these establish uniform validity of confidence sets. Chen, Christensen, and Tamer (2018) established uniform validity of MCMC-based confidence intervals for projections, but aimed at covering the projection of the entire identified region urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0025 (defined later) and not just of the true θ. Gafarov, Meier, and Montiel-Olea (2016) used our insight in the context of set identified spatial VARs.

Regarding computation, previous implementations of projection-based inference (e.g., Ciliberto and Tamer (2009), Grieco (2014), Dickstein and Morales (2018)) reported the smallest and largest value of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0026 among parameter values urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0027 that were discovered using, for example, grid-search or simulated annealing with no cooling. This becomes computationally cumbersome as d increases because it typically requires a number of evaluation points that grows exponentially with d. In contrast, using a probabilistic model, our method iteratively draws evaluation points from regions that are considered highly relevant for finding the confidence interval's endpoint. In applications, this tends to substantially reduce the number of evaluation points.

Structure of the Paper. Section 2 sets up notation and describes our approach in detail, including computational implementation of the method and choice of tuning parameters. Section 3.1 establishes uniform asymptotic validity of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0028, and Section 3.2 shows that our algorithm converges at a specific rate which depends on the smoothness of the constraints. Section 4 reports the results of an empirical application that revisits the analysis in Kline and Tamer (2016, Section 8). Section 5 draws conclusions. The proof of convergence of our algorithm is in Appendix A. Appendix B shows that our algorithm can be used to compute BCS-profiling confidence intervals. Appendix C reports the results of Monte Carlo simulations comparing our proposed method with that of BCS. All other proofs, background material for our algorithm, and additional results are in the Supplemental Material (Kaido, Molinari, and Stoye (2019)).

2 Detailed Explanation of the Method

2.1 Setup and Definition of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0031

Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0032 be a random vector with distribution P, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0033 denote the parameter space, and let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0034 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0035 denote known measurable functions characterizing the model. The true parameter value θ is assumed to satisfy the moment inequality and equality restrictions
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0036(2.1)
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0037(2.2)
The identification region urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0038 is the set of parameter values in Θ satisfying (2.1)–(2.2). For a random sample urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0039 of observations drawn from P, we write
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0040
for the sample moments and the analog estimators of the population moment functions' standard deviations urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0041. The confidence interval in (1.3) then is
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0042(2.3)
with
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0043(2.4)
and similarly for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0044. Henceforth, to simplify notation, we write urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0045 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0046. We also define urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0047 moments, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0048 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0049. That is, we treat moment equality constraints as two opposing inequality constraints.
For a class of DGPs urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0050 that we specify below, define the asymptotic size of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0051 by
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0053(2.5)
We next explain how to control this size and then how to compute urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0054.
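As a rough illustration of the objects entering (2.3) and (2.4), the sketch below computes studentized sample moments at a fixed θ and stacks each equality as two opposing inequalities. It reads the constraint in (2.4) as "studentized sample moment no larger than the critical level", following the verbal description in the abstract. The function names and the data layout are hypothetical, and the critical level is passed in as a plain number rather than calibrated.

```python
import math
from statistics import mean, pstdev

def studentized_moments(moment_values, n_ineq):
    """Studentized sample moments sqrt(n) * mean / sd at a fixed theta.

    moment_values[j] holds m_j(X_i, theta) for i = 1..n. The first n_ineq
    entries are inequalities, the remaining ones are equalities.
    Each equality is returned twice, once with each sign.
    """
    n = len(moment_values[0])
    stats = []
    for vals in moment_values:
        sd = pstdev(vals)  # analog estimator (divisor n); assumed to be positive
        stats.append(math.sqrt(n) * mean(vals) / sd)
    ineq, eq = stats[:n_ineq], stats[n_ineq:]
    return ineq + eq + [-t for t in eq]

def in_relaxed_set(stats, c):
    """True if every studentized moment is at most the critical level c."""
    return all(t <= c for t in stats)

# One inequality and one equality, four observations each (made-up numbers).
data = [[-1.0, -2.0, -3.0, -2.0], [1.0, -1.0, 1.0, -1.0]]
stats = studentized_moments(data, n_ineq=1)
print(stats, in_relaxed_set(stats, c=0.0))
```

The confidence interval then collects the smallest and largest values of the projection over all θ passing this check, with the critical level supplied by the calibration step.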

2.2 Calibration of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0055

Calibration of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0056 requires careful analysis of the moment restrictions' local behavior at each point in the identification region. This is because the extent of projection conservatism depends on (i) the asymptotic behavior of the sample moments entering the inequality restrictions, which can change discontinuously depending on whether they bind at θ or not, and (ii) the local geometry of the identification region at θ, that is, the shape of the constraint set formed by the moment restrictions. Features (i) and (ii) can be quite different at different points in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0057, making uniform inference challenging. In particular, (ii) does not arise if one only considers inference for the entire parameter vector, and hence is a new challenge requiring new methods.

To build an intuition, fix urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0058 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0059. The projection of θ is covered when
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0060(2.6)
Here, we first substituted urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0061 and took λ to be the choice parameter; intuitively, this localizes around θ at rate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0062. We then make the event smaller by adding the constraint urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0063, with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0064 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0065 a tuning parameter. We motivate this step later.
Our goal is to set the probability of (2.6) equal to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0066. To ease computation, we approximate (2.6) by linear expansion in λ of the constraint set. For each j, add and subtract urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0067 and apply the mean value theorem to obtain
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0068(2.7)
Here urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0069 is a normalized empirical process indexed by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0070, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0071 is the gradient of the normalized moment, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0072 is the studentized population moment, and the mean value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0073 lies componentwise between θ and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0074.
We formally establish that the probability of the last event in (2.6) can be approximated by the probability that 0 lies between the optimal values of two stochastic linear programs. The components that characterize these programs can be estimated. Specifically, we replace urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0076 with a uniformly consistent (on compact sets) estimator, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0077, and the process urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0078 with its simple nonparametric bootstrap analog, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0079. Estimation of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0084 is more subtle because it enters (2.7) scaled by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0085, so that a sample analog estimator will not do. However, this specific issue is well understood in the moment inequalities literature. Following Andrews and Soares (2010, AS henceforth) and others (Bugni (2010), Canay (2010), Stoye (2009)), we shrink this sample analog toward zero, leading to conservative (if any) distortion in the limit. Formally, we estimate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0086 by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0087, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0088 is one of the Generalized Moment Selection (GMS henceforth) functions proposed by AS,
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0089
and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0090 is a user-specified thresholding sequence. In sum, we replace the random constraint set in (2.6) with the (bootstrap-based) random polyhedral set
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0096(2.8)
The critical level urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0097 to be used in (2.4) then is
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0098(2.9)
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0099(2.10)
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0100 denotes the law of the random set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0101 induced by the bootstrap sampling process, that is, by the distribution of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0102 conditional on the data. Expression (2.10) uses convexity of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0103 and reveals that the probability inside curly brackets can be assessed by repeatedly checking feasibility of a linear program. We describe in detail in Supplemental Material Appendix D.4 how we compute urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0107 through a root-finding algorithm.
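The calibration logic can be sketched in a deliberately simplified case. The code below assumes d = 2 and p = (1, 0), so that the restriction to directions orthogonal to p leaves a single free coordinate and each linear program reduces to an interval intersection. It also assumes that the linearized constraints take the form "bootstrap draw plus gradient times λ plus GMS term no larger than c" within the ρ-box, which is a reading of the text rather than a transcription of (2.8). All names are hypothetical, and plain bisection stands in for the root-finding algorithm of Appendix D.4.

```python
def feasible(g, slopes, phi, c, rho):
    """Is there lam in [-rho, rho] with g[j] + slopes[j]*lam + phi[j] <= c for all j?"""
    lo, hi = -rho, rho
    for gj, sj, pj in zip(g, slopes, phi):
        slack = c - gj - pj
        if sj > 0:
            hi = min(hi, slack / sj)   # lam <= slack / sj
        elif sj < 0:
            lo = max(lo, slack / sj)   # lam >= slack / sj
        elif slack < 0:
            return False               # constraint does not involve lam and fails
    return lo <= hi

def coverage_frequency(draws, slopes, phi, c, rho):
    """Share of bootstrap draws for which the linear program is feasible."""
    return sum(feasible(g, slopes, phi, c, rho) for g in draws) / len(draws)

def calibrate(draws, slopes, phi, rho, alpha=0.05, tol=1e-6):
    """Smallest c >= 0 (up to tol) with feasibility frequency >= 1 - alpha."""
    target = 1.0 - alpha
    if coverage_frequency(draws, slopes, phi, 0.0, rho) >= target:
        return 0.0
    lo, hi = 0.0, 1.0
    while coverage_frequency(draws, slopes, phi, hi, rho) < target:
        hi *= 2.0                      # frequency is nondecreasing in c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage_frequency(draws, slopes, phi, mid, rho) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

Because relaxing c can only enlarge the feasible set, the feasibility frequency is monotone in c and bisection suffices in this sketch. For general d, the interval check would be replaced by a linear programming feasibility check.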

We conclude by motivating the “ρ-box constraint” in (2.6), which is a major novel contribution of this paper. The constraint induces conservative bias but has two fundamental benefits: First, it ensures that the linear approximation of the feasible set in (2.6) by (2.8) is used only in a neighborhood of θ, and therefore that it is uniformly accurate. More subtly, it ensures that coverage induced by a given c depends continuously on estimated parameters even in certain intricate cases. This renders calibrated projection valid in cases that other methods must exclude by assumption.

2.3 Computation of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0120 and of Similar Confidence Intervals

Projection-based methods as in (1.1) and (1.3) have nonlinear constraints involving a critical value which, in general, is an unknown function, with unknown gradient, of θ. Similar considerations often apply to critical values used to build confidence intervals for optimal values of optimization problems with estimated constraints. When the dimension of the parameter vector is large, directly solving optimization problems with such constraints can be expensive even if evaluating the critical value at each θ is cheap.

This concern motivates this paper's second main contribution, namely, a novel algorithm for constrained optimization problems of the following form:
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0121(2.11)
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0122 is an optimal solution of the problem and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0123, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0124 as well as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0125 are fixed functions of θ. In our own application, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0126 and, for calibrated projection, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0127.

The key issue is that evaluating urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0130 is costly. Our algorithm does so at relatively few values of θ. Elsewhere, it approximates urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0132 through a probabilistic model that gets updated as more values are computed. We use this model to determine the next evaluation point but report as tentative solution the best value of θ at which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0133 was computed, not a value at which it was merely approximated. Under reasonable conditions, the tentative optimal values converge to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0134 at a rate (relative to iterations of the algorithm) that is formally established in Section 3.2.

After drawing an initial set of evaluation points that we set to grow linearly with d, the algorithm has three steps called E, A, and M below.

Initialization: Draw randomly (uniformly) over Θ a set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0135 of initial evaluation points. Evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0136 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0137. Initialize urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0138.

E-step: Evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0139 and record the tentative optimal value
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0140
with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0141.
A-step: Approximate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0142 by a flexible auxiliary model. We use a Gaussian-process regression model (or kriging), which for a mean-zero Gaussian process urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0143 indexed by θ and with constant variance urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0144 specifies
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0145(2.12)
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0146(2.13)
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0147 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0148 is a kernel with parameter vector urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0149; for example, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0150. The unknown parameters urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0151 can be estimated by running a GLS regression of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0152 on a constant with the given correlation matrix. The unknown parameters β can be estimated by a (concentrated) MLE.
The (best linear) predictor of the critical value and its gradient at θ are then given by
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0153
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0154 is a vector whose ℓth component is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0155 as given above with estimated parameters, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0156, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0157 is an L-by-L matrix whose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0158 entry is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0159 with estimated parameters. This surrogate model has the property that its predictor satisfies urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0160, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0161. Hence, it provides an analytical interpolation, with analytical gradient, of evaluation points of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0162. The uncertainty left in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0163 is captured by the variance
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0164
M-step: With probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0165, obtain the next evaluation point urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0166 as
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0167(2.14)
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0168 is the expected improvement function. This step can be implemented by standard nonlinear optimization solvers, for example, MATLAB's fmincon or KNITRO (see Appendix D.3 for details). With probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0170, draw urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0171 randomly from a uniform distribution over Θ. Set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0172 and return to the E-step.

The algorithm yields an increasing sequence of tentative optimal values urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0173, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0174, with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0175 satisfying the true constraints in (2.11) but the sequence of evaluation points leading to it obtained by maximization of expected improvement defined with respect to the approximated surface. Once a convergence criterion is met, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0176 is reported as the endpoint of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0177. We discuss convergence criteria in Appendix C.
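The interpolation property of the A-step surrogate can be illustrated with a minimal kriging predictor. The sketch assumes a squared-exponential kernel with a fixed, user-supplied parameter vector `beta`, whereas the text estimates β by concentrated MLE. It estimates the constant mean by GLS and omits the gradient and the variance. The function names are hypothetical.

```python
import math

def kernel(a, b, beta):
    """Assumed squared-exponential correlation between points a and b."""
    return math.exp(-sum((x - y) ** 2 / bh for x, y, bh in zip(a, b, beta)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_surrogate(points, values, beta):
    """Return a predictor that interpolates the evaluated critical values."""
    R = [[kernel(p, q, beta) for q in points] for p in points]
    ones = [1.0] * len(points)
    # GLS regression of the values on a constant with correlation matrix R.
    mu = sum(solve(R, values)) / sum(solve(R, ones))
    weights = solve(R, [v - mu for v in values])

    def predict(theta):
        r = [kernel(theta, p, beta) for p in points]
        return mu + sum(ri * wi for ri, wi in zip(r, weights))

    return predict

# Three evaluation points in one dimension with made-up critical values.
pts = [[0.0], [0.5], [1.0]]
vals = [1.0, 1.5, 1.2]
predict = fit_surrogate(pts, vals, beta=[0.5])
print(predict([0.25]))
```

At each evaluation point the predictor returns the computed value, so the surrogate is used only to fill in the critical level where it has not been evaluated.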

The advantages of E-A-M are as follows. First, we control the number of points at which we evaluate the critical value; recall that this evaluation is the expensive step. Also, the initial k evaluations can easily be parallelized. For any additional E-step, one needs to evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0178 only at a single point urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0179. The M-step is crucial for reducing the number of additional evaluation points. To determine the next evaluation point, it trades off “exploitation” (i.e., the benefit of drawing a point at which the optimal value is high) against “exploration” (i.e., the benefit of drawing a point in a region in which the approximation error of c is currently large) through maximizing expected improvement. Finally, the algorithm simplifies the M-step by providing constraints and their gradients for program (2.14) in closed form, thus greatly aiding fast and stable numerical optimization. The price is the additional approximation step. In the empirical application in Section 4 and in the numerical exercises of Appendix C, this price turns out to be low.
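The trade-off between exploitation and exploration can be made concrete with a stylized expected improvement criterion. The form below is an assumption made for illustration and is not a transcription of the function in (2.14): it multiplies the gain in the objective over the current tentative optimum by the probability, under a normal surrogate for the critical level, that the constraint holds. All argument names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(objective, best_so_far, g_bar, c_pred, c_sd):
    """Stylized expected improvement at a candidate point.

    objective   : value of the objective at the candidate
    best_so_far : current tentative optimal value
    g_bar       : largest studentized sample moment at the candidate
    c_pred      : surrogate prediction of the critical level
    c_sd        : surrogate standard deviation of that prediction
    """
    improvement = max(objective - best_so_far, 0.0)
    if c_sd <= 0.0:
        prob_feasible = 1.0 if g_bar <= c_pred else 0.0
    else:
        # P(c >= g_bar) when c is N(c_pred, c_sd^2) under the surrogate.
        prob_feasible = normal_cdf((c_pred - g_bar) / c_sd)
    return improvement * prob_feasible

# Same candidate value, predicted to violate the constraint (g_bar > c_pred):
# more surrogate uncertainty makes the point more attractive to evaluate.
low_uncertainty = expected_improvement(3.0, 2.0, g_bar=2.0, c_pred=1.0, c_sd=0.1)
high_uncertainty = expected_improvement(3.0, 2.0, g_bar=2.0, c_pred=1.0, c_sd=1.0)
print(low_uncertainty, high_uncertainty)
```

In this stylized form, a candidate scores highly either because it improves the objective and is predicted to be feasible, or because the surrogate is still uncertain enough that feasibility cannot be ruled out.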

2.4 Choice of Tuning Parameters

Practical implementation of calibrated projection and the E-A-M algorithm is detailed in Kaido et al. (2017). It involves setting several tuning parameters, which we now discuss.

Calibration of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0180 in (2.10) must be tuned at two points, namely, the use of GMS and the choice of ρ. The trade-offs in setting these tuning parameters are apparent from inspection of (2.8). GMS is parameterized by a shrinkage function φ and a sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0181 that controls the rate of shrinkage. In practice, choice of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0182 is more delicate. A smaller urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0183 will make urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0184 larger, hence increase bootstrap coverage probability for any given c, hence reduce urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0185 and therefore make for shorter confidence intervals—but the uniform asymptotics will be misleading, and finite sample coverage therefore potentially off target, if urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0186 is too small. We follow the industry standard set by AS and recommend urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0187.

The trade-off in choosing ρ is similar but reversed. A larger ρ will expand urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0188 and therefore make for shorter confidence intervals, but (our proof of) uniform validity of inference requires urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0189. Indeed, calibrated projection with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0190 will disregard any projection conservatism and (as is easy to show) exactly recovers projection of the AS confidence set. Intuitively, we then want to choose ρ large but not too large.

To this end, we heuristically calibrate ρ based on how much conservative distortion one is willing to accept in well-behaved cases. This distortion—denote it η, for which we suggest a numerical value of 0.01—is compared against a bound on conservative distortion that is itself likely to be conservative but data-free and trivial to compute. In particular, we set
urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0191
The underlying heuristic is as follows: If all basic solutions (i.e., intersections of exactly d constraints) that potentially define vertices of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0192 realize inside the ρ-box, then the ρ-box cannot affect the values in (2.9) and hence not whether coverage obtains in a given bootstrap sample. Conversely, the probability that at least one basic solution realizes outside the ρ-box bounds from above the conservative distortion. This probability is, of course, dependent on unknown parameters. Our data-free approximation imputes multivariate standard normal distributions for all basic solutions and Bonferroni adjustment to handle their covariation.

The E-A-M algorithm also has two tuning parameters. One is k, the initial number of evaluation points. The other is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0198, the probability of drawing urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0199 randomly from a uniform distribution on Θ instead of by maximizing urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0200. In calibrated projection use of the E-A-M algorithm, there is a single “black box” function, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0201. We therefore suggest setting urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0202, similarly to the recommendation in Jones, Schonlau, and Welch (1998, p. 473). In our Monte Carlo exercises, we experimented with larger values, for example, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0203, and found that the increased number had no noticeable effect on the computed urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0204. If a user applies our E-A-M algorithm to a constrained optimization problem with many “black box” functions to approximate, we suggest using a larger number of initial points.

The role of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0205 (e.g., Bull (2011, p. 2889)) is to trade off the greediness of the urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0206 maximization criterion against the overarching goal of global optimization. Sutton and Barto (1998, pp. 28–29) explored the effect of setting urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0207 and 0.01 on different optimization problems, and found that for sufficiently large L, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0208 performs better. In our own simulations, we have found that both drawing a uniform point and computing the maximizing value of θ at each L (thereby sidestepping the choice of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0209) is fast and accurate, and that is what we recommend doing.
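The two ways of choosing the next evaluation point can be contrasted in a few lines. This is a schematic sketch, not the MATLAB package's code: `maximize_ei` is a hypothetical callback standing in for the expected improvement maximization, and the parameter space is assumed to be a box given by `lower` and `upper`.

```python
import random
from typing import Callable, List, Sequence

Point = List[float]


def draw_uniform(lower: Sequence[float], upper: Sequence[float],
                 rng: random.Random) -> Point:
    """Draw one point uniformly from the box [lower, upper]."""
    return [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]


def epsilon_greedy_step(maximize_ei: Callable[[], Point],
                        lower: Sequence[float], upper: Sequence[float],
                        epsilon: float, rng: random.Random) -> List[Point]:
    """With probability epsilon explore uniformly; otherwise take the maximizer."""
    if rng.random() < epsilon:
        return [draw_uniform(lower, upper, rng)]
    return [maximize_ei()]


def draw_both_step(maximize_ei: Callable[[], Point],
                   lower: Sequence[float], upper: Sequence[float],
                   rng: random.Random) -> List[Point]:
    """Evaluate the maximizer and a uniform draw in every iteration,
    so no exploration probability needs to be chosen."""
    return [maximize_ei(), draw_uniform(lower, upper, rng)]


rng = random.Random(0)
fake_maximizer = lambda: [0.5, 0.5]  # placeholder for the real maximization step
points = draw_both_step(fake_maximizer, [0.0, 0.0], [1.0, 1.0], rng)
print(points)
```

The second variant costs one extra evaluation of the black-box function per iteration in exchange for removing a tuning parameter.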

3 Theoretical Results

3.1 Asymptotic Validity of Inference

In this section, we establish that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0210 is uniformly asymptotically valid in the sense of ensuring that (2.5) equals at least urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0211. The result applies to: (i) confidence intervals for one projection; (ii) joint confidence regions for several projections, in particular confidence hyperrectangles for subvectors; (iii) confidence intervals for smooth nonlinear functions urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0212. Examples of the latter extension include policy analysis and estimation of partially identified counterfactuals as well as demand extrapolation subject to rationality constraints.

Theorem 3.1. Suppose Assumptions E.1, E.2, E.3, E.4, and E.5 hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0213.

  • (I) Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0214 be as defined in (1.3), with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0215 as in (2.10). Then:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0216(3.1)
  • (II) Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0217 denote unit vectors in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0218, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0219. Then:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0220
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0221 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0222.
  • (III) Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0223 be a confidence interval whose lower and upper points are obtained solving
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0224
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0225. Suppose that there exist urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0226 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0227 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0228 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0229, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0230 is the gradient of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0231. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0232. Then:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0233

All assumptions can be found in Supplemental Material Appendix E.1. Assumptions E.1 and E.5 are mild regularity conditions typical in the literature; see, for example, Definition 4.2 and the corresponding discussion in BCS. Assumption E.2 is based on AS and constrains the GMS function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0234 as well as the rate at which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0235 diverges. Assumption E.4 requires normalized population moments to be sufficiently smooth and consistently estimable. Assumption E.3 is our key departure from the related literature. In essence, it requires that the correlation matrix of the moment functions corresponding to close-to-binding moment conditions has eigenvalues uniformly bounded from below. Under this condition, we are able to show that in the limit problem corresponding to (2.6)—where constraints are replaced with their local linearization using population gradients and Gaussian processes—the probability of coverage increases continuously in c. If such continuity is directly assumed (Assumption E.6), Theorem 3.1 remains valid (Supplemental Material Appendix G.2.2). While the high level Assumption E.6 is similar in spirit to a key condition (Assumption A.2) in BCS, we propose Assumption E.3 due to its familiarity and ease of interpretation; a similar condition is required for uniform validity of standard point identified Generalized Method of Moments inference. In Supplemental Material Appendix F.2, we verify that our assumptions hold in some of the canonical examples in the partial identification literature: mean with missing data, linear regression and best linear prediction with interval data (and discrete covariates), entry games with multiple equilibria (and discrete covariates), and semiparametric binary regression models with discrete or interval valued covariates (as in Magnac and Maurin (2008)).

Assumptions E.1–E.5 define the class of DGPs over which our proposed method yields uniformly asymptotically valid coverage. This class is non-nested with the class of DGPs over which the profiling-based methods of Romano and Shaikh (2008) and BCS are uniformly asymptotically valid. Kaido, Molinari, and Stoye (2017, Section 4.2 and Supplemental Appendix F) showed that in well-behaved cases, calibrated projection and BCS-profiling are asymptotically equivalent. They also provided conditions under which calibrated projection has lower probability of false coverage in finite sample, thereby establishing that the two methods' finite sample power properties are non-ranked.

3.2 Convergence of the E-A-M Algorithm

We next provide formal conditions under which the sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0236 generated by the E-A-M algorithm converges to the true endpoint of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0237 as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0238 at a rate that we obtain. Although urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0239, so that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0240 satisfies the true constraints for each L, the sequence of evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0241 is mostly obtained through expected improvement maximization (M-step) with respect to the approximating surface urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0242. Because of this, a requirement for convergence is that the function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0243 is sufficiently smooth, so that the approximation error in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0244 vanishes uniformly in θ as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0245. We furthermore assume that the constraint set in (2.11) satisfies a degeneracy condition introduced to the partial identification literature by Chernozhukov, Hong, and Tamer (2007, Condition C.3). In our application, the condition requires that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0247 has an interior and that the inequalities in (2.4), when evaluated at points in a (small) τ-contraction of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0248, are satisfied with a slack that is proportional to τ. Theorem 3.2 below establishes that these conditions jointly ensure convergence of the E-A-M algorithm at a specific rate. This is a novel contribution to the literature on response surface methods for constrained optimization.

In the formal statement below, the expectation urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0249 is taken with respect to the law of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0250 determined by the Initialization step and the M-step but conditioning on the sample. We refer to Appendix A for a precise definition of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0251 and a proof of the theorem.

Theorem 3.2. Suppose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0252 is a compact hyperrectangle with nonempty interior, that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0253, and that Assumptions A.1, A.2, and A.3 hold. Let the evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0254 be drawn according to the Initialization and M-steps. Then

urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0255(3.2)
where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0256 is the urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0257-norm under urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0258, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0259, and the constants urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0260 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0261 are defined in Assumption A.1. If urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0262, the statement in (3.2) holds for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0263.

The requirement that Θ is a compact hyperrectangle with nonempty interior can be replaced by a requirement that Θ belongs to the interior of a closed hyperrectangle in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0264. Assumption A.1 specifies the types of kernel to be used to define the correlation functional in (2.13). Assumption A.2 collects requirements on differentiability of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0265, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0266, and smoothness of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0267. Assumption A.3 is the degeneracy condition discussed above.

To apply Theorem 3.2 to calibrated projection, we provide low-level conditions (Assumption D.1 in Supplemental Material Appendix D.1.1) under which the map urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0268 uniformly stochastically satisfies a Lipschitz-type condition. To get smoothness, we work with a mollified version of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0269, denoted urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0270 in equation (D.1), where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0271. Theorem D.1 in the Supplemental Material shows that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0272 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0273 can be made uniformly arbitrarily close, and that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0274 yields valid inference as in (3.1). In practice, we directly apply the E-A-M steps to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0275.

The key condition imposed in Theorem D.1 is Assumption D.1. It requires that the GMS function used is Lipschitz in its argument, and that the standardized moment functions are Lipschitz in θ. In Supplemental Material Appendix F.1, we establish that the latter condition is satisfied by some canonical examples in the moment (in)equality literature: mean with missing data, linear regression and best linear prediction with interval data (and discrete covariates), entry games with multiple equilibria (and discrete covariates), and semiparametric binary regression models with discrete or interval valued covariates (as in Magnac and Maurin (2008)).

The E-A-M algorithm is proposed as a method to implement our statistical procedure, not as part of the statistical procedure itself. As such, its approximation error is not taken into account in Theorem 3.1. In the empirical application of the next section, we compare the confidence intervals obtained through E-A-M with those obtained by directly solving problems (2.4) with MATLAB's fmincon; the comparison suggests that such error is minimal.

4 Empirical Illustration: Estimating a Binary Game

We employ our method to revisit the study in Kline and Tamer (2016, Section 8) of “what explains the decision of an airline to provide service between two airports.” We use their data and model specification. Here, we briefly summarize the setup and refer to Kline and Tamer (2016) for a richer discussion.

The study examines entry decisions of two types of firms, namely, Low Cost Carriers (urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0277) versus Other Airlines (urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0278). A market is defined as a trip between two airports, irrespective of intermediate stops. The entry decision urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0279 of player urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0280 in market i is recorded as a 1 if a firm of that type serves market i and 0 otherwise. The firm's payoff equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0281, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0282 is the opponent's entry decision. Each firm enters if doing so generates nonnegative payoffs. The observable covariates in the vector urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0283 include the constant and the variables urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0284 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0285. The former is market size, a market-specific variable common to all airlines in that market and defined as the population at the endpoints of the trip. The latter is a firm-and-market-specific variable measuring the market presence of firms of that type in market i (see Kline and Tamer (2016, p. 356) for its exact definition). While urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0286 enters the payoff function of both firms, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0287 (respectively, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0288) is excluded from the payoff of firm urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0289 (respectively, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0290). Market size and each of the two market presence variables are transformed into binary variables according to whether they realize above or below their respective median. This leads to a total of eight market types, hence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0291 moment inequalities and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0292 moment equalities. The unobserved payoff shifters urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0293 are assumed to be i.i.d. across i and to have a bivariate normal distribution with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0294, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0295, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0296 for each i and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0297, where the correlation r is to be estimated. Following Kline and Tamer (2016), we assume that the strategic interaction parameters urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0298 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0299 are negative, that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0300, and that the researcher imposes these sign restrictions. To ensure that Assumption E.4 is satisfied, we furthermore assume that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0305 and use this value as its upper bound in the definition of the parameter space.

The results of the analysis are reported in Table I, which displays urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0306 nominal confidence intervals (our urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0307 as defined in equations (2.3)–(2.4)) for each parameter. The output of the E-A-M algorithm is displayed in the accordingly labeled column. The next column shows a robustness check, namely, the output of MATLAB's fmincon function, henceforth labeled “direct search,” that was started at each of a widely spaced set of feasible points that were previously discovered by the E-A-M algorithm. We emphasize that this is a robustness or accuracy check, not a horse race: Direct search mechanically improves on E-A-M because it starts (among other points) at the point reported by E-A-M as optimal feasible. Using the standard MultiStart function in MATLAB instead of the points discovered by E-A-M produces unreliable and extremely slow results. In 10 out of 18 optimization problems that we solved, the E-A-M algorithm's solution came within its set tolerance (0.005) of the direct search solution. The other optimization problems were solved by E-A-M with a minimal error of less than urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0308.

Table I. Results for Empirical Application, With urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0309, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0310, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0311, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0312^a

| Parameter | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0313: E-A-M | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0313: Direct Search | Computational Time: E-A-M | Computational Time: Direct Search | Computational Time: Total |
|---|---|---|---|---|---|
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0314 | [−2.0603,−0.8510] | [−2.0827,−0.8492] | 24.73 | 32.46 | 57.51 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0315 | [0.1880,0.4029] | [0.1878,0.4163] | 16.18 | 230.28 | 246.49 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0316 | [1.7510,1.9550] | [1.7426,1.9687] | 16.07 | 115.20 | 131.30 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0317 | [0.3957,0.5898] | [0.3942,0.6132] | 27.61 | 107.33 | 137.66 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0318 | [0.3378,0.5654] | [0.3316,0.5661] | 11.90 | 141.73 | 153.66 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0319 | [0.3974,0.5808] | [0.3923,0.5850] | 13.53 | 148.20 | 161.75 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0320 | [−1.4423,−0.1884] | [−1.4433,−0.1786] | 15.65 | 119.50 | 135.17 |
| urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0321 | [−1.4701,−0.7658] | [−1.4742,−0.7477] | 13.06 | 114.14 | 127.23 |
| r | [0.1855,0.85] | [0.1855,0.85] | 5.37 | 42.38 | 47.78 |

  • a “Direct search” refers to fmincon performed after E-A-M and starting from feasible points discovered by E-A-M, including the E-A-M optimum.

Table I also reports computational time of the E-A-M algorithm, of the subsequent direct search, and the total time used to compute the confidence intervals. The direct search greatly increases computation time with small or negligible benefit. Also, computational time varied substantially across components. We suspect this might be due to the shape of the level sets of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0322: By manually searching around the optimal values of the program, we verified that the level sets in specific directions can be extremely thin, rendering search more challenging.
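The comparison between the two columns of Table I can be summarized with simple arithmetic. The sketch below transcribes the interval endpoints and the E-A-M and direct search times from the table (rows in table order, with r last) and computes the endpoint gaps and the ratio of total direct search time to total E-A-M time. The variable names are illustrative, and measuring the gap as an absolute difference in endpoints is an assumption of this sketch, not necessarily the error measure used in the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

# Confidence interval endpoints transcribed from Table I (rows in table order).
eam_ci: List[Interval] = [
    (-2.0603, -0.8510), (0.1880, 0.4029), (1.7510, 1.9550),
    (0.3957, 0.5898), (0.3378, 0.5654), (0.3974, 0.5808),
    (-1.4423, -0.1884), (-1.4701, -0.7658), (0.1855, 0.85),
]
direct_ci: List[Interval] = [
    (-2.0827, -0.8492), (0.1878, 0.4163), (1.7426, 1.9687),
    (0.3942, 0.6132), (0.3316, 0.5661), (0.3923, 0.5850),
    (-1.4433, -0.1786), (-1.4742, -0.7477), (0.1855, 0.85),
]
# Computational times transcribed from Table I, in the units reported there.
eam_time = [24.73, 16.18, 16.07, 27.61, 11.90, 13.53, 15.65, 13.06, 5.37]
direct_time = [32.46, 230.28, 115.20, 107.33, 141.73, 148.20, 119.50, 114.14, 42.38]

# Absolute gap between E-A-M and direct search at each lower and upper endpoint.
gaps = [
    (abs(e_lo - d_lo), abs(e_hi - d_hi))
    for (e_lo, e_hi), (d_lo, d_hi) in zip(eam_ci, direct_ci)
]
largest_gap = max(max(pair) for pair in gaps)

# Direct search starts from the E-A-M optimum, so its interval should contain E-A-M's.
direct_contains_eam = all(
    d_lo <= e_lo and e_hi <= d_hi
    for (e_lo, e_hi), (d_lo, d_hi) in zip(eam_ci, direct_ci)
)

time_ratio = sum(direct_time) / sum(eam_time)
print(largest_gap, direct_contains_eam, time_ratio)
```

On these numbers the direct search intervals always weakly contain the E-A-M intervals, and direct search accounts for most of the total computation time.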

Comparing our findings with those in Kline and Tamer (2016), we see that the results qualitatively agree. The confidence intervals for the interaction effects (urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0323 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0324) and for the effect of market size on payoffs (urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0325 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0326) are similar to each other across the two types of firms. The payoffs of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0327 firms seem to be impacted more than those of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0328 firms by market presence. On the other hand, monopoly payoffs for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0329 firms seem to be smaller than for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0330 firms. The confidence interval on the correlation coefficient is quite large and includes our upper bound of 0.85.

For most components, our confidence intervals are narrower than the corresponding 95% credible sets reported in Kline and Tamer (2016). However, the intervals are not comparable for at least two reasons: We impose a stricter upper bound on r and we aim to cover the projections of the true parameter value as opposed to the identified set.

Overall, our results suggest that in a reasonably sized, empirically interesting problem, calibrated projection yields informative confidence intervals. Furthermore, the E-A-M algorithm appears to accurately and quickly approximate solutions to complex smooth nonlinear optimization problems.

5 Conclusion

This paper proposes a confidence interval for linear functions of parameter vectors that are partially identified through finitely many moment (in)equalities. The extreme points of our calibrated projection confidence interval are obtained by minimizing and maximizing urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0332 subject to properly relaxed sample analogs of the moment conditions. The relaxation amount, or critical level, is computed to ensure uniform asymptotic coverage of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0333 rather than θ itself. Its calibration is computationally attractive because it is based on repeatedly checking feasibility of (bootstrap) linear programming problems. Computation of the extreme points of the confidence intervals is furthermore attractive thanks to an application of the response surface method for global optimization; this is a novel contribution of independent interest. Indeed, one key result is a convergence rate for this algorithm when applied to constrained optimization problems in which the objective function is easy to evaluate but the constraints are “black box” functions. The result is applicable to any instance in which the researcher wants to compute confidence intervals for optimal values of constrained optimization problems. Our empirical application and Monte Carlo analysis show that, in the DGPs that we considered, calibrated projection is fast and accurate, and also that the E-A-M algorithm can greatly improve computation of other confidence intervals.

  • 1 See Kaido, Molinari, and Stoye (2017, Section 4.2 and Supplemental Appendix F) for a comparison of the statistical properties of calibrated projection and BCS-profiling, summarized here at the end of Section 3.1.
  • 2 The published version of PPHI, i.e., Pakes et al. (2015), does not contain the inference part. Kaido, Molinari, and Stoye (2017, Section 4.2) showed that calibrated projection can be much simplified under the conditions imposed by PPHI.
  • 3 Appendix D provides convergence-related results and background material for our algorithm and describes how to compute urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0029. Appendix E presents the assumptions under which we prove uniform asymptotic validity of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0030. Appendix F verifies, for a number of canonical partial identification problems, the assumptions that we invoke to show validity of our inference procedure and for our algorithm. Appendix G contains the proof of Theorem 3.1. Appendix H collects lemmas supporting this proof.
  • 4 Here we focus on the confidence interval urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0052 defined in (1.3). See Appendix G.2.3 for the analysis of the confidence region given by the mathematical projection in (1.2).
  • 5 The mean value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0075 changes with j but we omit the dependence to ease notation.
  • 6 See Supplemental Material Appendix F for such estimators in some canonical moment (in)equality examples.
  • 7 BCS approximated urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0080 by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0081 with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0082 i.i.d. This approximation is equally valid in our approach, and can be faster as it avoids repeated evaluation of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0083.
  • 8 A common choice of φ is given componentwise by
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0091
    Restrictions on φ and the rate at which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0092 diverges are imposed in Assumption E.2. While for concreteness here we write out the “hard thresholding” GMS function, Theorem 3.1 below applies to all but one of the GMS functions in AS, namely, to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0093, all of which depend on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0094. We do not consider GMS function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0095, which depends also on the covariance matrix of the moment functions.
  • 9 Here, we implicitly assume that Θ is a polyhedral set. If it is instead defined by smooth convex (in)equalities, these can be linearized, too.
  • 10 We implement a program in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0104 for simplicity but, because urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0105, one could reduce this to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0106.
  • 11 In (2.8), set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0108, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0109, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0110, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0111. Then simple algebra reveals that (with or without ρ-box) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0112. If urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0113 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0114, then without ρ-box we have urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0115 for any small urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0116, and we therefore cannot expect to get urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0117 right if gradients are estimated. With ρ-box, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0118 as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0119, so the problem goes away. This stylized example is relevant because it resembles polyhedral identified sets where one face is near orthogonal to p. It violates assumptions in BCS and PPHI.
  • 12 We emphasize that, in analyzing the computational problem, we take the data, including bootstrap data, as given. Thus, while an econometrician would usually think of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0128 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0129 as random variables, for this section's purposes they are indeed just functions of θ.
  • 13 For simplicity and to mirror our motivating application, we suppose that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0131 is easy to compute. The algorithm is easily adapted to the case where it is not. Indeed, in Appendix B, we show how E-A-M can be employed to compute BCS-profiling confidence intervals, where the profiled test statistic itself is costly to compute and is approximated together with the critical value.
  • 14 See details in Jones, Schonlau, and Welch (1998). We use the DACE MATLAB kriging toolbox (http://www2.imm.dtu.dk/projects/dace/) for this step in our empirical application and Monte Carlo experiments.
  • 15 Heuristically, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0169 is the expected improvement gained from analyzing parameter value θ for a Bayesian whose current beliefs about c are described by the estimated model. Indeed, for each θ, the maximand in (2.14) multiplies improvement from learning that θ is feasible with this Bayesian's probability that it is.
  • 16 It is also possible to draw multiple points in each iteration (Schonlau, Welch, and Jones (1998)), as we do in our implementation of the method.
  • 17 To reproduce the expression, recall that if urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0193 random variables in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0194 are individually multivariate standard normal, then a Bonferroni upper bound on the probability that not all of them realize inside the ρ-box equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0195. Also, if Bonferroni is replaced with an independence assumption, the expression changes to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0196. The numerical difference is negligible for moderate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0197.
  • 18 In Appendix G.2.3, we show that the result actually applies to the mathematical projection in (1.2).
  • 19 Because the function f is known, these conditions can be easily verified in practice (especially if the first one is strengthened to hold over Θ).
  • 20 Assumption E.3 allows for high correlation among moment inequalities that cannot cross. This covers equality constraints but also entry games such as those studied in Ciliberto and Tamer (2009).
  • 21 As in Bull (2011), our convergence result accounts for the fact that the parameters of the Gaussian process prior in (2.12) are re-estimated for each iteration of the A-step using the “training data” urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0246.
  • 22 Chernozhukov, Hong, and Tamer (2007, eq. (4.6)) imposed the condition on the population identified set.
  • 23 For a discussion of mollification, see, for example, Rockafellar and Wets (2005, Example 7.19).
  • 24 This requirement rules out the GMS function in footnote 8, but it is satisfied by other GMS functions proposed by AS.
  • 25 For these same examples, we verify the differentiability requirement in Assumption A.2 on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0276.
  • 26 The data, which pertain to the second quarter of the year 2010, are downloaded from http://qeconomics.org/ojs/index.php/qe/article/downloadSuppFile/371/1173.
  • 27 This assumption, common in the literature on projection inference, requires that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0301 are Lipschitz in θ and have bounded norm. But urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0302 includes a denominator equal to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0303. As urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0304, this leads to a violation of the assumption and to numerical instability.
  • 28 Monopoly payoffs are those associated with a market with below-median size and below-median market presence (i.e., the constant terms).
  • 29 Being on the boundary of the parameter space is not a problem for calibrated projection; indeed, it is accounted for in the calibration of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0331 in equations (2.8)–(2.10).
  • 30 For the interaction parameters δ, Kline and Tamer's upper confidence points are lower than ours; for the correlation coefficient r, their lower confidence point is higher than ours.
  • 31 We use urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0347 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0348 to denote the probability and expectation for the prior and posterior distributions of c to distinguish them from P and E used for the sampling uncertainty for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0349.
  • 32 This requirement holds in the canonical partial identification examples discussed in Supplemental Material Appendix F, using the same arguments as in Supplemental Material Appendix F.1, provided urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0408.
  • 33 Chernozhukov, Hong, and Tamer (2007) imposed the degeneracy condition on the population identified set.
  • 34 The left endpoint is the optimal value of a program that replaces urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0815 with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0816.
  • 35 One may view (B.1) as a special case of (2.11) with a scalar control variable and a single constraint urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0819 with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0820.
  • 36 See http://qeconomics.org/ojs/index.php/qe/article/downloadSuppFile/431/1411.
  • 37 The specialization in which we compare to BCS also fulfils their assumptions. The assumptions in Pakes et al. (2011) exclude any DGP that has moment equalities.
  • 38 This allows for market-type homogeneous fixed effects but not for player-specific covariates nor for observed heterogeneity in interaction effects.
  • 39 We implement this step using the high-speed solver CVXGEN, available from http://cvxgen.com and described in Mattingley and Boyd (2012).
  • 40 This is only one of several individually necessary stopping criteria. Others include that the current optimum urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0909 and the expected improvement maximizer urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0910 (see equation (2.14)) satisfy urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0911. See Kaido et al. (2017) for the full list of convergence requirements.
  • 41 Based on some trial runs of BCS-profiling for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0914, we estimate that running it with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0915 throughout would take 3.14 times longer than the computation times reported in Table II. By comparison, calibrated projection takes only 1.75 times longer when implemented with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0916 instead of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0917.
  • Appendix A: Convergence of the E-A-M Algorithm

    In this appendix, we provide details on the algorithm used to solve the outer maximization problem as described in Section 2.3. Below, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0334 be a measurable space and ω a generic element of Ω. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0335 and let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0336 be a measurable map on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0337 whose law is specified below. The value of the function c in (2.11) is unknown ex ante. Once the evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0338, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0339, are realized, the corresponding values of c, that is, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0340, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0341, are known. We may therefore define the information set
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0342
    Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0343 be the set of feasible evaluation points. Then urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0344 is measurable with respect to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0345 and we take a measurable selection urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0346 from it.
    Our algorithm iteratively determines evaluation points based on the expected improvement criterion (Jones, Schonlau, and Welch (1998)). For this, we formally introduce a model that describes the uncertainty associated with the values of c outside the current evaluation points. Specifically, the unknown function c is modeled as a Gaussian process such that
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0350
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0351 controls the length-scales of the process. Two values urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0352 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0353 are highly correlated when urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0354 is small relative to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0355. Throughout, we assume urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0356 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0357 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0358. We let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0359. Specific suggestions on the forms of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0360 are given in Appendix D.2.
    For a given urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0361, the posterior distribution of c given urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0362 is then another Gaussian process whose mean urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0363 and variance urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0364 are given as follows (Santner, Williams, and Notz (2013, Section 4.1.3)):
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0365
    Given this, the expected improvement function can be written as
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0366
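    The posterior and expected improvement computations above can be sketched numerically. The following is a minimal illustration, not the authors' MATLAB implementation. It assumes a known prior mean `mu` (so the variance is the plain known-mean kriging variance), a Gaussian correlation of the form exp(-Σ_k (θ_k − θ'_k)² / β_k), and an expected improvement equal to the positive part of the gain in p'θ times the posterior probability that a scalar constraint value `g_bar` lies below c(θ). All function and argument names are hypothetical.

```python
import math
import numpy as np

def gaussian_kernel(a, b, beta):
    """Correlation exp(-sum_k (a_k - b_k)^2 / beta_k) between rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(diff ** 2 / beta, axis=2))

def gp_posterior(theta_new, theta_obs, c_obs, mu, sigma2, beta, jitter=1e-10):
    """Posterior mean and variance of c at theta_new given c(theta_obs) = c_obs.

    Assumes a known prior mean mu and prior variance sigma2.
    """
    R = gaussian_kernel(theta_obs, theta_obs, beta) + jitter * np.eye(len(theta_obs))
    r = gaussian_kernel(theta_new, theta_obs, beta)  # shape (n_new, n_obs)
    mean = mu + r @ np.linalg.solve(R, c_obs - mu)
    # r_i' R^{-1} r_i for each new point i; clipped at zero to absorb rounding.
    quad = np.sum(r * np.linalg.solve(R, r.T).T, axis=1)
    var = sigma2 * (1.0 - quad)
    return mean, np.clip(var, 0.0, None)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(theta, p, best_value, g_bar, post_mean, post_sd):
    """(p'theta - best_value)_+ times the posterior probability that g_bar <= c(theta)."""
    gain = max(float(p @ theta) - best_value, 0.0)
    if post_sd == 0.0:
        prob = 1.0 if g_bar <= post_mean else 0.0
    else:
        prob = 1.0 - normal_cdf((g_bar - post_mean) / post_sd)
    return gain * prob
```

    At an already evaluated point the posterior mean reproduces the observed value and the posterior variance is (numerically) zero, which is why the criterion directs new evaluations toward points where c is still uncertain.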
    The evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0367 are then generated according to the following algorithm (M-step in Section 2.3).

    Algorithm A.1. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0368.

    Step 1: Initial evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0369 are drawn uniformly over Θ independent of c.

    Step 2: For urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0370, with probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0371, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0372. With probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0373, draw urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0374 uniformly at random from Θ.
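    Step 2 of the algorithm mixes exploitation of the expected improvement criterion with uniform exploration. A minimal sketch of one such draw follows, assuming a box-shaped Θ and a user-supplied `acquisition` function. The random candidate search used here to maximize the criterion is a stand-in chosen for brevity, not the optimizer used by the authors, and all names are hypothetical.

```python
import numpy as np

def next_evaluation_point(rng, lower, upper, epsilon, acquisition, n_candidates=2000):
    """Draw the next evaluation point on the box [lower, upper].

    With probability epsilon the point is uniform on the box (exploration);
    otherwise it is the best of n_candidates uniform candidates according to
    the acquisition function (a crude stand-in for exact maximization).
    Returns the point and a flag that is True for a uniform draw.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if rng.random() < epsilon:
        return rng.uniform(lower, upper), True
    candidates = rng.uniform(lower, upper, size=(n_candidates, len(lower)))
    scores = np.array([acquisition(t) for t in candidates])
    return candidates[int(np.argmax(scores))], False
```

    The uniform draws are what make the evaluation points eventually dense in Θ, a property the convergence argument in this appendix relies on through the mesh norm.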

    Below, we use urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0375 to denote the law of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0376 determined by the algorithm above. We also note that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0377 is a function of the evaluation points and therefore is a random variable whose law is governed by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0378. We let
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0379 (A.1)

    We require that the kernel used to define the correlation functional for the Gaussian process in (2.13) satisfies some basic regularity conditions. For this, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0380 denote the Fourier transform of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0381. Note also that, for real valued functions f, g, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0382 means urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0383 as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0384 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0385.

    Assumption A.1. (Kernel Function) (i) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0386 is continuous and integrable. (ii) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0387 for some non-increasing function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0388. (iii) As urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0389, either urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0390 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0391 or urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0392 for all urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0393. (iv) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0394 is k-times continuously differentiable for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0395, and at the origin K has kth-order Taylor approximation urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0396 satisfying urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0397 as urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0398, for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0399.

    Assumption A.1 is essentially the same as Assumptions 1–4 in Bull (2011). When a kernel satisfies the second condition of Assumption A.1(iii), that is, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0400, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0401, we say urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0402. Assumption A.1 is satisfied by popular kernels such as the Matérn kernel (with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0403 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0404) and the Gaussian kernel (urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0405 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0406). These kernels are discussed in Appendix D.2.

    Finally, we require that the functions urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0407 are differentiable with continuous Lipschitz gradient, that the function c is smooth, and we impose on the constraint set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0409 (which is a confidence set in our application) a degeneracy condition inspired by Chernozhukov, Hong, and Tamer (2007, Condition C.3). Below, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0410 is the reproducing kernel Hilbert space (RKHS) on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0411 determined by the kernel used to define the correlation functional in (2.13). The norm on this space is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0412; see Supplemental Material Appendix D.2 for details.

    Assumption A.2. (Continuity and Smoothness) (i) For each urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0413, the function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0414 is differentiable in θ with Lipschitz continuous gradient. (ii) The function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0415 satisfies urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0416 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0417, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0418.

    Assumption A.3. (Degeneracy) There exist constants urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0419 such that for all urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0420,

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0421
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0422.

    Assumptions A.2 and A.3 jointly imply a linear minorant property on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0423:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0424 (A.2)
    To see this, define urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0425, so that the left-hand side of the above inequality is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0426. By Assumptions A.2 and A.3 and compactness of Θ, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0427 is differentiable with Lipschitz continuous gradient. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0428 denote its gradient and let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0429 denote the corresponding Lipschitz constant. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0430, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0431 are from Assumption A.3. We will show that, for constants urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0432 to be determined, (i) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0433 and (ii) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0434, so that the minimum between these bounds applies to any θ.
    To see (i), write urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0435, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0436 is the projection of θ onto urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0437. Fix a sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0438. By Assumption A.3, there exists a corresponding sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0439 with (for m large enough) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0440 but also urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0441. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0442 be the sequence of corresponding directions. Then, for any accumulation point t of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0443 and any active constraint j (i.e., urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0444; such j necessarily exists due to continuity of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0445), one has urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0446. We note for future reference that this finding implies urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0447. It also implies that the Mangasarian–Fromowitz constraint qualification holds at urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0448, hence r (being in the normal cone of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0449 at urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0450) is in the positive span of the active constraints' gradients. Thus, j can be chosen such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0451 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0452. For any such j, write
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0453
    In the inequality steps, we successively substituted bounds stated before the display, evaluated the integral in k, and (in the last step) used urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0454. This establishes (i), where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0455. Next, by continuity of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0456 and compactness of the constraint set, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0457 is well-defined and strictly positive. This establishes (ii) with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0458.

    A.1 Proof of Theorem 3.2

    For each urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0459, let
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0460

    Proof of Theorem 3.2. First, note that

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0461
    where the last equality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0462, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0463-a.s. Hence, it suffices to show
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0464

    Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0465 be a measurable space. Below, we let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0466. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0467. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0468 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0469 be the event that at least urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0470 of the points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0471 are drawn independently from a uniform distribution on Θ. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0472 be the event that one of the points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0473 is chosen by maximizing the expected improvement. For each L, define the mesh norm:

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0474
    For a given urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0475, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0476 be the event that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0477. We then let
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0478 (A.3)
    For each urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0479, let
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0480 (A.4)
    This is a (random) index that is associated with the first maximizer of the expected improvement between L and 2L.
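    The mesh norm measures how well the evaluation points cover Θ. As a rough numerical illustration, the sketch below assumes the standard fill-distance definition (the largest distance from any point of Θ to its nearest evaluation point), a box-shaped Θ, and a finite grid in place of the supremum. The function name and the grid resolution are hypothetical choices.

```python
import numpy as np

def mesh_norm(points, lower, upper, grid_per_dim=50):
    """Grid approximation of sup over the box of the distance to the nearest point.

    points has shape (n_points, d). The grid has grid_per_dim**d nodes, so this
    is only practical in low dimensions.
    """
    axes = [np.linspace(lo, hi, grid_per_dim) for lo, hi in zip(lower, upper)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(lower))
    # Distance from every grid node to every evaluation point.
    dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    # Nearest evaluation point for each node, then the worst-covered node.
    return float(dists.min(axis=1).max())
```

    Adding evaluation points can only shrink this quantity, and sufficiently many uniform draws make it small with high probability, which is the role the uniform draws play in the events defined above.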

    Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0481 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0482 and note that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0483 is a positive sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0484 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0485. We further define the following events:

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0486
    Note that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0487 can be partitioned into urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0488, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0489, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0490. By Lemmas A.2, A.3, and A.4, there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0491 such that, respectively,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0492 (A.5)
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0493 (A.6)
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0494 (A.7)
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0495. Note that
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0496 (A.8)
    Hence, by taking M sufficiently large so that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0497,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0498 (A.9)
    where the inequality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0499 by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0500. By (A.5)–(A.9),
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0501
    for some constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0502 for all L sufficiently large. Since urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0503, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0504 is non-decreasing in L, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0505 is non-increasing in L, we have
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0506 (A.10)
    where the last equality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0507 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0508.

    Now consider the case urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0509. By (A.3),

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0510 (A.11)
    Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0511 be a Bernoulli random variable such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0512 if urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0513 is randomly drawn from a uniform distribution. Then, by the Chernoff bounds (see, e.g., Boucheron, Lugosi, and Massart (2013, p.48)),
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0514 (A.12)
    Further, by the definition of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0515,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0516 (A.13)
    and finally, by taking urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0517 large upon defining the event urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0518 and applying Lemma 12 in Bull (2011), one has
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0519 (A.14)
    for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0520. Combining (A.11)–(A.14), for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0521,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0522 (A.15)
    Finally, noting that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0523 is bounded by some constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0524 due to the boundedness of Θ, we have
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0525 (A.16)
    where the second equality follows from (A.10) and (A.15). Since urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0526 can be made arbitrarily large, one may let the second term on the right-hand side of (A.16) converge to 0 faster than the first term. Therefore,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0527
    which establishes the claim of the theorem for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0528. When the second condition of Assumption A.1(iii) holds (i.e., urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0529), the argument above holds for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0530. □
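    The Chernoff step behind (A.12) controls the probability that too few evaluation points are uniform draws. As a rough numerical illustration only, the sketch below compares the exact lower tail of a binomial count with the standard multiplicative Chernoff bound P(X ≤ (1 − δ)np) ≤ exp(−δ²np/2). The exact form of the bound used in the proof may differ, and the values of n, p, and δ in the usage note are hypothetical.

```python
import math

def binomial_lower_tail(n, p, k):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def chernoff_lower_bound(n, p, delta):
    """Multiplicative Chernoff bound on P(X <= (1 - delta) * n * p), 0 < delta < 1."""
    return math.exp(-delta ** 2 * n * p / 2.0)
```

    For example, with n = 200 evaluation points, each a uniform draw with probability p = 0.1, one can compare `binomial_lower_tail(200, 0.1, 10)` with `chernoff_lower_bound(200, 0.1, 0.5)`; the bound decays exponentially in n, which is the feature the proof exploits.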

    A.2 Auxiliary Lemmas for the Proof of Theorem 3.2

    Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0531 be defined as in (A.3). The following lemma shows that on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0532, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0533 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0534 are close to each other, where we recall that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0535 is the expected improvement maximizer (but does not belong to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0536 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0537).

    Lemma A.1. Suppose Assumptions A.1, A.2, and A.3 hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0538 be a positive sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0539 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0540. Then, there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0541 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0542 for all L sufficiently large.

    Proof. We show the result by contradiction. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0543 be a sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0544 for all L. First, assume that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0545, there is a subsequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0546 for all L. This occurs if it contains a further subsequence along which, for all L, (i) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0547 or (ii) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0548.

    Case (i): urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0549 for all L for some subsequence.

    To simplify notation, we select a further subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0550 of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0551 such that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0552, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0553. This then induces a sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0554 of expected improvement maximizers such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0555 for all , where each equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0556 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0557. In what follows, we therefore omit the arguments of , but this sequence's dependence on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0558 should be implicitly understood.

    Recall that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0559 defined in equation (A.1) is a compact set and that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0560 denotes the projection of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0561 on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0562. Then

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0563
    where the first inequality follows from the Cauchy–Schwarz inequality, and the second inequality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0564 due to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0565. Therefore, by equation (A.2), for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0566,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0567
    for all sufficiently large, where the last inequality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0568. Take M such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0569. Then urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0570 for all sufficiently large, contradicting urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0571.

    Case (ii): Similarly to Case (i), we work with a further subsequence along which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0572 for all . Recall that along this subsequence, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0573 because urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0574. We will construct urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0575 s.t. urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0576, contradicting the definition of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0577.

    By Assumption A.3,

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0578 (A.17)
    for all such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0579. By the Cauchy–Schwarz inequality, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0580,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0581 (A.18)
    Therefore, minimizing both sides with respect to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0582 and noting that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0583, we obtain
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0584 (A.19)
    Further, noting that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0585,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0586 (A.20)
    By (A.17)–(A.20),
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0587
    for all sufficiently large. Therefore, for all sufficiently large, one has
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0588
    implying existence of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0589 s.t.
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0590 (A.21)
    By Lemma A.6, for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0591, one can write
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0592 (A.22)
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0593 (A.23)
    where the last inequality uses urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0594. Lemma A.6 also yields
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0595
    for all sufficiently large, where the second inequality follows from (A.21). Next, by Assumption A.3,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0596 (A.24)
    for all sufficiently large. Note that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0597 by (A.32) and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0598 by assumption. Hence, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0599. This in turn implies
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0600 (A.25)
    for all sufficiently large. Equations (A.23) and (A.25) jointly establish the desired contradiction. □

    The next lemma shows that, on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0601, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0602 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0603 are close to each other, where we recall that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0604 is the optimum value among the available feasible points (it belongs to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0605).

    Lemma A.2. Suppose Assumptions A.1, A.2, and A.3 hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0606 be a positive sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0607 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0608. Then, there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0609 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0610 for all L sufficiently large.

    Proof. We show below urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0611 uniformly over urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0612 for some decreasing sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0613 satisfying the assumptions of the lemma. The claim then follows by relabeling urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0614.

    Suppose by contradiction that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0615, there is a subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0616 along which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0617 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0618 for all L sufficiently large. To simplify notation, we select a subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0619 of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0620 such that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0621, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0622. This then induces a sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0623 for all , where each equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0624 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0625. Similarly to the proof of Lemma A.1, we omit the arguments of below and construct a sequence of points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0626 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0627.

    Arguing as in (A.17)–(A.20), one may find a sequence of points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0628 such that

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0629 (A.26)
    for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0630 and for all sufficiently large. Furthermore, by Lemma A.1,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0631 (A.27)
    for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0632 and for all sufficiently large. Arguing as in (A.23),
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0633 (A.28)
    where the last inequality follows from the triangle inequality, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0634, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0635. Similarly, by Lemma A.6,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0636 (A.29)
    where the last inequality holds for all sufficiently large because urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0637 and one can find a subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0638 so that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0639 for all sufficiently large.

    Subtracting (A.28) from (A.29) yields

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0640
    where the last inequality follows from (A.26) and (A.27). Note that there is a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0641 s.t.
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0642
    due to urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0643 by (A.24), (A.32), and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0644. Therefore, for all sufficiently large,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0645
    One may take M large enough so that, for some positive constant γ, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0646 for all sufficiently large, which implies urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0647 for all sufficiently large. However, this contradicts the assumption that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0648 is the expected improvement maximizer. □

    The next lemma shows that, on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0649, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0650 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0651 are close to each other.

    Lemma A.3. Suppose Assumptions A.1, A.2, and A.3 hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0652 be a positive sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0653 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0654. Then, there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0655 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0656 for all L sufficiently large.

    Proof. Note that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0657, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0658, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0659, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0660 satisfies urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0661, hence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0662, which in turn implies

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0663
    Therefore, it suffices to show the existence of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0664 that ensures urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0665 uniformly over urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0666 for all L. Suppose by contradiction that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0667, there is a subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0668 along which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0669 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0670 for all L sufficiently large. Again, we select a subsequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0671 of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0672 such that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0673, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0674. This then induces a sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0675 of expected improvement maximizers such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0676 for all , where each equals urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0677 for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0678.

    Similarly to the proof of Lemma A.1, we omit the arguments of below and prove the claim by contradiction. Below, we assume that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0679, there is a further subsequence along which urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0680 for all sufficiently large.

    Now let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0681 with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0682 specified below. By Assumption A.3, for all urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0683, it holds that

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0684
    for all sufficiently large. Noting that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0685 and taking urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0686 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0687, it follows that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0688 for all sufficiently large.

    Arguing as in (A.17)–(A.20), one may find a sequence of points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0689 such that

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0690
    This and the assumption that one can find a subsequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0691 for all imply
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0692
    for all sufficiently large. Now mimic the argument along (A.23)–(A.25) to deduce
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0693
    for all sufficiently large. However, this contradicts the assumption that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0694 is the expected improvement maximizer. □

    The next lemma shows that, on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0695, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0696 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0697 are close to each other.

    Lemma A.4. Suppose Assumptions A.1, A.2, and A.3 hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0698 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0699. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0700. Then, there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0701 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0702 for all L sufficiently large.

    Proof. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0703 be a sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0704 for all L. Since urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0705, there is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0706 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0707 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0708 is chosen by maximizing the expected improvement.

    For later use, we note that, for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0709, it can be shown that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0710, which in turn implies that there exists a constant urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0711 such that

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0712(A.30)
    for all L sufficiently large.

    For urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0713 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0714, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0715. Recall that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0716 is an optimal solution to (2.11). Then, for all L sufficiently large,

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0717
    where (1) follows by construction, (2) follows from Lemma A.6(ii), (3) follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0718 being the maximizer of the expected improvement, (4) follows from Lemma A.5, (5) follows from (A.30) with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0719, (6) follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0720, (7) follows from Lemma A.5, (8) follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0721 being the expected improvement maximizer, (9) follows from Lemma A.5, and (10) follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0722 due to the definition of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0723. This establishes the claim. □

    For evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0724 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0725, the following lemma is an analog of Lemma 8 in Bull (2011), which links the expected improvement to the actual improvement achieved by a new evaluation point θ.

    Lemma A.5. Suppose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0726 is bounded and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0727. Suppose the evaluation points urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0728 are drawn by Algorithm A.1 and let Assumptions A.1 and A.2(ii) hold. For urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0729 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0730, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0731. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0732 be a positive sequence such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0733 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0734. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0735. Then, for any sequence urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0736 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0737,

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0738
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0739.

    Proof of Lemma A.5. If urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0740, then the posterior variance of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0741 is zero. Hence, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0742, and the claim of the lemma holds.

    Suppose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0743. We first show the upper bound. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0744 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0745. By Lemma 6 in Bull (2011), we have urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0746. Starting from Lemma A.6(i), we can write

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0747(A.31)
    where the last inequality used urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0748 for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0749. Note that one may write
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0750
    To be clear about the hyperparameter value at which we evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0751, we will write urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0752. By the hypothesis that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0753 and Lemma 4 in Bull (2011), we have
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0754
    Note that there are urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0755 uniformly sampled points, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0756 is associated with index urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0757. As shown in the proof of Theorem 5 in Bull (2011), this ensures that
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0758(A.32)
    Below, we simply write this result urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0759. This, together with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0760 and the fact that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0761 is decreasing, yields
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0762(A.33)
    for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0763 and where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0764. Note that, by the triangle inequality,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0765(A.34)
    and
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0766(A.35)
    for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0767, where ϕ is the density of the standard normal distribution, and the inequality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0768. The second term on the right-hand side of (A.34) can be bounded as
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0769(A.36)
    by the mean value theorem, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0770 is a point between urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0771 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0772. The claim of the lemma then follows from (A.31), (A.33)–(A.36), and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0773 being bounded because Θ is bounded.

    Similarly, for the lower bound, we have

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0774(A.37)
    Note that we may write
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0775(A.38)
    by urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0776. Arguing as in (A.37) and noting that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0777 is increasing, one has
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0778(A.39)
    for some urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0779 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0780. By the triangle inequality,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0781(A.40)
    where arguing as in (A.35),
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0782(A.41)
    The second term on the right-hand side of (A.40) can be bounded as
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0783(A.42)
    by the mean value theorem, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0784 is a point between urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0785 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0786. The claim of the lemma then follows from (A.37)–(A.42), and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0787 being bounded because Θ is bounded. □

    Lemma A.6. Suppose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0788 is bounded and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0789 and let Assumptions A.1 and A.2(ii) hold. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0790. For urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0791 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0792, let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0793. Then, (i) for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0794 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0795,

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0796
    Further, (ii) for any urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0797 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0798 such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0799,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0800(A.43)

    Proof. (i) Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0801 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0802. By Lemma 6 in Bull (2011), we have urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0803. Since urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0804 is decreasing, we have

    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0805
    Similarly,
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0806
    (ii) For the lower bound in (A.43), we have
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0807
    where the last inequality follows from urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0808 and the fact that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0809 is decreasing. □

    Appendix B: Applying the E-A-M Algorithm to Profiling

    We describe below how to use the E-A-M procedure to compute BCS-profiling based confidence intervals. Let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0810 denote the parameter space for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0811. The (one-dimensional) profiling confidence region is
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0812
    where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0813 is the critical value proposed in Bugni, Canay, and Shi (2017) and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0814 is any test statistic that they allowed for. The E-A-M algorithm can be used to compute the endpoints of this set so that the researcher may report an interval.
    For ease of exposition, we discuss below the computation of the right endpoint of the confidence interval, which is the optimal value of the following problem:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0817(B.1)
    We then take urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0818 as a black box function and apply the E-A-M algorithm. We include the profiled statistic in the black box function because it involves a nonlinear optimization problem, which is also relatively expensive. The modified procedure is as follows.
    • Initialization: Draw randomly (uniformly) over urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0821 a set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0822 of initial evaluation points and evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0823 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0824. Initialize urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0825.
    • E-step: Evaluate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0826 and record the tentative optimal value
      urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0827
    • A-step (Approximation):

      Approximate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0828 by a flexible auxiliary model. We again use the kriging approximation, which for a mean-zero Gaussian process urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0829 indexed by τ and with constant variance urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0830 specifies

      urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0831
      where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0832 is a kernel with a scalar parameter urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0833. The parameters are estimated in the same way as before.

      The (best linear) predictor of c and its derivative are then given by

      urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0834
      where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0835 is a vector whose generic component is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0836 as given above with estimated parameters, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0837, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0838 is an L-by-L matrix whose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0839 entry is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0840 with estimated parameters. The amount of uncertainty left in urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0841 is captured by the following variance:
      urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0842

    • M-step (Maximization): With probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0843, maximize the expected improvement function urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0844 to obtain the next evaluation point, with
      urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0845
      With probability urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0846, draw urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0847 randomly from a uniform distribution over urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0848.

    As before, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0849 is reported as the endpoint of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0850 upon convergence. For Theorem 3.2 to apply to this algorithm, the profiled statistic urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0851 and the critical value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0852 need to be sufficiently smooth. We leave the derivation of sufficient conditions for this smoothness to future research.
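The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the MATLAB package of Kaido et al. (2017). The black box is represented by a hypothetical function g(τ), with feasibility meaning g(τ) ≤ 0. The Gaussian kernel with a fixed scale `beta`, the small `nugget` added for numerical stability, the simplified predictive variance, the grid-based maximization, and the expected improvement formula (τ − τ*)₊ Φ(−ĝ(τ)/s(τ)) are all assumptions made for the example.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def kern(a, b, beta):
    """Gaussian kernel; the scale beta is fixed here rather than estimated."""
    return math.exp(-((a - b) ** 2) / beta)


def fit_kriging(taus, vals, beta=0.05, nugget=1e-6):
    """A-step: return a function tau -> (predicted g, predictive std. dev.)."""
    n = len(taus)
    R = [[kern(a, b, beta) + (nugget if i == j else 0.0)
          for j, b in enumerate(taus)] for i, a in enumerate(taus)]
    # R is symmetric, so the solved columns of R^{-1} double as its rows.
    Rinv = [solve(R, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]

    def apply(v):
        return [sum(row[k] * v[k] for k in range(n)) for row in Rinv]

    mu = sum(apply(vals)) / sum(apply([1.0] * n))  # GLS constant mean
    resid = [v - mu for v in vals]
    w = apply(resid)
    sigma2 = max(sum(a * b for a, b in zip(resid, w)) / n, 0.0)

    def predict(tau):
        r = [kern(tau, t, beta) for t in taus]
        mean = mu + sum(a * b for a, b in zip(r, w))
        q = sum(a * b for a, b in zip(r, apply(r)))
        return mean, math.sqrt(sigma2 * max(1.0 - q, 0.0))

    return predict


def expected_improvement(tau, best, predict):
    """Assumed EI: gain in tau times the model probability that g(tau) <= 0."""
    gain = max(tau - best, 0.0)
    if gain == 0.0:
        return 0.0
    mean, sd = predict(tau)
    if sd == 0.0:
        return gain if mean <= 0.0 else 0.0
    return gain * 0.5 * (1.0 + math.erf(-mean / sd / math.sqrt(2.0)))


def tentative_optimum(taus, vals, lo):
    """E-step: largest evaluated tau with g(tau) <= 0 (falls back to lo)."""
    feasible = [t for t, v in zip(taus, vals) if v <= 0.0]
    return max(feasible) if feasible else lo


def eam_right_endpoint(g, lo, hi, n_init=4, n_iter=15, eps=0.1, seed=0):
    rng = random.Random(seed)
    taus = [rng.uniform(lo, hi) for _ in range(n_init)]  # initialization
    vals = [g(t) for t in taus]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        best = tentative_optimum(taus, vals, lo)
        predict = fit_kriging(taus, vals)
        if rng.random() < 1.0 - eps:  # M-step: maximize EI over the grid
            nxt = max(grid, key=lambda t: expected_improvement(t, best, predict))
        else:  # exploration draw, uniform over the parameter space
            nxt = rng.uniform(lo, hi)
        taus.append(nxt)
        vals.append(g(nxt))  # evaluate the black box at the new point
    return tentative_optimum(taus, vals, lo), taus, vals
```

For example, with the hypothetical constraint g(τ) = τ² − 0.25 on [0, 1], the feasible set is [0, 0.5], so the value returned by `eam_right_endpoint(lambda t: t * t - 0.25, 0.0, 1.0)` can never exceed 0.5. Every evaluation of g is kept, which mirrors the point of the procedure: the expensive function is called only at points the surrogate deems promising or at occasional random draws.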

    Appendix C: An Entry Game Model and Some Monte Carlo Simulations

    We evaluate the statistical and numerical performance of calibrated projection and E-A-M in comparison with BCS-profiling in a Monte Carlo experiment run on a server with two Intel Xeon X5680 processors rated at 3.33 GHz with six cores each and with a memory capacity of 24 GB rated at 1333 MHz. The experiment simulates the two-player entry game used in the Monte Carlo exercise of BCS, and we use their code to implement their method.

    C.1 The General Entry Game Model

    We consider a two-player entry game based on Ciliberto and Tamer (2009):

    | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0853 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0854 |
    | Y1 = 0 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0855 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0856 |
    | Y1 = 1 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0857 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0858 |

    Here, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0859, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0860, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0861 denote, respectively, each player's binary action, observed characteristics, and unobserved characteristics. The strategic interaction effects urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0862 measure the impact of the opponent's entry into the market. We let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0863. We generate urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0864 as an i.i.d. random vector taking values in a finite set whose distribution urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0865 is known. We let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0866 be independent of Z and such that urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0867 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0868, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0869. We let urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0870. For a given set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0871, we define urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0872. We choose urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0873 so that the c.d.f. of u is continuous, differentiable, and has a bounded p.d.f. The outcome urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0874 results from pure-strategy Nash equilibrium play. For some values of Z and u, the model predicts monopoly outcomes urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0875 and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0876 as multiple equilibria. When this occurs, we select outcome urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0877 by independent Bernoulli trials with parameter urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0878. This gives rise to the following restrictions:
    urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0879(C.1)
    We show in Supplemental Material Appendix F that this model satisfies Assumptions D.1 and E.3-2. Throughout, we analytically compute the moments' gradients and studentize them using sample analogs of their standard deviations.
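To make the data-generating process concrete, the following Python sketch enumerates the pure-strategy Nash equilibria of a two-player entry game and applies a Bernoulli selection rule when both monopoly outcomes are equilibria. The payoff form is an assumption made for illustration (profit from entering equal to `z * zeta + delta * rival_entry + u`, zero profit from staying out, with negative interaction effects), as are all function and parameter names and the choice of which monopoly outcome is selected with probability `mu`.

```python
import random


def entry_profit(z, zeta, delta, u, rival_enters):
    """Assumed profit from entering; staying out of the market pays zero."""
    return z * zeta + delta * rival_enters + u


def pure_nash(z, zeta, delta, u):
    """List all pure-strategy Nash equilibria (y1, y2) of the 2x2 entry game."""
    equilibria = []
    for y1 in (0, 1):
        for y2 in (0, 1):
            p1 = entry_profit(z[0], zeta[0], delta[0], u[0], y2)
            p2 = entry_profit(z[1], zeta[1], delta[1], u[1], y1)
            # Entering is a best response iff profit >= 0; staying out iff <= 0.
            best1 = (p1 >= 0) if y1 == 1 else (p1 <= 0)
            best2 = (p2 >= 0) if y2 == 1 else (p2 <= 0)
            if best1 and best2:
                equilibria.append((y1, y2))
    return equilibria


def draw_outcome(z, zeta, delta, u, mu, rng):
    """Pick the outcome; with two monopoly equilibria, (0, 1) has probability mu."""
    equilibria = pure_nash(z, zeta, delta, u)
    if (0, 1) in equilibria and (1, 0) in equilibria:
        return (0, 1) if rng.random() < mu else (1, 0)
    return equilibria[0]


if __name__ == "__main__":
    rng = random.Random(1)
    z, zeta, delta = (1.0, 1.0), (0.5, 0.5), (-1.0, -1.0)
    for u in [(2.0, 2.0), (-2.0, -2.0), (0.0, 0.0)]:
        print(u, pure_nash(z, zeta, delta, u), draw_outcome(z, zeta, delta, u, 0.5, rng))
```

In this sketch the region of multiplicity is the set of shocks for which each firm is profitable as a monopolist but not as a duopolist, which is why the model restricts the probabilities of the monopoly outcomes only through inequalities.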

    C.2 A Comparison to BCS-Profiling

    BCS specialized this model as follows. First, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0880, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0881 are independently uniformly distributed on urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0882 and the researcher knows urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0883. Equality (C.1) disappears because urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0884 is never an equilibrium. Next, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0885, where urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0886 are observed market-type indicators, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0887 for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0888, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0889. The parameter vector is urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0890 with parameter space urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0891. This leaves four moment equalities and eight moment inequalities (so urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0892); compare equation (5.1) in BCS. We set urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0893, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0894, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0895, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0896, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0897. The implied true bounds on parameters are urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0898, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0899, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0900, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0901, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0902.

    The BCS-profiling confidence interval urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0903 inverts a test of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0904 over a grid for τ. We do not, in practice, exhaust the grid but search inward from the extreme points of Θ in directions ±p. At each τ that is visited, we use BCS code to compute a profiled test statistic and the corresponding critical value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0905. The latter is a quantile of the minimum of two distinct bootstrap approximations, each of which solves a nonlinear program for each bootstrap draw. Computational cost quickly increases with grid resolution, bootstrap size, and the number of starting points used to solve the nonlinear programs.
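The inward search just described can be illustrated with a short Python sketch. The functions `stat` and `crit` are hypothetical stand-ins for the profiled test statistic and its critical value (in the actual exercise both come from the BCS code and are expensive to evaluate), and the default step of 0.005 is used here only because it matches the grid resolution discussed in this appendix.

```python
def profile_interval(stat, crit, lo, hi, step=0.005):
    """Search inward from both ends of [lo, hi] for the first non-rejected grid point."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]

    def accepted(tau):
        # The test does not reject tau when the statistic is below the critical value.
        return stat(tau) <= crit(tau)

    lower = next((t for t in grid if accepted(t)), None)
    upper = next((t for t in reversed(grid) if accepted(t)), None)
    return lower, upper


if __name__ == "__main__":
    # Toy statistic whose acceptance region is roughly [0.3, 0.5].
    print(profile_interval(lambda t: 100.0 * (t - 0.4) ** 2, lambda t: 1.0, 0.0, 1.0))
```

Because the search stops at the first accepted point from each side, interior grid points are never visited. The cost is still one statistic and one bootstrap critical value per visited point, which is what makes a fine grid expensive.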

    Calibrated projection computes urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0906 by solving a series of linear programs for each bootstrap draw. It computes the extreme points of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0907 by solving the nonlinear program (2.4) twice, a task that is much accelerated by the E-A-M algorithm. Projection of Andrews and Soares (2010) operates very similarly but computes its critical value urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0908 through bootstrap simulation without any optimization.

    We align grid resolution in BCS-profiling with the E-A-M algorithm's convergence threshold of 0.005. We run all methods with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0912 bootstrap draws, and calibrated and “uncalibrated” (i.e., based on Andrews and Soares (2010)) projection also with urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0913. Some other choices differ: BCS-profiling is implemented with their own choice to multi-start the nonlinear programs at three oracle starting points, that is, using knowledge of the true DGP; our implementation of both other methods multi-starts the nonlinear programs from 30 data-dependent random points (see Kaido et al. (2017) for details).

    Table II displays results for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0918 and for 300 Monte Carlo repetitions of all three methods. All confidence intervals are conservative, reflecting the effect of GMS. As expected, uncalibrated projection is most conservative, with coverage of essentially 1. Also, BCS-profiling is more conservative than calibrated projection. The most striking contrast is in computational effort. Here, uncalibrated projection is fastest—indeed, in contrast to received wisdom, this procedure is computationally somewhat easy. This is due to our use of the E-A-M algorithm and therefore part of this paper's contribution. Next, our implementation of calibrated projection beats BCS-profiling with gridding by a factor of about 70. This can be disentangled into the gain from using calibrated projection, with its advantage of bootstrapping linear programs, and the gain afforded by the E-A-M algorithm. It turns out that implementing BCS-profiling with the adapted E-A-M algorithm (see Appendix B) improves computation by a factor of about 4; switching to calibrated projection leads to a further improvement by a factor of about 17. Finally, Table III extends the analysis to all components of θ and to 1000 Monte Carlo repetitions. We were unable to compute this for BCS-profiling.
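The speed-up factors quoted above can be checked directly against the "Average Time" panel of Table II. The short Python sketch below averages the six reported times for each method (two parameters, three confidence levels) and forms the ratios; the dictionary keys are descriptive labels chosen for this example.

```python
# Average computation times in seconds, copied from the "Average Time" panel of Table II.
times = {
    "BCS-profiling, grid": [1858.42, 1873.23, 1907.84, 1753.54, 1782.91, 1809.65],
    "BCS-profiling, E-A-M": [425.49, 424.11, 444.45, 461.30, 472.55, 458.58],
    "calibrated projection, E-A-M": [26.40, 25.71, 25.67, 26.61, 25.79, 25.00],
    "uncalibrated projection, E-A-M": [18.22, 18.55, 18.18, 22.49, 21.38, 21.00],
}

# Mean time per method across the six table rows.
avg = {name: sum(vals) / len(vals) for name, vals in times.items()}

# Overall gain, and its split into the E-A-M gain and the calibrated-projection gain.
total_gain = avg["BCS-profiling, grid"] / avg["calibrated projection, E-A-M"]
eam_gain = avg["BCS-profiling, grid"] / avg["BCS-profiling, E-A-M"]
projection_gain = avg["BCS-profiling, E-A-M"] / avg["calibrated projection, E-A-M"]

if __name__ == "__main__":
    print(f"grid BCS-profiling vs calibrated projection: {total_gain:.1f}x")
    print(f"grid vs E-A-M within BCS-profiling: {eam_gain:.1f}x")
    print(f"E-A-M BCS-profiling vs calibrated projection: {projection_gain:.1f}x")
```

The two partial factors multiply to the overall factor by construction, which is the sense in which the total gain is disentangled into an algorithmic component and a component due to bootstrapping linear rather than nonlinear programs.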

    Table II. Results for Set 1 With urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0919, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0920, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0921, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0922, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0923^a

    Median CI
    | | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0924 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0924 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0925 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0926 |
    | Implementation | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0927 | Grid | E-A-M | E-A-M | E-A-M |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0928 | 0.95 | [0.330,0.495] | [0.331,0.495] | [0.336,0.482] | [0.290,0.558] |
    | | 0.90 | [0.340,0.485] | [0.340,0.485] | [0.343,0.474] | [0.298,0.543] |
    | | 0.85 | [0.345,0.475] | [0.346,0.479] | [0.348,0.466] | [0.303,0.537] |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0929 | 0.95 | [0.515,0.655] | [0.514,0.655] | [0.519,0.650] | [0.461,0.682] |
    | | 0.90 | [0.525,0.647] | [0.525,0.648] | [0.531,0.643] | [0.473,0.675] |
    | | 0.85 | [0.530,0.640] | [0.531,0.642] | [0.539,0.639] | [0.481,0.671] |

    Coverage
    | | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0930 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0930 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0930 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0930 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0931 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0931 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0932 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0932 |
    | Implementation | | Grid | Grid | E-A-M | E-A-M | E-A-M | E-A-M | E-A-M | E-A-M |
    | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0933 | Lower | Upper | Lower | Upper | Lower | Upper | Lower | Upper |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0934 | 0.95 | 0.997 | 0.990 | 1.000 | 0.993 | 0.993 | 0.977 | 1.000 | 1.000 |
    | | 0.90 | 0.990 | 0.980 | 0.993 | 0.977 | 0.987 | 0.960 | 1.000 | 1.000 |
    | | 0.85 | 0.970 | 0.970 | 0.973 | 0.960 | 0.957 | 0.930 | 1.000 | 1.000 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0935 | 0.95 | 0.987 | 0.993 | 0.990 | 0.993 | 0.973 | 0.987 | 1.000 | 1.000 |
    | | 0.90 | 0.977 | 0.973 | 0.980 | 0.977 | 0.940 | 0.953 | 1.000 | 1.000 |
    | | 0.85 | 0.967 | 0.957 | 0.963 | 0.960 | 0.943 | 0.927 | 1.000 | 1.000 |

    Average Time
    | | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0936 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0936 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0937 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0938 |
    | Implementation | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0939 | Grid | E-A-M | E-A-M | E-A-M |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0940 | 0.95 | 1858.42 | 425.49 | 26.40 | 18.22 |
    | | 0.90 | 1873.23 | 424.11 | 25.71 | 18.55 |
    | | 0.85 | 1907.84 | 444.45 | 25.67 | 18.18 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0941 | 0.95 | 1753.54 | 461.30 | 26.61 | 22.49 |
    | | 0.90 | 1782.91 | 472.55 | 25.79 | 21.38 |
    | | 0.85 | 1809.65 | 458.58 | 25.00 | 21.00 |

    • a (1) Projections of urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0942 are: urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0943, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0944, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0945, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0946, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0947. (2) “Upper” coverage is for urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0948, and similarly for “Lower”. (3) “Average time” is computation time in seconds averaged over MC replications. (4) urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0949 results from BCS-profiling, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0950 is calibrated projection, and urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0951 is uncalibrated projection. (5) “Implementation” refers to the method used to compute the extreme points of the confidence interval.
    Table III. Results for Set 1 With urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0952, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0953, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0954, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0955, urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0956 (a)

    | | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0959 | Median CI: urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0960 | Median CI: urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0961 | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0957 Coverage: Lower | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0957 Coverage: Upper | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0958 Coverage: Lower | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0958 Coverage: Upper | Average Time: urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0962 | Average Time: urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0963 |
    |---|---|---|---|---|---|---|---|---|---|
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0964 | 0.95 | [0.333,0.478] | [0.288,0.555] | 0.988 | 0.982 | 1 | 1 | 42.41 | 22.23 |
    | | 0.90 | [0.341,0.470] | [0.296,0.542] | 0.976 | 0.957 | 1 | 1 | 41.56 | 22.11 |
    | | 0.85 | [0.346,0.464] | [0.302,0.534] | 0.957 | 0.937 | 1 | 1 | 40.47 | 19.79 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0965 | 0.95 | [0.525,0.653] | [0.466,0.683] | 0.969 | 0.983 | 1 | 1 | 42.11 | 24.39 |
    | | 0.90 | [0.538,0.646] | [0.478,0.677] | 0.947 | 0.960 | 1 | 1 | 40.15 | 28.13 |
    | | 0.85 | [0.545,0.642] | [0.485,0.672] | 0.925 | 0.941 | 1 | 1 | 41.38 | 26.44 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0966 | 0.95 | [0.054,0.142] | [0.020,0.180] | 0.956 | 0.958 | 1 | 1 | 40.31 | 22.53 |
    | | 0.90 | [0.060,0.136] | [0.028,0.172] | 0.911 | 0.911 | 1 | 1 | 36.80 | 24.15 |
    | | 0.85 | [0.064,0.132] | [0.032,0.167] | 0.861 | 0.860 | 0.999 | 0.999 | 39.10 | 21.81 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0967 | 0.95 | [0.156,0.245] | [0.121,0.281] | 0.952 | 0.952 | 1 | 1 | 39.23 | 24.66 |
    | | 0.90 | [0.162,0.238] | [0.128,0.273] | 0.914 | 0.910 | 0.998 | 0.998 | 41.53 | 21.66 |
    | | 0.85 | [0.165,0.234] | [0.133,0.268] | 0.876 | 0.872 | 0.996 | 0.996 | 39.44 | 22.83 |
    | urn:x-wiley:00129682:media:ecta200040:ecta200040-math-0968 | 0.95 | [0.257,0.344] | [0.222,0.379] | 0.946 | 0.946 | 1 | 1 | 41.45 | 22.91 |
    | | 0.90 | [0.263,0.338] | [0.230,0.371] | 0.910 | 0.909 | 0.997 | 0.999 | 42.09 | 22.83 |
    | | 0.85 | [0.267,0.334] | [0.235,0.366] | 0.882 | 0.870 | 0.994 | 0.993 | 42.19 | 23.69 |

    • a Same DGP and conventions as in Table II.

    In sum, the Monte Carlo experiment on the same DGP used in BCS yields three interesting findings: (i) the E-A-M algorithm accelerates projection of the Andrews and Soares (2010) confidence region to the point that this method becomes reasonably cheap; (ii) it also substantially accelerates computation of profiling intervals; and (iii) for this DGP, calibrated projection combined with the E-A-M algorithm has the most accurate size control while also being computationally attractive.
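    Finding (ii) can be checked against the "Average Time" panel of Table II with a short calculation. The sketch below is illustrative only: it assumes that the first two time columns of that panel are the Grid and E-A-M implementations of the BCS-profiling interval, which is how the table header reads, and the labels `first parameter` and `second parameter` are hypothetical stand-ins for the two row groups.

```python
# Average computation times in seconds, copied from the "Average Time"
# panel of Table II. Assumed column order: (Grid, E-A-M) for profiling.
# The parameter labels are hypothetical placeholders for the two row groups.
profiling_times = {
    "first parameter": {
        0.95: (1858.42, 425.49),
        0.90: (1873.23, 424.11),
        0.85: (1907.84, 444.45),
    },
    "second parameter": {
        0.95: (1753.54, 461.30),
        0.90: (1782.91, 472.55),
        0.85: (1809.65, 458.58),
    },
}


def speedups(times):
    """Return the Grid-to-E-A-M time ratio for every (parameter, level) row."""
    return {
        (param, level): grid / eam
        for param, rows in times.items()
        for level, (grid, eam) in rows.items()
    }


ratios = speedups(profiling_times)
for (param, level), ratio in ratios.items():
    print(f"{param}, level {level:.2f}: Grid / E-A-M = {ratio:.2f}")
```

    Under that reading of the columns, the E-A-M implementation of profiling is roughly four times faster than the grid implementation in every row, with ratios between about 3.8 and 4.4.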
