Confidence Intervals for Projections of Partially Identified Parameters
Abstract
We propose a bootstrap-based calibrated projection procedure to build confidence intervals for single components and for smooth functions of a partially identified parameter vector in moment (in)equality models. The method controls asymptotic coverage uniformly over a large class of data generating processes. The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the function of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the function of θ, not θ itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive.
Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately, and we establish its rate of convergence. The algorithm is of independent interest for optimization problems with simple objectives and complicated constraints. An empirical application estimating an entry game illustrates the usefulness of the method. Monte Carlo simulations confirm the accuracy of the solution algorithm, the good statistical as well as computational performance of calibrated projection (including in comparison to other methods), and the algorithm's potential to greatly accelerate computation of other confidence intervals.
1 Introduction
This paper provides novel confidence intervals for projections and smooth functions of a parameter vector θ that is partially or point identified through a finite number of moment (in)equalities. In addition, we develop a new algorithm for computing these confidence intervals and, more generally, for solving optimization problems with “black box” constraints, and obtain its rate of convergence.




















Computationally, calibration of the critical level is relatively attractive: We linearize all constraints around θ, so that coverage of the projection can be calibrated by analyzing many linear programs. Nonetheless, computing the above objects is challenging in moderately high dimension. This brings us to our second contribution, namely, a general method to accurately and rapidly compute confidence intervals whose construction resembles (1.3). Additional applications within partial identification include projection of confidence regions defined in Chernozhukov, Hong, and Tamer (2007), Andrews and Soares (2010), or Andrews and Shi (2013), as well as (with minor tweaking; see Appendix B) the confidence interval proposed in Bugni, Canay, and Shi (2017, BCS henceforth) and further discussed later. In an application to a point identified setting, Freyberger and Reeves (2017, Supplement Section S.3) used our method to construct uniform confidence bands for an unknown function of interest under (nonparametric) shape restrictions. They benchmarked it against gridding and found it to be accurate at considerably improved speed. More generally, the method can be broadly used to compute confidence intervals for optimal values of optimization problems with estimated constraints.
Our algorithm (henceforth called E-A-M for Evaluation-Approximation-Maximization) is based on the response surface method; thus, it belongs to the family of expected improvement algorithms (see, e.g., Jones (2001), Jones, Schonlau, and Welch (1998), and references therein). Bull (2011) established convergence of an expected improvement algorithm for unconstrained optimization problems where the objective is a “black box” function. The rate of convergence that he derived depends on the smoothness of the black box objective function. We substantially extend his results to show convergence, at a slightly slower rate, of our similar algorithm for constrained optimization problems in which the constraints are sufficiently smooth “black box” functions. Extensive Monte Carlo experiments (see Appendix C and Section 5 of Kaido, Molinari, and Stoye (2017)) confirm that the E-A-M algorithm is fast and accurate.
Relation to Existing Literature. The main alternative inference procedure for projections—introduced in Romano and Shaikh (2008) and significantly advanced in BCS—is based on profiling out a test statistic. The classes of DGPs for which calibrated projection and the profiling-based method of BCS (BCS-profiling henceforth) can be shown to be uniformly valid are non-nested.1
Computationally, calibrated projection has the advantage that the bootstrap iterates over linear as opposed to nonlinear programming problems. While the “outer” optimization problems in (1.3) are potentially intricate, our algorithm is geared toward them. Monte Carlo simulations suggest that these two factors give calibrated projection a considerable computational edge over profiling, though profiling can also benefit from the E-A-M algorithm. Indeed, in Appendix C, we replicate the Monte Carlo experiment of BCS and find that adapting E-A-M to their method improves computation time by a factor of about 4, while switching to calibrated projection improves it by a further factor of about 17.
In an influential paper, Pakes, Porter, Ho, and Ishii (2011, PPHI henceforth) also used linearization but, subject to this approximation, directly bootstrapped the sample projection. This is valid only under stringent conditions.2 Other related articles that explicitly consider inference on projections include Beresteanu and Molinari (2008), Bontemps, Magnac, and Maurin (2012), Kaido (2016), and Kline and Tamer (2016). None of these establish uniform validity of confidence sets. Chen, Christensen, and Tamer (2018) established uniform validity of MCMC-based confidence intervals for projections, but aimed at covering the projection of the entire identified region (defined later) and not just of the true θ. Gafarov, Meier, and Montiel-Olea (2016) used our insight in the context of set identified spatial VARs.
Regarding computation, previous implementations of projection-based inference (e.g., Ciliberto and Tamer (2009), Grieco (2014), Dickstein and Morales (2018)) reported the smallest and largest value of the function of interest among parameter values that were discovered using, for example, grid-search or simulated annealing with no cooling. This becomes computationally cumbersome as d increases because it typically requires a number of evaluation points that grows exponentially with d. In contrast, using a probabilistic model, our method iteratively draws evaluation points from regions that are considered highly relevant for finding the confidence interval's endpoint. In applications, this tends to substantially reduce the number of evaluation points.
Structure of the Paper. Section 2 sets up notation and describes our approach in detail, including computational implementation of the method and choice of tuning parameters. Section 3.1 establishes uniform asymptotic validity of the proposed confidence interval, and Section 3.2 shows that our algorithm converges at a specific rate which depends on the smoothness of the constraints. Section 4 reports the results of an empirical application that revisits the analysis in Kline and Tamer (2016, Section 8). Section 5 draws conclusions. The proof of convergence of our algorithm is in Appendix A. Appendix B shows that our algorithm can be used to compute BCS-profiling confidence intervals. Appendix C reports the results of Monte Carlo simulations comparing our proposed method with that of BCS. All other proofs, background material for our algorithm, and additional results are in the Supplemental Material (Kaido, Molinari, and Stoye (2019)).3
2 Detailed Explanation of the Method
2.1 Setup and Definition of the Confidence Interval






















2.2 Calibration of the Critical Level
Calibration of the critical level requires careful analysis of the moment restrictions' local behavior at each point in the identification region. This is because the extent of projection conservatism depends on (i) the asymptotic behavior of the sample moments entering the inequality restrictions, which can change discontinuously depending on whether they bind at θ or not, and (ii) the local geometry of the identification region at θ, that is, the shape of the constraint set formed by the moment restrictions. Features (i) and (ii) can be quite different at different points in the identification region, making uniform inference challenging. In particular, (ii) does not arise if one only considers inference for the entire parameter vector, and hence is a new challenge requiring new methods.
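To see why naive projection is conservative, and hence why the extent of this conservatism matters for the calibration, note that if CS_n is any confidence set covering the entire vector θ with probability at least 1 − α (a generic joint confidence set, used here only for illustration), then

\[
P\Bigl(p'\theta \in \bigl[\,\min_{\vartheta\in CS_n} p'\vartheta,\ \max_{\vartheta\in CS_n} p'\vartheta\,\bigr]\Bigr)
\;\ge\; P\bigl(\theta\in CS_n\bigr)\;\ge\;1-\alpha ,
\]

and the first inequality is typically strict, so the projected interval over-covers. Calibrated projection instead chooses the relaxation so that the left-hand probability, rather than the joint coverage, attains the desired level asymptotically.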





































We conclude by motivating the “ρ-box constraint” in (2.6), which is a major novel contribution of this paper. The constraint induces conservative bias but has two fundamental benefits: First, it ensures that the linear approximation of the feasible set in (2.6) by (2.8) is used only in a neighborhood of θ, and therefore that it is uniformly accurate. More subtly, it ensures that coverage induced by a given c depends continuously on estimated parameters even in certain intricate cases. This renders calibrated projection valid in cases that other methods must exclude by assumption.11
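As an illustration of why this calibration is computationally light, the sketch below checks feasibility of a linear program for each bootstrap draw and bisects over the critical level. It is a minimal, stylized rendition: the inputs xi_boot (bootstrapped studentized moments, already GMS-shifted), G (estimated gradients), p, and the box radius rho are hypothetical placeholders, and details of studentization and of the GMS shift are suppressed.

```python
import numpy as np
from scipy.optimize import linprog

def lp_feasible(xi_b, G, p, rho, crit):
    """Check whether some lambda in the rho-box with p'lambda = 0 satisfies
    the bootstrapped, linearized moment inequalities relaxed by crit."""
    d = G.shape[1]
    # inequalities: xi_b + G @ lam <= crit  <=>  G @ lam <= crit - xi_b
    res = linprog(c=np.zeros(d),                 # pure feasibility problem: any objective works
                  A_ub=G, b_ub=crit - xi_b,
                  A_eq=p.reshape(1, -1), b_eq=[0.0],
                  bounds=[(-rho, rho)] * d,
                  method="highs")
    return res.status == 0                       # status 0 = solved, i.e., feasible

def calibrate_c(xi_boot, G, p, rho, alpha, tol=1e-3, c_hi=10.0):
    """Smallest critical level such that the LP above is feasible in at least
    (1 - alpha) of the bootstrap draws; found by bisection."""
    lo, hi = 0.0, c_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        cover = np.mean([lp_feasible(xi_b, G, p, rho, mid) for xi_b in xi_boot])
        if cover >= 1 - alpha:
            hi = mid
        else:
            lo = mid
    return hi

# toy usage with hypothetical dimensions: 4 moments, 2 parameters, 200 bootstrap draws
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 2)); p = np.array([1.0, 0.0])
xi_boot = rng.normal(size=(200, 4))
print(calibrate_c(xi_boot, G, p, rho=1.0, alpha=0.05))
```

Because enlarging the critical level can only enlarge the feasible set, the bootstrap coverage frequency is monotone in it, which is what makes the bisection valid.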
2.3 Computation of the Confidence Interval and of Similar Confidence Intervals
Projection-based methods as in (1.1) and (1.3) have nonlinear constraints involving a critical value which, in general, is an unknown function, with unknown gradient, of θ. Similar considerations often apply to critical values used to build confidence intervals for optimal values of optimization problems with estimated constraints. When the dimension of the parameter vector is large, directly solving optimization problems with such constraints can be expensive even if evaluating the critical value at each θ is cheap.







The key issue is that evaluating the critical level at a given θ is costly.13 Our algorithm does so at relatively few values of θ. Elsewhere, it approximates the critical level through a probabilistic model that gets updated as more values are computed. We use this model to determine the next evaluation point, but we report as tentative solution the best value of θ at which the critical level was actually computed, not a value at which it was merely approximated. Under reasonable conditions, the tentative optimal values converge to the true optimal value at a rate (relative to iterations of the algorithm) that is formally established in Section 3.2.
After drawing an initial set of evaluation points that we set to grow linearly with d, the algorithm has three steps called E, A, and M below.
Initialization: Draw randomly (uniformly) over Θ a set of initial evaluation points. Evaluate the critical level at each of these points. Initialize the current best feasible value of the objective.

































The algorithm yields an increasing sequence of tentative optimal values, each of which satisfies the true constraints in (2.11), but with the sequence of evaluation points leading to it obtained by maximization of expected improvement defined with respect to the approximated surface. Once a convergence criterion is met, the current tentative optimal value is reported as the endpoint of the confidence interval. We discuss convergence criteria in Appendix C.
The advantages of E-A-M are as follows. First, we control the number of points at which we evaluate the critical value; recall that this evaluation is the expensive step. Also, the initial k evaluations can easily be parallelized. For any additional E-step, one needs to evaluate the critical value only at a single new point. The M-step is crucial for reducing the number of additional evaluation points. To determine the next evaluation point, it trades off “exploitation” (i.e., the benefit of drawing a point at which the optimal value is high) against “exploration” (i.e., the benefit of drawing a point in a region in which the approximation error of c is currently large) through maximizing expected improvement.16 Finally, the algorithm simplifies the M-step by providing constraints and their gradients for program (2.14) in closed form, thus greatly aiding fast and stable numerical optimization. The price is the additional approximation step. In the empirical application in Section 4 and in the numerical exercises of Appendix C, this price turns out to be low.
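The sketch below conveys the E-A-M pattern in the simplest constrained setting we could write down: the objective p'θ is known in closed form, a single expensive “black box” function c(θ) enters the constraint g(θ) ≤ c(θ), and the next evaluation point maximizes an expected-improvement criterion that weights the gain in the objective by the surrogate's probability that the constraint holds. All function names and tuning choices here are hypothetical placeholders, and the criterion is a generic constrained expected improvement rather than the paper's exact program (2.14).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def eam_maximize(p, g, expensive_c, bounds, n_init=10, n_iter=30, seed=0):
    """Maximize p'theta subject to g(theta) <= c(theta), where c is expensive to evaluate."""
    rng = np.random.default_rng(seed)
    d = len(bounds)
    lo, hi = np.array(bounds).T
    # E (initial): evaluate the expensive function at uniform draws over the box
    X = rng.uniform(lo, hi, size=(n_init, d))
    y = np.array([expensive_c(x) for x in X])
    best = max((p @ x for x, cx in zip(X, y) if g(x) <= cx), default=-np.inf)
    if not np.isfinite(best):
        best = float((X @ p).min())              # pessimistic incumbent if no feasible point yet
    for _ in range(n_iter):
        # A: fit a Gaussian-process surrogate to the evaluated values of c
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

        def neg_ei(theta):
            mu, sd = gp.predict(theta.reshape(1, -1), return_std=True)
            prob_feasible = norm.cdf((mu[0] - g(theta)) / max(sd[0], 1e-12))
            return -max(p @ theta - best, 0.0) * prob_feasible

        # M: maximize expected improvement (multi-start to mitigate local optima)
        starts = rng.uniform(lo, hi, size=(5, d))
        candidates = [minimize(neg_ei, s, bounds=bounds) for s in starts]
        theta_next = min(candidates, key=lambda r: r.fun).x
        # E: evaluate the expensive function at the new point and update the incumbent
        c_next = expensive_c(theta_next)
        X, y = np.vstack([X, theta_next]), np.append(y, c_next)
        if g(theta_next) <= c_next:
            best = max(best, float(p @ theta_next))
    return best

# toy usage: maximize theta_1 subject to ||theta||^2 <= c(theta), with a stand-in "black box" c
bounds = [(-1.0, 1.0), (-1.0, 1.0)]
print(eam_maximize(np.array([1.0, 0.0]),
                   g=lambda t: float(t @ t),
                   expensive_c=lambda t: 0.5 + 0.1 * t[0],
                   bounds=bounds))
```

In practice, one would also mix in the occasional uniform draw discussed in Section 2.4 (or evaluate both a uniform point and the expected-improvement maximizer at every iteration, as recommended there); the sketch omits this for brevity.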
2.4 Choice of Tuning Parameters
Practical implementation of calibrated projection and the E-A-M algorithm is detailed in Kaido et al. (2017). It involves setting several tuning parameters, which we now discuss.
Calibration of the critical level in (2.10) must be tuned at two points, namely, the use of GMS and the choice of ρ. The trade-offs in setting these tuning parameters are apparent from inspection of (2.8). GMS is parameterized by a shrinkage function φ and a sequence κ_n that controls the rate of shrinkage. In practice, choice of κ_n is more delicate. A smaller κ_n will make the studentized moments entering the GMS function larger in absolute value, hence increase bootstrap coverage probability for any given c, hence reduce the calibrated critical level, and therefore make for shorter confidence intervals—but the uniform asymptotics will be misleading, and finite sample coverage therefore potentially off target, if κ_n is too small. We follow the industry standard set by AS and recommend their choice of κ_n = (ln n)^{1/2}.
The trade-off in choosing ρ is similar but reversed. A larger ρ will expand the ρ-box and therefore make for shorter confidence intervals, but (our proof of) uniform validity of inference requires ρ to be finite. Indeed, calibrated projection with ρ = 0 will disregard any projection conservatism and (as is easy to show) exactly recovers projection of the AS confidence set. Intuitively, we then want to choose ρ large but not too large.
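For concreteness, a common “hard thresholding” GMS choice, written here under the convention that the moment inequalities take the form E[m_j] ≤ 0 (this display is a stylized sketch, not a restatement of (2.8), and the signs flip under the opposite convention), is

\[
\xi_{n,j}(\theta)=\frac{\sqrt{n}\,\bar m_{n,j}(\theta)}{\hat\sigma_{n,j}(\theta)\,\kappa_n},
\qquad
\varphi_j\bigl(\xi_{n,j}(\theta)\bigr)=
\begin{cases}
0, & \xi_{n,j}(\theta)\ge -1,\\
-\infty, & \xi_{n,j}(\theta)< -1,
\end{cases}
\]

so that inequalities judged sufficiently slack are dropped from the bootstrap calibration; a smaller κ_n magnifies ξ_{n,j} and therefore classifies more inequalities as slack.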


The E-A-M algorithm also has two tuning parameters. One is k, the initial number of evaluation points. The other is the probability with which the next evaluation point is drawn randomly from a uniform distribution on Θ instead of by maximizing expected improvement. In calibrated projection use of the E-A-M algorithm, there is a single “black box” function to approximate, namely the critical level. We therefore suggest setting k in line with the recommendation in Jones, Schonlau, and Welch (1998, p. 473). In our Monte Carlo exercises, we experimented with larger values of k and found that the increased number had no noticeable effect on the computed confidence intervals. If a user applies our E-A-M algorithm to a constrained optimization problem with many “black box” functions to approximate, we suggest using a larger number of initial points.
The role of this probability (e.g., Bull (2011, p. 2889)) is to trade off the greediness of the expected-improvement maximization criterion against the overarching goal of global optimization. Sutton and Barto (1998, pp. 28–29) explored the effect of setting the analogous probability to 0.1 and 0.01 in different optimization problems, and found that for sufficiently large L, the smaller value performs better. In our own simulations, we have found that drawing both a uniform point and an expected-improvement maximizer at each iteration L (thereby sidestepping the choice of this probability) is fast and accurate, and that is what we recommend doing.
3 Theoretical Results
3.1 Asymptotic Validity of Inference
In this section, we establish that the proposed confidence interval is uniformly asymptotically valid in the sense of ensuring that the coverage probability in (2.5) is at least the nominal level.18 The result applies to: (i) confidence intervals for one projection; (ii) joint confidence regions for several projections, in particular confidence hyperrectangles for subvectors; (iii) confidence intervals for smooth nonlinear functions of θ. Examples of the latter extension include policy analysis and estimation of partially identified counterfactuals as well as demand extrapolation subject to rationality constraints.
Theorem 3.1. Suppose Assumptions E.1, E.2, E.3, E.4, and E.5 hold. Then the uniform coverage statement in (3.1) holds.


Assumptions E.1–E.5 define the class of DGPs over which our proposed method yields uniformly asymptotically valid coverage. This class is non-nested with the class of DGPs over which the profiling-based methods of Romano and Shaikh (2008) and BCS are uniformly asymptotically valid. Kaido, Molinari, and Stoye (2017, Section 4.2 and Supplemental Appendix F) showed that in well-behaved cases, calibrated projection and BCS-profiling are asymptotically equivalent. They also provided conditions under which calibrated projection has lower probability of false coverage in finite samples, thereby establishing that the two methods' finite sample power properties are not ranked.
3.2 Convergence of the E-A-M Algorithm
We next provide formal conditions under which the sequence of tentative optimal values generated by the E-A-M algorithm converges to the true endpoint of the confidence interval as the number of iterations L grows, at a rate that we obtain. Although each tentative optimal value satisfies the true constraints in (2.11) for each L, the sequence of evaluation points leading to it is mostly obtained through expected improvement maximization (M-step) with respect to the approximating surface. Because of this, a requirement for convergence is that the black box constraint function is sufficiently smooth, so that its approximation error vanishes uniformly in θ as L grows.21 We furthermore assume that the constraint set in (2.11) satisfies a degeneracy condition introduced to the partial identification literature by Chernozhukov, Hong, and Tamer (2007, Condition C.3).22 In our application, the condition requires that the constraint set has a nonempty interior and that the inequalities in (2.4), when evaluated at points in a (small) τ-contraction of that set, are satisfied with a slack that is proportional to τ. Theorem 3.2 below establishes that these conditions jointly ensure convergence of the E-A-M algorithm at a specific rate. This is a novel contribution to the literature on response surface methods for constrained optimization.
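In schematic notation (with g_j the constraint functions of (2.4), c the critical level, 𝒞 the constraint set, and 𝒞^{-τ} its τ-contraction; this is an informal paraphrase for orientation, not the statement of Assumption A.3), the degeneracy requirement reads

\[
\exists\, C>0,\ \bar\tau>0:\qquad
\theta\in\mathcal{C}^{-\tau}\ \Longrightarrow\ g_j(\theta)\le c(\theta)-C\,\tau
\quad\text{for all } j \text{ and all } \tau\in[0,\bar\tau],
\]

that is, moving a distance of order τ into the interior of the constraint set buys slack of order τ in every inequality.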
In the formal statement below, the expectation is taken with respect to the law of the evaluation points determined by the Initialization step and the M-step, but conditioning on the sample. We refer to Appendix A for a precise definition of this law and a proof of the theorem.
Theorem 3.2. Suppose Θ is a compact hyperrectangle with nonempty interior, that c satisfies the regularity condition stated in Appendix A, and that Assumptions A.1, A.2, and A.3 hold. Let the evaluation points be drawn according to the Initialization and M-steps. Then













To apply Theorem 3.2 to calibrated projection, we provide low-level conditions (Assumption D.1 in Supplemental Material Appendix D.1.1) under which the critical level map uniformly stochastically satisfies a Lipschitz-type condition. To get smoothness, we work with a mollified version of this map, defined in equation (D.1).23 Theorem D.1 in the Supplemental Material shows that the critical level and its mollified version can be made uniformly arbitrarily close, and that the mollified version yields valid inference as in (3.1). In practice, we directly apply the E-A-M steps to the original critical level.
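Mollification here means smoothing the critical level map by convolving it with a smooth, compactly supported bump; a generic form (schematic notation, not the exact equation (D.1)) is

\[
c_{n,\delta}(\theta)=\int_{\mathbb{R}^d} c_n(\theta-\nu)\,\phi_\delta(\nu)\,d\nu,
\qquad \phi_\delta(\nu)=\delta^{-d}\,\phi(\nu/\delta),
\]

where φ is an infinitely differentiable density with compact support. If c_n is (stochastically) Lipschitz, as Assumption D.1 requires, then c_{n,δ} is smooth and converges to c_n uniformly as the bandwidth δ shrinks, which is the property exploited in Theorem D.1.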
The key condition imposed in Theorem D.1 is Assumption D.1. It requires that the GMS function used is Lipschitz in its argument,24 and that the standardized moment functions are Lipschitz in θ. In Supplemental Material Appendix F.1, we establish that the latter condition is satisfied by some canonical examples in the moment (in)equality literature: mean with missing data, linear regression and best linear prediction with interval data (and discrete covariates), entry games with multiple equilibria (and discrete covariates), and semiparametric binary regression models with discrete or interval valued covariates (as in Magnac and Maurin (2008)).25
The E-A-M algorithm is proposed as a method to implement our statistical procedure, not as part of the statistical procedure itself. As such, its approximation error is not taken into account in Theorem 3.1. Our comparisons of the confidence intervals obtained through the use of E-A-M as opposed to directly solving problems (2.4) through the use of MATLAB's fmincon in our empirical application in the next section suggest that such error is minimal.
4 Empirical Illustration: Estimating a Binary Game
We employ our method to revisit the study in Kline and Tamer (2016, Section 8) of “what explains the decision of an airline to provide service between two airports.” We use their data and model specification.26 Here, we briefly summarize the setup and refer to Kline and Tamer (2016) for a richer discussion.
The study examines entry decisions of two types of firms, namely, Low Cost Carriers (LCC) versus Other Airlines (OA). A market is defined as a trip between two airports, irrespective of intermediate stops. The entry decision of player ℓ ∈ {LCC, OA} in market i is recorded as a 1 if a firm of type ℓ serves market i and 0 otherwise. Firm ℓ's payoff in market i is linear in its covariates, its rival's entry decision, and an unobserved payoff shifter, and each firm enters if doing so generates nonnegative payoffs. The observable covariates include the constant and two further variables. The first is market size, a market-specific variable common to all airlines in that market and defined as the population at the endpoints of the trip. The second is market presence, a firm-and-market-specific variable measuring the market presence of firms of type ℓ in market i (see Kline and Tamer (2016, p. 356) for its exact definition). While market size enters the payoff function of both firms, each firm's market presence variable is excluded from the payoff of its rival. Each of market size and of the two market presence variables is transformed into a binary variable based on whether it realized above or below its median. This leads to a total of eight market types and a corresponding set of moment inequalities and moment equalities. The unobserved payoff shifters are assumed to be i.i.d. across i and to have a bivariate normal distribution with zero means, unit variances, and correlation r for each i, where the correlation r is to be estimated. Following Kline and Tamer (2016), we assume that the two strategic interaction parameters are negative and that the researcher imposes these sign restrictions. To ensure that Assumption E.4 is satisfied,27 we furthermore assume that r ≤ 0.85 and use this value as its upper bound in the definition of the parameter space.
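To make the game concrete, the sketch below enumerates the pure-strategy equilibria of one market's 2×2 entry game, assuming the linear payoff index described above (a monopoly index plus an interaction term times the rival's entry decision); the numerical values are hypothetical.

```python
def pure_equilibria(pi1_mono, pi2_mono, delta1, delta2):
    """Pure-strategy Nash equilibria (y1, y2) of the entry game.
    pi_l_mono is firm l's monopoly payoff index; entering as a duopolist adds
    delta_l (< 0). A firm enters iff its payoff from entering, given the
    rival's action, is nonnegative (payoff from staying out is zero)."""
    eqs = []
    for y1 in (0, 1):
        for y2 in (0, 1):
            p1 = pi1_mono + delta1 * y2       # firm 1's payoff index if it enters
            p2 = pi2_mono + delta2 * y1       # firm 2's payoff index if it enters
            br1 = 1 if p1 >= 0 else 0         # firm 1's best response to y2
            br2 = 1 if p2 >= 0 else 0         # firm 2's best response to y1
            if y1 == br1 and y2 == br2:
                eqs.append((y1, y2))
    return eqs

# region of multiple equilibria: both firms profitable as monopolists,
# neither profitable as a duopolist -> both (1, 0) and (0, 1) are equilibria
print(pure_equilibria(pi1_mono=0.3, pi2_mono=0.4, delta1=-0.6, delta2=-0.7))
```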
The results of the analysis are reported in Table I, which displays 95% nominal confidence intervals (our confidence interval as defined in equations (2.3)–(2.4)) for each parameter. The output of the E-A-M algorithm is displayed in the accordingly labeled column. The next column shows a robustness check, namely, the output of MATLAB's fmincon function, henceforth labeled “direct search,” which was started at each of a widely spaced set of feasible points that were previously discovered by the E-A-M algorithm. We emphasize that this is a robustness or accuracy check, not a horse race: Direct search mechanically improves on E-A-M because it starts (among other points) at the point reported by E-A-M as optimal feasible. Using the standard MultiStart function in MATLAB instead of the points discovered by E-A-M produces unreliable and extremely slow results. In 10 out of 18 optimization problems that we solved, the E-A-M algorithm's solution came within its set tolerance (0.005) of the direct search solution. In the remaining problems, the discrepancy between the E-A-M and direct search solutions remained minimal.




Table I

| Parameter | CI, E-A-M | CI, Direct Search | Computational Time, E-A-M | Computational Time, Direct Search | Computational Time, Total |
|---|---|---|---|---|---|
|  | [−2.0603, −0.8510] | [−2.0827, −0.8492] | 24.73 | 32.46 | 57.51 |
|  | [0.1880, 0.4029] | [0.1878, 0.4163] | 16.18 | 230.28 | 246.49 |
|  | [1.7510, 1.9550] | [1.7426, 1.9687] | 16.07 | 115.20 | 131.30 |
|  | [0.3957, 0.5898] | [0.3942, 0.6132] | 27.61 | 107.33 | 137.66 |
|  | [0.3378, 0.5654] | [0.3316, 0.5661] | 11.90 | 141.73 | 153.66 |
|  | [0.3974, 0.5808] | [0.3923, 0.5850] | 13.53 | 148.20 | 161.75 |
|  | [−1.4423, −0.1884] | [−1.4433, −0.1786] | 15.65 | 119.50 | 135.17 |
|  | [−1.4701, −0.7658] | [−1.4742, −0.7477] | 13.06 | 114.14 | 127.23 |
| r | [0.1855, 0.85] | [0.1855, 0.85] | 5.37 | 42.38 | 47.78 |
- a “Direct search” refers to fmincon performed after E-A-M and starting from feasible points discovered by E-A-M, including the E-A-M optimum.
Table I also reports computational time of the E-A-M algorithm, of the subsequent direct search, and the total time used to compute the confidence intervals. The direct search greatly increases computation time with small or negligible benefit. Also, computational time varied substantially across components. We suspect this might be due to the shape of the level sets of the constraint functions: By manually searching around the optimal values of the program, we verified that the level sets in specific directions can be extremely thin, rendering search more challenging.
Comparing our findings with those in Kline and Tamer (2016), we see that the results qualitatively agree. The confidence intervals for the interaction effects and for the effect of market size on payoffs are similar to each other across the two types of firms. Market presence appears to impact the payoffs of one type of firm more than the other's; on the other hand, the comparison of monopoly payoffs across the two firm types goes the opposite way.28 The confidence interval on the correlation coefficient is quite large and includes our upper bound of 0.85.29
For most components, our confidence intervals are narrower than the corresponding 95% credible sets reported in Kline and Tamer (2016).30 However, the intervals are not comparable for at least two reasons: We impose a stricter upper bound on r and we aim to cover the projections of the true parameter value as opposed to the identified set.
Overall, our results suggest that in a reasonably sized, empirically interesting problem, calibrated projection yields informative confidence intervals. Furthermore, the E-A-M algorithm appears to accurately and quickly approximate solutions to complex smooth nonlinear optimization problems.
5 Conclusion
This paper proposes a confidence interval for linear functions of parameter vectors that are partially identified through finitely many moment (in)equalities. The extreme points of our calibrated projection confidence interval are obtained by minimizing and maximizing the function of interest subject to properly relaxed sample analogs of the moment conditions. The relaxation amount, or critical level, is computed to ensure uniform asymptotic coverage of the function of interest
rather than θ itself. Its calibration is computationally attractive because it is based on repeatedly checking feasibility of (bootstrap) linear programming problems. Computation of the extreme points of the confidence intervals is furthermore attractive thanks to an application of the response surface method for global optimization; this is a novel contribution of independent interest. Indeed, one key result is a convergence rate for this algorithm when applied to constrained optimization problems in which the objective function is easy to evaluate but the constraints are “black box” functions. The result is applicable to any instance when the researcher wants to compute confidence intervals for optimal values of constrained optimization problems. Our empirical application and Monte Carlo analysis show that, in the DGPs that we considered, calibrated projection is fast and accurate, and also that the E-A-M algorithm can greatly improve computation of other confidence intervals.



























































Appendix A: Convergence of the E-A-M Algorithm































Algorithm A.1.
Step 1: Initial evaluation points are drawn uniformly over Θ, independent of c.
Step 2: For each subsequent iteration, with probability 1 − ε, let the next evaluation point be the maximizer of the expected improvement; with probability ε, draw it uniformly at random from Θ.





We require that the kernel used to define the correlation functional for the Gaussian process in (2.13) satisfies some basic regularity conditions. For this, let K̂ denote the Fourier transform of the kernel K. We also use standard notation to describe the relative asymptotic order of real-valued functions f and g.
Assumption A.1 (Kernel Function): (i) The kernel K is continuous and integrable. (ii) Its Fourier transform K̂ is isotropic and radially non-increasing. (iii) As the frequency grows, either K̂ decays at a polynomial rate governed by a smoothness index ν, or it decays faster than any polynomial rate. (iv) K is k-times continuously differentiable, with k tied to the smoothness index ν, and at the origin K has a kth-order Taylor approximation whose error vanishes at a polynomial rate as the argument approaches the origin.
Assumption A.1 is essentially the same as Assumptions 1–4 in Bull (2011). When a kernel satisfies the second condition of Assumption A.1(iii), that is, when K̂ decays faster than any polynomial rate, we say that ν = ∞. Assumption A.1 is satisfied by popular kernels such as the Matérn kernel (with a finite smoothness index ν) and the Gaussian kernel (with ν = ∞). These kernels are discussed in Appendix D.2.
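For reference, the two kernels just mentioned have the following standard textbook forms (one-dimensional distance r and length-scale ell; this is generic background rather than the paper's exact parameterization):

```python
import numpy as np
from scipy.special import gamma, kv   # kv: modified Bessel function of the second kind

def matern_kernel(r, ell=1.0, nu=1.5):
    """Matern correlation between points at distance r (finite smoothness nu)."""
    r = np.maximum(np.asarray(r, dtype=float), 1e-12)   # avoid the removable singularity at 0
    z = np.sqrt(2.0 * nu) * r / ell
    return (2.0 ** (1.0 - nu) / gamma(nu)) * (z ** nu) * kv(nu, z)

def gaussian_kernel(r, ell=1.0):
    """Squared-exponential (Gaussian) correlation; infinitely smooth ('nu = infinity') case."""
    r = np.asarray(r, dtype=float)
    return np.exp(-0.5 * (r / ell) ** 2)

print(matern_kernel([0.0, 0.5, 1.0]), gaussian_kernel([0.0, 0.5, 1.0]))
```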





Assumption A.2 (Continuity and Smoothness): (i) For each constraint, the corresponding function is differentiable in θ with Lipschitz continuous gradient. (ii) The black box function c satisfies a smoothness requirement indexed by the kernel's smoothness parameter.
Assumption A.3. (Degeneracy)There exist constants such that for all
,






































A.1 Proof of Theorem 3.2


Proof of Theorem 3.2.First, note that




Let be a measurable space. Below, we let
. Let
. Let
and
be the event that at least
of the points
are drawn independently from a uniform distribution on Θ. Let
be the event that one of the points
is chosen by maximizing the expected improvement. For each L, define the mesh norm:







Let for
and note that
is a positive sequence such that
and
. We further define the following events:























Now consider the case . By (A.3),





















A.2 Auxiliary Lemmas for the Proof of Theorem 3.2
Let be defined as in (A.3). The following lemma shows that on
,
and
are close to each other, where we recall that
is the expected improvement maximizer (but does not belong to
for
).
Lemma A.1.Suppose Assumptions A.1, A.2, and A.3 hold. Let be a positive sequence such that
and
. Then, there exists a constant
such that
for all L sufficiently large.
Proof.We show the result by contradiction. Let be a sequence such that
for all L. First, assume that, for any
, there is a subsequence such that
for all L. This occurs if it contains a further subsequence along which, for all L, (i)
or (ii)
.
Case (i): for all L for some subsequence.
To simplify notation, we select a further subsequence of
such that, for any
,
. This then induces a sequence
of expected improvement maximizers such that
for all ℓ, where each ℓ equals
for some
. In what follows, we therefore omit the arguments of ℓ, but this sequence's dependence on
should be implicitly understood.
Recall that defined in equation (A.1) is a compact set and that
denotes the projection of
on
. Then









Case (ii): Similarly to Case (i), we work with a further subsequence along which for all ℓ. Recall that along this subsequence,
because
. We will construct
s.t.
, contradicting the definition of
.
By Assumption A.3,























The next lemma shows that, on ,
and
are close to each other, where we recall that
is the optimum value among the available feasible points (it belongs to
).
Lemma A.2.Suppose Assumptions A.1, A.2, and A.3 hold. Let be a positive sequence such that
and
. Then, there exists a constant
such that
for all L sufficiently large.
Proof.We show below uniformly over
for some decreasing sequence
satisfying the assumptions of the lemma. The claim then follows by relabeling
.
Suppose by contradiction that, for any , there is a subsequence
along which
and
for all L sufficiently large. To simplify notation, we select a subsequence
of
such that, for any
,
. This then induces a sequence such that
for all ℓ, where each ℓ equals
for some
. Similarly to the proof of Lemma A.1, we omit the arguments of ℓ below and construct a sequence of points
such that
.
Arguing as in (A.17)–(A.20), one may find a sequence of points such that











Subtracting (A.28) from (A.29) yields









The next lemma shows that, on ,
and
are close to each other.
Lemma A.3.Suppose Assumptions A.1, A.2, and A.3 hold. Let be a positive sequence such that
and
. Then, there exists a constant
such that
for all L sufficiently large.
Proof.Note that, for any ,
, and
,
satisfies
, hence
, which in turn implies
















Similarly to the proof of Lemma A.1, we omit the arguments of ℓ below and prove the claim by contradiction. Below, we assume that, for any , there is a further subsequence along which
for all ℓ sufficiently large.
Now let with
specified below. By Assumption A.3, for all
, it holds that





Arguing as in (A.17)–(A.20), one may find a sequence of points such that





The next lemma shows that, on ,
and
are close to each other.
Lemma A.4.Suppose Assumptions A.1, A.2, and A.3 hold. Let for
. Let
. Then, there exists a constant
such that
for all L sufficiently large.
Proof.Let be a sequence such that
for all L. Since
, there is
such that
and
is chosen by maximizing the expected improvement.
For later use, we note that, for any , it can be shown that
, which in turn implies that there exists a constant
such that

For and
, let
. Recall that
is an optimal solution to (2.11). Then, for all L sufficiently large,







For evaluation points such that
, the following lemma is an analog of Lemma 8 in Bull (2011), which links the expected improvement to the actual improvement achieved by a new evaluation point θ.
Lemma A.5.Suppose is bounded and
. Suppose the evaluation points
are drawn by Algorithm A.1 and let Assumptions A.1 and A.2(ii) hold. For
and
, let
. Let
be a positive sequence such that
and
. Let
. Then, for any sequence
such that
,


Proof of Lemma A.5.If , then the posterior variance of
is zero. Hence,
, and the claim of the lemma holds.
Suppose . We first show the upper bound. Let
and
. By Lemma 6 in Bull (2011), we have
. Starting from Lemma A.6(i), we can write



























Similarly, for the lower bound, we have














Lemma A.6.Suppose is bounded and
and let Assumptions A.1 and A.2(ii) hold. Let
. For
and
, let
. Then, (i) for any
and
,





Appendix B: Applying the E-A-M Algorithm to Profiling







- Initialization: Draw randomly (uniformly) over the space of candidate values τ a set of initial evaluation points, evaluate the profiled critical value at each of them, and initialize the tentative optimal value.
- E-step: Evaluate the profiled critical value at the current evaluation point and record the tentative optimal value.
- A-step (Approximation): Approximate the profiled critical value by a flexible auxiliary model. We again use the kriging approximation, which for a mean-zero Gaussian process indexed by τ and with constant variance specifies the correlation between any two points through a kernel with a scalar parameter. The parameters are estimated in the same way as before. The (best linear) predictor of c and its derivative are then formed from the vector of kernel evaluations between the new point and the existing evaluation points and the L-by-L matrix of kernel evaluations among the evaluation points, both with estimated parameters. The amount of uncertainty left in the approximation is captured by the kriging variance (see the sketch after this list).
- M-step (Maximization): With probability 1 − ε, maximize the expected improvement function to obtain the next evaluation point; with probability ε, draw the next evaluation point randomly from a uniform distribution over the space of candidate values τ.
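The sketch below spells out the ordinary-kriging formulas that the A-step relies on: given evaluations of c at points τ_1, …, τ_L and a correlation function, it returns the best linear predictor and the variance measuring the uncertainty left in the approximation. It is a generic constant-mean kriging implementation, written for illustration rather than as the paper's exact estimator.

```python
import numpy as np

def kriging_posterior(tau_new, taus, c_vals, corr, nugget=1e-8):
    """Ordinary-kriging posterior mean and variance of c at tau_new,
    given evaluations c_vals at points taus and correlation function corr."""
    L = len(taus)
    R = np.array([[corr(t1, t2) for t2 in taus] for t1 in taus]) + nugget * np.eye(L)
    r = np.array([corr(tau_new, t) for t in taus])
    Rinv = np.linalg.inv(R)
    ones = np.ones(L)
    mu_hat = (ones @ Rinv @ c_vals) / (ones @ Rinv @ ones)   # GLS estimate of the constant mean
    resid = c_vals - mu_hat
    sigma2_hat = (resid @ Rinv @ resid) / L                   # plug-in process variance
    mean = mu_hat + r @ Rinv @ resid                          # best linear predictor of c(tau_new)
    var = sigma2_hat * (1.0 - r @ Rinv @ r)                   # uncertainty left in the approximation
    return mean, max(var, 0.0)

# toy usage with a Gaussian correlation and hypothetical evaluations of c
corr = lambda a, b: np.exp(-10.0 * (a - b) ** 2)
taus = np.array([0.0, 0.3, 0.7, 1.0])
c_vals = np.array([1.2, 1.1, 1.4, 1.3])
print(kriging_posterior(0.5, taus, c_vals, corr))
```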
As before, the tentative optimal value is reported as the endpoint of the confidence interval upon convergence. In order for Theorem 3.2 to apply to this algorithm, the profiled statistic and the critical value need to be sufficiently smooth. We leave derivation of sufficient conditions for this to be the case to future research.
Appendix C: An Entry Game Model and Some Monte Carlo Simulations
We evaluate the statistical and numerical performance of calibrated projection and E-A-M in comparison with BCS-profiling in a Monte Carlo experiment run on a server with two Intel Xeon X5680 processors rated at 3.33 GHz with six cores each and with a memory capacity of 24 Gb rated at 1333 MHz. The experiment simulates a two-player entry game in the Monte Carlo exercise of BCS, using their code to implement their method.36
C.1 The General Entry Game Model
We consider a two-player entry game based on Ciliberto and Tamer (2009):
|  | Y2 = 0 | Y2 = 1 |
|---|---|---|
| Y1 = 0 |  |  |
| Y1 = 1 |  |  |





















C.2 A Comparison to BCS-Profiling
BCS specialized this model as follows. First, the two unobservables are independently uniformly distributed and the researcher knows their distribution. Equality (C.1) disappears because the corresponding outcome is never an equilibrium. Next, mean payoffs depend on observed market-type indicators.38 The parameter vector and its parameter space are as in BCS. This leaves four moment equalities and eight moment inequalities; compare equation (5.1) in BCS. We set the true parameter values, and hence the implied true bounds on the parameters, as in the BCS Monte Carlo exercise.
The BCS-profiling confidence interval inverts a test of the hypothesis that the projection equals τ over a grid for τ. We do not, in practice, exhaust the grid but search inward from the extreme points of Θ in directions ±p. At each τ that is visited, we use BCS code to compute a profiled test statistic and the corresponding critical value. The latter is a quantile of the minimum of two distinct bootstrap approximations, each of which solves a nonlinear program for each bootstrap draw. Computational cost quickly increases with grid resolution, bootstrap size, and the number of starting points used to solve the nonlinear programs.
Calibrated projection computes its critical level by solving a series of linear programs for each bootstrap draw.39 It computes the extreme points of the confidence interval by solving the nonlinear program (2.4) twice, a task that is much accelerated by the E-A-M algorithm. Projection of Andrews and Soares (2010) operates very similarly but computes its critical value through bootstrap simulation without any optimization.
We align grid resolution in BCS-profiling with the E-A-M algorithm's convergence threshold of 0.005.40 We run all methods with the same number of bootstrap draws, and calibrated and “uncalibrated” (i.e., based on Andrews and Soares (2010)) projection also with a larger number of draws.41 Some other choices differ: BCS-profiling is implemented with their own choice to multi-start the nonlinear programs at three oracle starting points, that is, using knowledge of the true DGP; our implementation of both other methods multi-starts the nonlinear programs from 30 data-dependent random points (see Kaido et al. (2017) for details).
Table II displays results for two components of θ and for 300 Monte Carlo repetitions of all three methods. All confidence intervals are conservative, reflecting the effect of GMS. As expected, uncalibrated projection is most conservative, with coverage of essentially 1. Also, BCS-profiling is more conservative than calibrated projection. The most striking contrast is in computational effort. Here, uncalibrated projection is fastest—indeed, in contrast to received wisdom, this procedure is computationally somewhat easy. This is due to our use of the E-A-M algorithm and therefore part of this paper's contribution. Next, our implementation of calibrated projection beats BCS-profiling with gridding by a factor of about 70. This can be disentangled into the gain from using calibrated projection, with its advantage of bootstrapping linear programs, and the gain afforded by the E-A-M algorithm. It turns out that implementing BCS-profiling with the adapted E-A-M algorithm (see Appendix B) improves computation by a factor of about 4; switching to calibrated projection leads to a further improvement by a factor of about 17. Finally, Table III extends the analysis to all components of θ and to 1000 Monte Carlo repetitions. We were unable to compute this for BCS-profiling.





Table II

Median CI

|  | Level | Prof.–Grid | Prof.–E-A-M | Cal.–E-A-M | Uncal.–E-A-M |
|---|---|---|---|---|---|
|  | 0.95 | [0.330, 0.495] | [0.331, 0.495] | [0.336, 0.482] | [0.290, 0.558] |
|  | 0.90 | [0.340, 0.485] | [0.340, 0.485] | [0.343, 0.474] | [0.298, 0.543] |
|  | 0.85 | [0.345, 0.475] | [0.346, 0.479] | [0.348, 0.466] | [0.303, 0.537] |
|  | 0.95 | [0.515, 0.655] | [0.514, 0.655] | [0.519, 0.650] | [0.461, 0.682] |
|  | 0.90 | [0.525, 0.647] | [0.525, 0.648] | [0.531, 0.643] | [0.473, 0.675] |
|  | 0.85 | [0.530, 0.640] | [0.531, 0.642] | [0.539, 0.639] | [0.481, 0.671] |

Coverage

|  | Level | Prof.–Grid, Lower | Prof.–Grid, Upper | Prof.–E-A-M, Lower | Prof.–E-A-M, Upper | Cal.–E-A-M, Lower | Cal.–E-A-M, Upper | Uncal.–E-A-M, Lower | Uncal.–E-A-M, Upper |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.95 | 0.997 | 0.990 | 1.000 | 0.993 | 0.993 | 0.977 | 1.000 | 1.000 |
|  | 0.90 | 0.990 | 0.980 | 0.993 | 0.977 | 0.987 | 0.960 | 1.000 | 1.000 |
|  | 0.85 | 0.970 | 0.970 | 0.973 | 0.960 | 0.957 | 0.930 | 1.000 | 1.000 |
|  | 0.95 | 0.987 | 0.993 | 0.990 | 0.993 | 0.973 | 0.987 | 1.000 | 1.000 |
|  | 0.90 | 0.977 | 0.973 | 0.980 | 0.977 | 0.940 | 0.953 | 1.000 | 1.000 |
|  | 0.85 | 0.967 | 0.957 | 0.963 | 0.960 | 0.943 | 0.927 | 1.000 | 1.000 |

Average Time

|  | Level | Prof.–Grid | Prof.–E-A-M | Cal.–E-A-M | Uncal.–E-A-M |
|---|---|---|---|---|---|
|  | 0.95 | 1858.42 | 425.49 | 26.40 | 18.22 |
|  | 0.90 | 1873.23 | 424.11 | 25.71 | 18.55 |
|  | 0.85 | 1907.84 | 444.45 | 25.67 | 18.18 |
|  | 0.95 | 1753.54 | 461.30 | 26.61 | 22.49 |
|  | 0.90 | 1782.91 | 472.55 | 25.79 | 21.38 |
|  | 0.85 | 1809.65 | 458.58 | 25.00 | 21.00 |

- a (1) In each panel, the first three rows refer to one component of θ and the last three rows to the other; projections of the true identified set are those implied by the DGP described in Section C.2. (2) “Upper” coverage refers to coverage of the upper endpoint of the true projection, and similarly for “Lower”. (3) “Average Time” is computation time in seconds averaged over MC replications. (4) “Prof.” is BCS-profiling, “Cal.” is calibrated projection, and “Uncal.” is uncalibrated projection. (5) “Grid” and “E-A-M” refer to the implementation used to compute the extreme points of the confidence interval.





Table III

(Each block of three rows corresponds to one component of θ; columns labeled “Cal.” are calibrated projection and “Uncal.” uncalibrated projection, both implemented with E-A-M.)

|  | Level | Median CI, Cal. | Median CI, Uncal. | Coverage Lower, Cal. | Coverage Upper, Cal. | Coverage Lower, Uncal. | Coverage Upper, Uncal. | Average Time, Cal. | Average Time, Uncal. |
|---|---|---|---|---|---|---|---|---|---|
|  | 0.95 | [0.333, 0.478] | [0.288, 0.555] | 0.988 | 0.982 | 1 | 1 | 42.41 | 22.23 |
|  | 0.90 | [0.341, 0.470] | [0.296, 0.542] | 0.976 | 0.957 | 1 | 1 | 41.56 | 22.11 |
|  | 0.85 | [0.346, 0.464] | [0.302, 0.534] | 0.957 | 0.937 | 1 | 1 | 40.47 | 19.79 |
|  | 0.95 | [0.525, 0.653] | [0.466, 0.683] | 0.969 | 0.983 | 1 | 1 | 42.11 | 24.39 |
|  | 0.90 | [0.538, 0.646] | [0.478, 0.677] | 0.947 | 0.960 | 1 | 1 | 40.15 | 28.13 |
|  | 0.85 | [0.545, 0.642] | [0.485, 0.672] | 0.925 | 0.941 | 1 | 1 | 41.38 | 26.44 |
|  | 0.95 | [0.054, 0.142] | [0.020, 0.180] | 0.956 | 0.958 | 1 | 1 | 40.31 | 22.53 |
|  | 0.90 | [0.060, 0.136] | [0.028, 0.172] | 0.911 | 0.911 | 1 | 1 | 36.80 | 24.15 |
|  | 0.85 | [0.064, 0.132] | [0.032, 0.167] | 0.861 | 0.860 | 0.999 | 0.999 | 39.10 | 21.81 |
|  | 0.95 | [0.156, 0.245] | [0.121, 0.281] | 0.952 | 0.952 | 1 | 1 | 39.23 | 24.66 |
|  | 0.90 | [0.162, 0.238] | [0.128, 0.273] | 0.914 | 0.910 | 0.998 | 0.998 | 41.53 | 21.66 |
|  | 0.85 | [0.165, 0.234] | [0.133, 0.268] | 0.876 | 0.872 | 0.996 | 0.996 | 39.44 | 22.83 |
|  | 0.95 | [0.257, 0.344] | [0.222, 0.379] | 0.946 | 0.946 | 1 | 1 | 41.45 | 22.91 |
|  | 0.90 | [0.263, 0.338] | [0.230, 0.371] | 0.910 | 0.909 | 0.997 | 0.999 | 42.09 | 22.83 |
|  | 0.85 | [0.267, 0.334] | [0.235, 0.366] | 0.882 | 0.870 | 0.994 | 0.993 | 42.19 | 23.69 |
- a Same DGP and conventions as in Table II.
In sum, the Monte Carlo experiment on the same DGP used in BCS yields three interesting findings: (i) the E-A-M algorithm accelerates projection of the Andrews and Soares (2010) confidence region to the point that this method becomes reasonably cheap; (ii) it also substantially accelerates computation of profiling intervals, and (iii) for this DGP, calibrated projection combined with the E-A-M algorithm has the most accurate size control while also being computationally attractive.