On bootstrap validity for specification tests with weak instruments
Summary
We study the asymptotic validity of the bootstrap for Durbin–Wu–Hausman tests of exogeneity, with or without identification. We provide an analysis of the limiting distributions of the proposed bootstrap statistics under both the null hypothesis of exogeneity (size) and the alternative hypothesis of endogeneity (power). We show that when identification is strong, the bootstrap provides a high-order approximation of the null limiting distributions of the statistics and is consistent under the alternative hypothesis if the endogeneity parameter is fixed. However, the bootstrap only provides a first-order approximation when instruments are weak. Moreover, we provide the necessary and sufficient condition under which the proposed bootstrap tests exhibit power under (fixed) endogeneity and weak instruments. The latter condition may still hold over a wide range of cases as long as at least one instrument is relevant. Nevertheless, all bootstrap tests have low power when all instruments are irrelevant, a case of little interest in empirical work. We present a Monte Carlo experiment that confirms our theoretical findings.
1. INTRODUCTION
Exogeneity tests of the type proposed by Durbin (1954), Wu (1973) and Hausman (1978), henceforth DWH tests, are widely used in applied work to determine whether the ordinary least-squares (OLS) or the instrumental variable (IV) method is appropriate. There is now a considerable body of research on this topic, but most studies impose identifying assumptions on model coefficients, thus leaving out issues associated with weak instruments. It is well known that IV estimators can be imprecise and that inference procedures (such as tests and confidence sets) can be highly unreliable in the presence of weak instruments. In recent years, concerns have been raised about the reliability of DWH procedures in the presence of weak instruments because they mainly rely on IV estimators; see Staiger and Stock (1997), Guggenberger (2010), Hahn et al. (2010), Doko Tchatoka and Dufour (2011) and Kiviet and Niemczyk (2007), among others.
Staiger and Stock (1997) show that the limiting distributions of Hausman (1978) type statistics depend on the concentration matrix, which usually determines the strength of the identification. Doko Tchatoka and Dufour (2011) show that all DWH exogeneity statistics, including Wu (1973) T3 and alternative Hausman (1978) type statistics, are identification-robust even in finite samples, with or without Gaussian errors. However, applying the usual χ2 critical values to the T3 and Hausman (1978) statistics can lead to overly conservative procedures when identification is weak. Size correction can be achieved by resorting to the method of exact Monte Carlo tests, such as in Dufour (2006). However, the exact Monte Carlo method requires that the conditional distribution of the structural disturbance, given the instruments, be specified. In practice, researchers usually do not know the distribution of the errors, even conditionally on the available instruments, so implementing the exact Monte Carlo tests can be difficult.
In this paper, we examine whether a distribution-free method, such as a bootstrap method, can improve the properties of the DWH statistics, especially when identification is not very strong. To be more specific, we exploit the score interpretation of these statistics (see Engle, 1982, and Smith, 1983) to suggest a bootstrap method similar to that of Moreira et al. (2009) for the score test of the null hypothesis on the structural parameters. Our results provide some new insights and extensions of earlier studies.
We show that when identification is strong, the bootstrap method offers a high-order approximation of the null limiting distributions of the DWH statistics. Furthermore, the bootstrap test consistency holds under fixed alternative hypotheses (i.e. when endogeneity is present and does not depend on the sample size). However, the bootstrap only provides a first-order approximation when identification is weak. Moreover, we provide the necessary and sufficient condition under which the proposed bootstrap tests exhibit power under fixed endogeneity and weak instruments. The latter condition may still hold over a wide range of cases, provided that at least one instrument is not irrelevant. However, all the proposed bootstrap DWH tests have low power when all instruments are irrelevant or close to irrelevant.
This paper is organized as follows. In Section 2, the model and assumptions are formulated, and the studied statistics are presented. In Section 3, the proposed bootstrap method is discussed and the limiting distributions of the corresponding DWH statistics are characterized. In Section 4, the Monte Carlo experiment is presented, and the auxiliary theorem and proofs are provided in the Appendix. Throughout the paper, $I_q$ stands for the identity matrix of order $q$. For any full-column rank matrix $A$, $P_A = A(A'A)^{-1}A'$ is the projection matrix on the space of $A$, and $M_A = I - P_A$. The notation vec($A$) is the column vectorization of $A$. $B > 0$ for a square matrix $B$ means that $B$ is positive definite. Convergence almost surely is symbolized by a.s., $\xrightarrow{p}$ stands for convergence in probability, while $\xrightarrow{d}$ means convergence in distribution. The usual orders of magnitude are denoted by O(1) and o(1). $\|U\|$ denotes the usual Euclidean or Frobenius norm for a matrix $U$, while rank($U$) is the rank of $U$. For any set $\mathcal{B}$, $\partial\mathcal{B}$ is the boundary of $\mathcal{B}$ and $(\partial\mathcal{B})^{\varepsilon}$ is the ε-neighbourhood of $\partial\mathcal{B}$. Finally, $\|\cdot\|_{\infty}$ is the supremum norm on the space of bounded continuous real functions, with topological space Ω.
2. FRAMEWORK


























Assumption 2.1. for some
and
.
Assumption 2.2. For sample size ,
,
and
, where
,
and
.
































Engle (1982) and Smith (1983) show that T2, T4 and H3 in 2.7 are score (LM) statistics, while T3, H1 and H2 are quasi-Wald statistics. Staiger and Stock (1997) and Doko Tchatoka and Dufour (2011), among others, show that the quasi-Wald statistics can be overly conservative when IVs are weak. We investigate whether the bootstrap can improve the size and power of these tests, especially when identification is not very strong.
It is worth noting that the exogeneity tests in 2.7 also have their own shortcomings. Indeed, Moreira (2009, footnote 1) shows that testing is equivalent to test
in model 2.1–2.2, where
,
,
and
are given in 2.3. This means that doing a pre-test on
may imply important size distortions when making inference on β using a t-type test after the pre-test. Guggenberger (2010) shows that the asymptotic size of the two-stage t-test where a DWH pre-test is used equals 1 for some choices of the parameter space. Despite this issue, the DWH tests are widely used in empirical work. This paper studies only the asymptotic validity of the bootstrap for the DWH statistics and does not address the issues of pre-testing.
3. BOOTSTRAP VALIDITY FOR THE DURBIN–WU–HAUSMAN TESTS




- Step 1. From the observed data, compute
and
along with all other quantities needed to obtain the realizations of the statistics
,
, and the residuals from the reduced-form equation 2.3:
,
. These residuals are then re-centred by subtracting sample means to yield
.
- Step 2. For each bootstrap sample
, the data are generated following
(3.1), where
and
are drawn independently from the joint empirical distribution of Z and
. The corresponding bootstrap statistics
and
are then computed for each bootstrap sample
.
- Step 3. The simulated bootstrap p-value of each statistic is obtained as the proportion of bootstrap statistics that are more extreme than the computed statistic from the observed data.
- Step 4. The corresponding bootstrap test rejects exogeneity at level α if its p-value is less than α.
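Steps 1 to 4 can be sketched in code as follows. This is only an illustrative sketch under stated assumptions, not the paper's exact procedure: it assumes a single endogenous regressor and no included exogenous regressors, it uses a regression-based exogeneity statistic (the squared t-ratio on the first-stage residual) as a stand-in for the statistics in 2.7, and it assumes a bootstrap DGP of the form used in Moreira et al. (2009), with the OLS estimator as the pseudo-true value of β. The function names `dwh_stat` and `bootstrap_pvalue` are hypothetical.

```python
import numpy as np


def dwh_stat(y, Y, Z):
    """Regression-based exogeneity statistic (assumed stand-in for 2.7).

    Regress y on Y and the first-stage residual; return the squared
    t-ratio on that residual.
    """
    n = len(y)
    pi_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]  # first-stage OLS
    v_hat = Y - Z @ pi_hat                         # first-stage residual
    X = np.column_stack([Y, v_hat])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    var_coef = s2 * np.linalg.inv(X.T @ X)
    return float(coef[1] ** 2 / var_coef[1, 1])


def bootstrap_pvalue(y, Y, Z, n_boot=199, seed=0):
    """Residual bootstrap p-value following Steps 1 to 4 (assumed DGP)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: observed statistic, OLS estimate of beta, reduced-form fits.
    stat = dwh_stat(y, Y, Z)
    beta_ols = float(Y @ y / (Y @ Y))
    pi_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
    v1 = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    v2 = Y - Z @ pi_hat
    v1 = v1 - v1.mean()  # re-centre the reduced-form residuals
    v2 = v2 - v2.mean()
    exceed = 0
    for _ in range(n_boot):
        # Step 2: draw instrument rows and residual pairs independently.
        iz = rng.integers(0, n, size=n)
        ie = rng.integers(0, n, size=n)
        Zb = Z[iz]
        Yb = Zb @ pi_hat + v2[ie]
        yb = (Zb @ pi_hat) * beta_ols + v1[ie]
        exceed += int(dwh_stat(yb, Yb, Zb) > stat)
    # Step 3: share of bootstrap statistics above the observed one.
    return stat, exceed / n_boot


if __name__ == "__main__":
    # Simulated data under exogeneity (illustrative values only).
    rng = np.random.default_rng(42)
    n = 200
    Z = rng.normal(size=(n, 3))
    Y = Z @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
    y = 0.7 * Y + rng.normal(size=n)
    stat, pval = bootstrap_pvalue(y, Y, Z)
    # Step 4: reject exogeneity at level alpha if pval < alpha.
    print(f"statistic = {stat:.3f}, bootstrap p-value = {pval:.3f}")
```

Resampling the instrument rows and the residual pairs with separate index vectors mirrors the independent draws in Step 2, while keeping each residual pair together preserves the joint empirical distribution of the two reduced-form errors.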
Although the above bootstrap steps are similar to those in Moreira et al. (2009), it is worth noting that there is a substantial difference. In contrast to Moreira et al. (2009), where the two-stage least-squares (2SLS) or the limited information maximum likelihood (LIML) estimators are suggested as the pseudo-true value of β under the bootstrap data-generating process (DGP), our algorithm uses the OLS estimator of β in 2.1. Indeed, Moreira et al. (2009) show that the validity of their bootstrap requires using an estimator that satisfies
and
(i.e.
is a (strong) consistent estimator of
). In the classical linear setting of this paper, both the 2SLS and LIML estimators satisfy the sufficient conditions for strong consistency; see footnote 3 of Moreira et al. (2009, p. 55). The OLS estimator is not strongly consistent under endogeneity. Under exogeneity, however, it is consistent and efficient even when IVs are weak. Based on this fact, the OLS estimator is preferred to a 2SLS or LIML alternative because choosing the latter would imply a sizable efficiency loss under H0. Moreover, choosing the OLS estimator as the pseudo-true value of β when H0 is false is suggested in Horowitz (2001) to approximate the bootstrap power.
In the remainder of the paper, denotes the empirical distribution of
,
given
,
is the probability under the empirical distribution function (given
) and
is its corresponding expectation operator. Also, let
and
be the cumulative distribution function (cdf) and the probability density function (pdf) of a χ2-distributed random variable with one degree of freedom. To ease the exposition of our results, we shall deal separately with the case where identification is strong and the case where it is weak.
3.1. Strong identification
We focus here on the case where identification is strong and we study the limiting distributions of the bootstrap DWH statistics under both the null and alternative hypotheses. Let and
(
and
) denote the empirical cumulative distributions of
and
, respectively, evaluated at τ. Theorem 3.1 states the bootstrap validity for the DWH statistics under strong instruments.
Theorem 3.1. Suppose that Assumptions 2.1 and 2.2 are satisfied and that is fixed. Then for some integer
, we have (a)
,
if
; (b)
,
as
if
is fixed. Here,
,
are polynomials in τ with coefficients depending on
,
and the moments of
The proof of Theorem 3.1(a) follows the same steps as the proof of Theorem 3 in Moreira et al. (2009) and is therefore omitted. The proof of Theorem 3.1(b) follows steps similar to those of Lemma A.1(b) of the online Appendix and is also omitted. Theorem 3.1(a) shows that the bootstrap estimates and the -term empirical Edgeworth expansion in Theorem A.1(a) (see the Appendix) for all statistics are asymptotically equivalent up to the
order under H0. Furthermore, the bootstrap makes an error of size
under H0, which is smaller as
than both
and the error made by the first-order asymptotic approximations. The bootstrap provides greater accuracy than the
order because all the DWH statistics are quadratic functions of symmetric pivotal statistics (see Horowitz, 2001, Chapter 52, equation (3.13)) under exogeneity (
) and strong identification (
). Theorem 3.1(b) implies that all bootstrap tests are consistent when endogeneity is fixed and model identification is strong. Note that this consistency holds no matter which critical value τ is used in the bootstrap procedure. In particular, this would be the case if the bootstrap critical value or the empirical size-corrected critical value were used, as suggested by Horowitz (2001). Although Theorem 3.1(b) only considers fixed alternative hypotheses, there is no impediment to extending it to local-to-zero alternatives of the form
(
). This proof is omitted in order to shorten the exposition.
3.2. Local to zero weak instruments
We now analyse the Staiger and Stock (1997) local to zero weak instruments framework. To be more specific, we assume that , where
is a fixed vector (possibly zero). Because of the lack of identification, an Edgeworth expansion, such as in Theorem 3.1(a), is no longer valid. Indeed, we can express
, for example, as a quadratic function in
, where
itself is a function
of the sample. However, this function
is not differentiable when IVs are weak. Nonetheless, we can prove the first-order validity of the bootstrap under both H0 and the alternative hypothesis (endogeneity).
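To make the local-to-zero device concrete, the following sketch compares the concentration parameter under first-stage coefficients that shrink at rate n^{-1/2}, as in Staiger and Stock (1997), with the fixed-coefficient case. It is only an illustration under assumptions: the instruments are simulated as standard normal, the first-stage error variance is set to one, and the names `concentration_parameter`, `pi0` and `sigma_v` are hypothetical rather than the paper's notation.

```python
import numpy as np


def concentration_parameter(n, pi0, local_to_zero, sigma_v=1.0, seed=0):
    """Return pi' Z'Z pi / sigma_v^2 for simulated instruments Z.

    With local_to_zero=True the first-stage coefficients are pi0 / sqrt(n)
    (weak instruments); otherwise they are held fixed at pi0.
    """
    rng = np.random.default_rng(seed)
    pi0 = np.asarray(pi0, dtype=float)
    Z = rng.normal(size=(n, pi0.size))  # assumed instrument design
    pi = pi0 / np.sqrt(n) if local_to_zero else pi0
    signal = Z @ pi
    return float(signal @ signal / sigma_v ** 2)


if __name__ == "__main__":
    for n in (100, 10_000):
        weak = concentration_parameter(n, [1.0, 1.0], local_to_zero=True)
        strong = concentration_parameter(n, [1.0, 1.0], local_to_zero=False)
        print(f"n = {n}: weak = {weak:.2f}, strong = {strong:.2f}")
```

Under the local-to-zero sequence the concentration parameter stays bounded as n grows, whereas with fixed coefficients it grows proportionally to n. This is why the instruments carry no more information in large samples under weak identification, and why the usual strong-identification expansions break down.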
Theorem 3.2. Suppose that Assumption 2.2 holds and (exogeneity). Let
, where
is fixed. If, for some
,
, then
,
, conditional on
, where
and
are given in 2.5.
Theorem 3.3. Suppose that Assumption 2.2 is satisfied and is fixed (endogeneity). Let
, where
is fixed. If, for some
,
, then the necessary and sufficient condition under which
and
exhibit power is that
. More precisely, we have (a)
;
conditional on
, if
, where
and
are given in 2.5; (b)
;
, conditional on
, if
.
The proofs of Theorems 3.2 and 3.3 follow directly from those of Lemmata A.5 and A.6 in the online Appendix, and thus are omitted. Theorem 3.2 shows that the bootstrap provides a first-order approximation of the empirical distributions of all DWH statistics. Therefore, the bootstrap DWH tests almost surely have the correct size, despite the lack of identification. Because of the score nature of T2, T4 and H3, the validity of the bootstrap for these statistics can be viewed as an extension of Moreira et al. (2009) to LM tests for exogeneity. However, the bootstrap validity for the Wald-type tests T3, H1 and H2 is not intuitive. In general, the bootstrap often fails for Wald-type statistics when instruments are weak because their limiting distributions often involve nuisance parameters; see Moreira et al. (2009), among others. The validity of the bootstrap for the DWH statistics is mainly justified by the fact that they do not directly depend on the unidentified structural coefficient β, even when endogeneity is present; see Section 3 of Wu (1973). So, the lack of identification of β has no impact on the size of the bootstrap tests. This is not, however, the case for the power of the bootstrap tests. Indeed, Theorem 3.3(b) shows that the power of the bootstrap tests cannot exceed the nominal level if IVs are irrelevant (
), no matter which critical value is used in the bootstrap procedure. This is because the limiting distributions of all bootstrap statistics are the same as when H0 holds, although
is fixed. This result is intuitive because
is not identifiable if β is not identifiable (see Doko Tchatoka and Dufour, 2014), which is the case when
. Clearly, all values of
are observationally equivalent when
so that the bootstrap tests fail to discriminate between
and
. However, the bootstrap tests exhibit power, provided that at least one instrument is not irrelevant (
).
4. MONTE CARLO EXPERIMENT




































Bootstrap DWH tests | | | | | | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Statistics↓ | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 |
T2 | 6.8 | 17.9 | 73.4 | 100 | 5.2 | 6.6 | 5.8 | 4.5 | 4.6 | 11.2 | 35.7 | 100 | 6.2 | 24.1 | 82.2 | 100 |
T3 | 5.5 | 21.0 | 71.7 | 100 | 4.4 | 4.4 | 6.6 | 4.4 | 4.1 | 5.9 | 35.5 | 100 | 4.0 | 24.4 | 77.1 | 100 |
T4 | 6.8 | 17.9 | 73.4 | 100 | 5.2 | 6.6 | 5.8 | 4.5 | 4.6 | 11.2 | 35.7 | 100 | 6.2 | 24.1 | 82.2 | 100 |
H1 | 5.5 | 21.0 | 71.7 | 100 | 4.4 | 4.4 | 6.6 | 4.4 | 4.1 | 5.9 | 35.5 | 100 | 4.0 | 24.4 | 77.1 | 100 |
H2 | 5.5 | 21.0 | 71.7 | 100 | 4.4 | 4.4 | 6.6 | 4.4 | 4.1 | 5.9 | 35.5 | 100 | 4.0 | 24.4 | 77.1 | 100 |
H3 | 6.8 | 17.9 | 73.4 | 100 | 5.2 | 6.6 | 5.8 | 4.5 | 4.6 | 11.2 | 35.7 | 100 | 6.2 | 24.1 | 82.2 | 100 |
Standard DWH tests | | | | | | | | | | | | | | | | |
Statistics↓ | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 | 0 | 0.05 | 0.1 | 1 |
T2 | 5.0 | 19.3 | 74.8 | 100 | 4.8 | 6.2 | 6.0 | 4.9 | 3.8 | 12.0 | 38.4 | 100 | 5.1 | 25.6 | 82.8 | 100 |
T3 | 0.2 | 6.7 | 64.3 | 100 | 0.2 | 0.9 | 3.0 | 4.9 | 0.4 | 2.1 | 26.9 | 100 | 0.4 | 8.0 | 74.4 | 100 |
T4 | 4.9 | 19.0 | 74.7 | 100 | 4.8 | 6.2 | 5.9 | 4.9 | 3.8 | 11.4 | 38.2 | 100 | 4.9 | 25.3 | 82.7 | 100 |
H1 | 0.2 | 6.6 | 64.0 | 100 | 0.2 | 0.8 | 3.0 | 4.8 | 0.4 | 2.0 | 26.7 | 100 | 0.4 | 7.7 | 73.7 | 100 |
H2 | 0.2 | 6.7 | 64.4 | 100 | 0.2 | 0.9 | 3.0 | 4.9 | 0.4 | 2.3 | 27.2 | 100 | 0.4 | 8.1 | 74.8 | 100 |
H3 | 5.0 | 19.0 | 74.7 | 100 | 4.8 | 6.2 | 5.9 | 4.9 | 3.8 | 11.7 | 38.3 | 100 | 4.9 | 25.3 | 82.7 | 100 |
ACKNOWLEDGEMENTS
We are grateful to Professor Richard J. Smith, Managing Editor of the Econometrics Journal, and to two anonymous referees for their constructive comments and suggestions. We also thank Jean-Marie Dufour, Mardi Dungey, Ngoc Thien Ahn Pham and Robert Garrard for several useful comments. This project is supported by a School of Economics and Finance (University of Tasmania) research grant, for which we are also grateful.
Appendix A: AUXILIARY THEOREM
Let and
, where
,
,
and
are given in 2.7.
Theorem A.1. Suppose that Assumptions 2.1 and 2.2 are satisfied and that is fixed. Then for some integer
, we have (a)
,
if
; (b)
,
as
if
is fixed, where
and
depend on
,
, and the moments of the distribution F of
.
Proof. First, we can write and
as
(with
as
) and
. (a) Suppose that H0 is satisfied. We want to approximate
and
uniformly in τ. First, we can write both
and
as
and
, where
are convex sets. From Bhattacharya and Rao (1976, Corollary 3.2), we have
for some constant d and
. So, Theorem 1 of Bhattacharya and Ghosh (1978) holds with
and
. By using the approximation of
and
in Lemma A.1(a) of the online Appendix and the definition of
, Theorem A.1(a) follows directly from the fact that the odd terms of the quadratic expansion are even. (b) When
is fixed, the proof follows similar steps to those of Lemma A.1(b) of the online Appendix.
REFERENCES













