Adaptive pre-specification in randomized trials with and without pair-matching
Corresponding Author
Laura B. Balzer
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115 MA, U.S.A.
Correspondence to: Laura Balzer, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, U.S.A.
Mark J. van der Laan
Division of Biostatistics, University of California, Berkeley, 94110-7358 CA, U.S.A.
Maya L. Petersen
Division of Biostatistics, University of California, Berkeley, 94110-7358 CA, U.S.A.
the SEARCH Collaboration
Division of HIV, Infectious Diseases and Global Medicine, University of California, San Francisco, 94143-0874 CA, U.S.A.
Abstract
In randomized trials, adjustment for measured covariates during the analysis can reduce variance and increase power. To avoid misleading inference, the analysis plan must be pre-specified. However, it is often unclear a priori which baseline covariates (if any) should be adjusted for in the analysis. Consider, for example, the Sustainable East Africa Research in Community Health (SEARCH) trial for HIV prevention and treatment. There are 16 matched pairs of communities and many potential adjustment variables, including region, HIV prevalence, male circumcision coverage, and measures of community-level viral load. In this paper, we propose a rigorous procedure to data-adaptively select the adjustment set that maximizes the efficiency of the analysis. Specifically, we use cross-validation to select, from a pre-specified library, the candidate targeted maximum likelihood estimator (TMLE) that minimizes the estimated variance. For further gains in precision, we also propose a collaborative procedure for estimating the known exposure mechanism. Our small-sample simulations demonstrate the promise of the methodology to maximize study power while maintaining nominal confidence interval coverage. We show how our procedure can be tailored to the scientific question (the intervention effect for the study sample vs. for the target population) and to the study design (pair-matched or not). Copyright © 2016 John Wiley & Sons, Ltd.
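The selection step described in the abstract (choose, by cross-validation, the candidate TMLE with the smallest estimated variance) can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: a continuous outcome, individually randomized (unmatched) units with known P(A = 1) = 0.5, linear working models for the outcome regression, a linear fluctuation in the targeting step, and a library consisting of "no adjustment" or "adjust for one covariate". All function names (`select_adjustment`, `cv_variance`, `simulate`, and so on) are hypothetical; this is not the authors' implementation, and neither the pair-matched variance estimator nor the collaborative estimation of the exposure mechanism is shown.

```python
import random
import statistics

G1 = 0.5  # assumed known randomization probability P(A = 1)


def solve(M, b):
    """Solve the square system M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def design(a, w, adj):
    """Working-model row: intercept, treatment, and the candidate covariate (None = unadjusted)."""
    return [1.0, a] + ([w[adj]] if adj is not None else [])


def fit_ols(rows, adj):
    """Least-squares fit of Y on the working-model design; rows are (W, A, Y) tuples."""
    X = [design(a, w, adj) for w, a, _ in rows]
    y = [yy for _, _, yy in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yy for x, yy in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return lambda a, w: sum(b * v for b, v in zip(beta, design(a, w, adj)))


def clever(a):
    """Clever covariate H(A) = A / g(1) - (1 - A) / g(0), using the known g."""
    return a / G1 - (1 - a) / (1 - G1)


def tmle_fit(rows, adj):
    """Initial OLS fit plus one linear-fluctuation targeting step (squared-error loss).
    With these working models eps is numerically ~0, but the step is kept for clarity."""
    q = fit_ols(rows, adj)
    num = sum(clever(a) * (y - q(a, w)) for w, a, y in rows)
    den = sum(clever(a) ** 2 for _, a, _ in rows)
    eps = num / den
    return lambda a, w: q(a, w) + eps * clever(a)


def ic_values(q, rows):
    """Estimated influence curve of the ATE estimator, centred at the plug-in estimate."""
    plug = [q(1, w) - q(0, w) for w, _, _ in rows]
    psi = statistics.fmean(plug)
    return [clever(a) * (y - q(a, w)) + d - psi for (w, a, y), d in zip(rows, plug)]


def cv_variance(rows, adj, folds=5):
    """Cross-validated variance: fit on training folds, evaluate the IC on the held-out fold."""
    ic = []
    for v in range(folds):
        train = [r for i, r in enumerate(rows) if i % folds != v]
        valid = [r for i, r in enumerate(rows) if i % folds == v]
        ic += ic_values(tmle_fit(train, adj), valid)
    return statistics.fmean(x * x for x in ic) / len(rows)


def select_adjustment(rows, library, folds=5):
    """Return the candidate with the smallest cross-validated variance, plus all scores."""
    scores = {adj: cv_variance(rows, adj, folds) for adj in library}
    return min(scores, key=scores.get), scores


def simulate(n=200, seed=1):
    """Hypothetical data: W0 is prognostic for Y, W1 is pure noise, A ~ Bernoulli(G1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = (rng.gauss(0, 1), rng.gauss(0, 1))
        a = 1 if rng.random() < G1 else 0
        rows.append((w, a, 1.0 * a + 2.0 * w[0] + rng.gauss(0, 1)))
    return rows
```

For example, `select_adjustment(simulate(), [None, 0, 1])` should typically return `0`, the prognostic covariate, because adjusting for it removes most of the residual outcome variance that inflates the influence curve. Because the library and the selection criterion are fixed in advance, the adaptive choice itself remains pre-specified.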
Supporting Information
Filename | Description |
---|---|
sim7023-sup-0001-supplementary.pdf | PDF document, 128.8 KB; supporting information item |