Impact of selection bias on the evaluation of clusters of chemical compounds in the drug discovery process
Corresponding Author
Ariel Alonso
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, Leuven, Belgium
Correspondence to: Ariel Alonso, Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium.
E-mail: [email protected]
Elasma Milanzi
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Diepenbeek, Belgium
Geert Molenberghs
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, Leuven, Belgium
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Diepenbeek, Belgium
Abstract
Expert opinion plays an important role when selecting promising clusters of chemical compounds in the drug discovery process. Indeed, experts can qualitatively assess the potential of each cluster, and with appropriate statistical methods, these qualitative assessments can be quantified into a success probability for each of them. However, one crucial element often overlooked is the procedure by which the clusters are assigned to/selected by the experts for evaluation. In the present work, the impact such a procedure may have on the statistical analysis and the entire evaluation process is studied. It has been shown that some implementations of the selection procedure may seriously compromise the validity of the evaluation even when the rating and selection processes are independent. Consequently, the fully random allocation of the clusters to the experts is strongly advocated. Copyright © 2014 John Wiley & Sons, Ltd.
Supporting Information
Filename | Description |
---|---|
pst1665-sup-0001-supplementary.pdf (PDF document, 218.4 KB) | Supporting info item |