Adaptive elastic net for group testing
Corresponding Author
Karl B. Gregory
Department of Statistics, University of South Carolina, Columbia, South Carolina
Karl B. Gregory, Department of Statistics, University of South Carolina, Columbia, South Carolina
Email: [email protected]
Dewei Wang, Department of Statistics, University of South Carolina, Columbia, SC
Email: [email protected]
Christopher S. McMahan, Department of Mathematical Sciences, Clemson University, Clemson, SC
Email: [email protected]
Abstract
For disease screening, group (pooled) testing can be a cost-saving alternative to one-at-a-time testing, with savings realized through assaying pooled biospecimens (e.g., urine, blood, saliva). In many group testing settings, practitioners are also tasked with disease surveillance; that is, it is often of interest to relate individuals’ true disease statuses to covariate information via binary regression. Several authors have developed regression methods for group testing data, a task made challenging by imperfect testing: all testing outcomes (on pools and individuals) are subject to misclassification, and individuals’ true statuses are never observed. To further complicate matters, individuals may be involved in several testing outcomes. For analyzing such data, we provide a novel regression methodology that generalizes and extends the aforementioned regression techniques and incorporates regularization. Specifically, for model fitting and variable selection, we propose an adaptive elastic net estimator under the logistic regression model that can be used to analyze data from any group testing strategy. We provide an efficient algorithm for computing the estimator along with guidance on tuning parameter selection. Moreover, we establish the asymptotic properties of the proposed estimator and show that it possesses “oracle” properties. We evaluate the performance of the estimator through Monte Carlo studies and illustrate the methodology on a chlamydia data set from the State Hygienic Laboratory in Iowa City.
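To make the role of imperfect testing concrete, the following is a minimal sketch, not the authors' estimator or software. It computes the probability that a single pool tests positive when each individual's true status follows a logistic regression model and the assay has a given sensitivity and specificity. The function names, the coefficient values, and the assumptions that statuses are independent and that accuracy does not depend on pool size (no dilution) are illustrative choices, not details taken from the paper.

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def pool_positive_prob(covariates, beta, sensitivity, specificity):
    """Probability that one pool tests positive (hypothetical helper).

    covariates  : list of covariate vectors, one per individual in the pool
    beta        : [intercept, slope_1, ..., slope_p] (assumed values)
    sensitivity : P(test positive | pool truly positive), assumed constant
    specificity : P(test negative | pool truly negative), assumed constant
    """
    # Probability that every individual in the pool is truly negative,
    # assuming independent statuses under the logistic model.
    prob_all_negative = 1.0
    for x in covariates:
        eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
        prob_all_negative *= 1.0 - logistic(eta)

    # The pool is truly positive if at least one member is positive.
    prob_truly_positive = 1.0 - prob_all_negative

    # Mix over the latent pool status using the assay error rates.
    return (sensitivity * prob_truly_positive
            + (1.0 - specificity) * prob_all_negative)


# Two individuals, one covariate each, all coefficients zero,
# so each individual is positive with probability 0.5.
example = pool_positive_prob([[0.0], [0.0]], [0.0, 0.0], 0.9, 0.95)
```

Because only the error-prone pool outcome is observed, a likelihood built from probabilities like this one depends on the regression coefficients only through the latent statuses, which is why standard logistic regression software cannot be applied directly to such data.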
Supporting Information
Web Appendices, Tables, and Figures referenced in Sections 2–8, as well as a zip file with software and examples, are available with this paper at the Biometrics website on Wiley Online Library.
Filename | Description |
---|---|
biom12973-sup-0001-SuppData-S1.pdf (1.2 MB) | Supplementary Data S1. |
biom12973-sup-0002-SuppData-S2.zip (154.6 KB) | Supplementary Data S2. |