Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high-throughput toxicity testing
Corresponding Author
Matthew W. Wheeler
Risk Analysis Branch, National Institute for Occupational Safety and Health, Cincinnati, Ohio, U.S.A.
email: [email protected]
Summary
Many modern datasets are sampled with error from complex high-dimensional surfaces. Methods such as tensor product splines or Gaussian processes are effective and well suited for characterizing a surface in two or three dimensions, but they may suffer from difficulties when representing higher-dimensional surfaces. Motivated by high-throughput toxicity testing, where observed dose-response curves are cross sections of a surface defined by a chemical's structural properties, this manuscript develops a model of that surface to predict the dose-responses of untested chemicals. It proposes a novel approach that models the multidimensional surface as a sum of learned basis functions, each formed as the tensor product of lower-dimensional functions, which are themselves representable by a basis expansion learned from the data. The model is described and a Gibbs sampling algorithm is proposed. The approach is investigated in a simulation study and through data taken from the US EPA's ToxCast high-throughput toxicity testing platform.
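As a rough illustration of the structure described in the summary, the sketch below evaluates a surface of the form f(s, d) = sum over k of g_k(s) h_k(d), where s is a vector of chemical descriptors and d is a dose. Everything in it is an assumption made for illustration and is not taken from the article: the notation, the Gaussian radial basis for g_k, the polynomial basis for h_k, the number of terms K, and the random coefficient matrices `A` and `B`. In the article the coefficients would be learned from data by the proposed Gibbs sampler, which this sketch does not attempt to implement.

```python
import numpy as np

def rbf_basis(S, centers, length_scale=1.0):
    """Gaussian radial basis functions at descriptor rows S (n x p); hypothetical choice."""
    sq_dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)      # shape (n, m_s)

def poly_basis(d, degree=3):
    """Polynomial basis 1, d, ..., d^degree at doses d (length q); hypothetical choice."""
    return np.vander(d, N=degree + 1, increasing=True)   # shape (q, degree + 1)

def surface(S, d, centers, A, B):
    """Evaluate f(s_i, d_j) = sum_k g_k(s_i) * h_k(d_j) on an n x q grid.

    g_k is the basis expansion rbf_basis(S) @ A[:, k] and
    h_k is the basis expansion poly_basis(d) @ B[:, k].
    """
    G = rbf_basis(S, centers) @ A                         # (n, K): g_k at each chemical
    H = poly_basis(d, degree=B.shape[0] - 1) @ B          # (q, K): h_k at each dose
    return G @ H.T                                        # (n, q): sum of K tensor-product terms

# Placeholder inputs: random values stand in for real descriptors and learned coefficients.
rng = np.random.default_rng(0)
n, p, q, K, m_s, degree = 5, 4, 7, 3, 6, 3
S = rng.normal(size=(n, p))            # n chemicals, p structural descriptors
d = np.linspace(0.0, 1.0, q)           # q dose levels
centers = rng.normal(size=(m_s, p))    # RBF centers in descriptor space
A = rng.normal(size=(m_s, K))          # coefficients of g_1, ..., g_K
B = rng.normal(size=(degree + 1, K))   # coefficients of h_1, ..., h_K
F = surface(S, d, centers, A, B)       # row i is the dose-response curve of chemical i
```

Each row of `F` is a cross section of the surface at one chemical's descriptor vector, which matches the summary's description of dose-response curves as cross sections. Predicting an untested chemical would, under these assumptions, amount to calling `surface` with that chemical's descriptors as a new row of `S`.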
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Filename | Size | Description |
---|---|---|
biom12942-sup-0001-SuppData-S1.pdf | 229 KB | Supplementary Data S1. |
biom12942-sup-0002-SuppDataCode-S1.zip | 894.6 KB | Supplementary Data Code S1. |