Many modern datasets are sampled with error from complex high-dimensional surfaces. Methods such as tensor product splines or Gaussian processes are effective and well suited for characterizing a surface in two or three dimensions, but they may have difficulty representing higher-dimensional surfaces. This work is motivated by high-throughput toxicity testing, where observed dose-response curves are cross sections of a surface defined by a chemical's structural properties; a model is developed to characterize this surface and to predict the dose-responses of untested chemicals. This manuscript proposes a novel approach that models the multidimensional surface as a sum of learned basis functions, each formed as the tensor product of lower-dimensional functions, which are themselves representable by a basis expansion learned from the data. The model is described and a Gibbs sampling algorithm is proposed. The approach is investigated in a simulation study and through data taken from the US EPA's ToxCast high-throughput toxicity testing platform.
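
The additive tensor-product structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian bump basis, the names `gaussian_basis`, `expand`, and `surface`, the scalar descriptor `s`, and the fixed coefficient values are all assumptions chosen for illustration. In the proposed model the coefficients would be learned from data (the abstract mentions a Gibbs sampler), whereas here they are simply hard-coded.

```python
import math

def gaussian_basis(x, centers, width=0.25):
    """Evaluate a fixed set of Gaussian bump basis functions at scalar x.
    (Hypothetical basis choice; the source does not specify one here.)"""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def expand(x, coeffs, centers):
    """One low-dimensional function written as a basis expansion:
    a weighted sum of the basis functions evaluated at x."""
    return sum(w * b for w, b in zip(coeffs, gaussian_basis(x, centers)))

def surface(s, d, components, centers):
    """Surface value at (s, d) as a sum over components, where each
    component is the product g_k(s) * h_k(d) of two expansions."""
    return sum(
        expand(s, g_coeffs, centers) * expand(d, h_coeffs, centers)
        for g_coeffs, h_coeffs in components
    )

# Illustrative setup: s stands in for a (scalar) structural descriptor,
# d for dose. Coefficients are hard-coded, not learned.
centers = [0.0, 0.5, 1.0]
components = [
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),  # component 1: (g_1 coeffs, h_1 coeffs)
    ([0.0, 2.0, 0.0], [0.0, 1.0, 0.0]),  # component 2: (g_2 coeffs, h_2 coeffs)
]

value = surface(0.5, 0.5, components, centers)
```

Fixing `s` and varying `d` traces out one cross section of the surface, which corresponds to the role a single chemical's dose-response curve plays in the description above.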

Supporting Information



Volume 75, Issue 1

March 2019

Pages 193-201

Filename	Size	Description
biom12942-sup-0001-SuppData-S1.pdf	229 KB	Supplementary Data S1.
biom12942-sup-0002-SuppDataCode-S1.zip	894.6 KB	Supplementary Data Code S1.

Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high-throughput toxicity testing
