Bayesian nonlinear model selection for gene regulatory networks
Yang Ni
Department of Statistics, Rice University, Houston, Texas, U.S.A.
Francesco C. Stingo (Corresponding Author)
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, U.S.A.
email: [email protected]
Veerabhadran Baladandayuthapani
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, U.S.A.
Summary
Gene regulatory networks represent the regulatory relationships between genes and their products and are important for exploring and defining the underlying biological processes of cellular systems. We develop a novel framework to recover the structure of nonlinear gene regulatory networks using semiparametric spline-based directed acyclic graphical models. Our use of splines gives the model both flexibility in capturing nonlinear dependencies and control of overfitting via shrinkage, using mixed model representations of penalized splines. We propose a novel discrete mixture prior on the smoothing parameter of the splines that allows for simultaneous selection of linear and nonlinear functional relationships while inducing sparsity in the edge selection. Using simulation studies, we demonstrate the superior performance of our methods in comparison with several existing approaches in terms of network reconstruction and functional selection. We apply our methods to a gene expression dataset in glioblastoma multiforme, and the analysis reveals several interesting and biologically relevant nonlinear relationships.
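As a rough illustration of the mixed model representation of penalized splines mentioned in the summary, the following Python sketch fits a single regulator-to-target relationship with a truncated-linear spline basis. It is not the authors' implementation: the basis choice, the knot placement, the simulated data, and the names `truncated_linear_basis`, `fit_penalized_spline`, and `lam` are all assumptions made for illustration. The point it shows is that the linear part is left unpenalized while the knot coefficients are shrunk, so a very large smoothing penalty collapses the fit to a straight line and a small one permits a nonlinear curve.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Split the design into an unpenalized linear part X = [1, x]
    and a penalized spline part Z with columns (x - knot)_+."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z

def fit_penalized_spline(x, y, knots, lam):
    """Ridge-type penalized least squares in which only the knot
    coefficients are shrunk (hypothetical helper, for illustration)."""
    X, Z = truncated_linear_basis(x, knots)
    C = np.hstack([X, Z])
    # Penalty: zero for intercept and slope, lam for every knot coefficient.
    penalty = np.concatenate([np.zeros(X.shape[1]), np.full(Z.shape[1], lam)])
    coef = np.linalg.solve(C.T @ C + np.diag(penalty), C.T @ y)
    return C @ coef, coef

# Simulated data (an assumption): one nonlinear parent-child relationship.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2.0, 2.0, 200))
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(200)
knots = np.quantile(x, np.linspace(0.05, 0.95, 15))

# Small penalty: flexible nonlinear fit. Huge penalty: effectively linear fit.
fit_flexible, coef_flexible = fit_penalized_spline(x, y, knots, lam=1e-2)
fit_linear, coef_linear = fit_penalized_spline(x, y, knots, lam=1e10)

print("MSE, small penalty:", np.mean((y - fit_flexible) ** 2))
print("MSE, huge penalty: ", np.mean((y - fit_linear) ** 2))
```

In this sketch the penalty `lam` is fixed by hand. A prior on the smoothing parameter, such as the discrete mixture prior the summary describes, would instead let the data decide among no edge, a linear edge, and a nonlinear edge.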
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Filename | Description |
---|---|
biom12309-sup-0001-SuppData.pdf (1.4 MB) | Supplementary Materials. |
biom12309-sup-0001-SuppDataCode.zip (394.7 KB) | Supplementary Materials Code. |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.