Volume 41, Issue 25 pp. 5150-5187
TUTORIAL IN BIOSTATISTICS

Gaussian graphical models with applications to omics analyses

Katherine H. Shutta

Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA

Correspondence: Katherine H. Shutta, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave., Boston, MA 02115, USA.

Email: [email protected]

Roberta De Vito

Department of Biostatistics and Data Science Initiative, Brown University, Providence, Rhode Island, USA

Denise M. Scholtens

Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA

Raji Balasubramanian

Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA

First published: 26 September 2022
Citations: 4
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Funding information: U.S. National Library of Medicine, R01LM013444-01

Abstract

Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high-dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand-alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial.
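
As a brief illustration of what "conditional dependencies" means in this setting, the standard GGM correspondence between the precision matrix and the graph can be written as follows. The notation ($X$, $\Sigma$, $\Theta$, $\theta_{ij}$, $E$) is assumed here for illustration and is not taken from this page; the full text may use different symbols.

```latex
% Assumed notation: X is a p-dimensional Gaussian vector with
% covariance \Sigma and precision matrix \Theta = \Sigma^{-1}.
X = (X_1, \dots, X_p)^\top \sim N_p(\mu, \Sigma),
\qquad
\Theta = \Sigma^{-1} = (\theta_{ij}).

% Partial correlation of X_i and X_j given all other variables:
\rho_{ij \mid \text{rest}}
  = -\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}.

% An edge (i,j) is absent from the graph G = (V, E) exactly when
% the corresponding precision entry is zero:
X_i \perp X_j \mid X_{V \setminus \{i,j\}}
  \iff \theta_{ij} = 0
  \iff (i,j) \notin E .
```

Under this reading, estimating a GGM amounts to estimating the zero pattern (and the nonzero entries) of $\Theta$, which is why methods for high-dimensional data focus on sparse precision matrix estimation.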

DATA AVAILABILITY STATEMENT

The data used in this work are publicly available in the R package curatedOvarianData [13], which is available for download from Bioconductor at https://bioconductor.org/packages/release/data/experiment/html/curatedOvarianData.html.
