Volume 43, Issue 5 pp. 983-1002
RESEARCH ARTICLE

An integrated Bayesian framework for multi-omics prediction and classification

Himel Mallick

Corresponding Author

Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, 10065 New York, USA

Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA

Correspondence

Himel Mallick, Weill Cornell Medicine, Cornell University, New York, NY, USA.

Email: [email protected]

Erina Paul, Merck & Co., Inc., Rahway, NJ, USA.

Email: [email protected]

Anupreet Porwal

Department of Statistics, University of Washington, Seattle, Washington, USA

Satabdi Saha

Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Piyali Basak

Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA

Vladimir Svetnik

Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA

Erina Paul

Corresponding Author

Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA

First published: 26 December 2023
Citations: 3
Himel Mallick and Anupreet Porwal contributed equally to this study.

Abstract

With multi-omics datasets becoming increasingly common, there is growing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers, enabling better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities, but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology and provides novel insights, while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.

DATA AVAILABILITY STATEMENT

The implementation of IntegratedLearner is publicly available with source code, documentation, tutorial, and as an R/Bioconductor package at https://github.com/himelmallick/IntegratedLearner. Analysis scripts for synthetic benchmarking and real data analyses are available from the first author upon request. Previously published data used in this study are appropriately cited in the main text as well as in the References section. The detailed data summary is provided in Table 1. All processed data for the four case studies are available at https://github.com/himelmallick/IntegratedLearner.
