Volume 29, Issue 19 pp. 1998-2011
Research Article

Discovery of complex pathways from observational data

James W. Baurley

Corresponding Author

Department of Preventive Medicine, University of Southern California, Los Angeles, CA, U.S.A.

David V. Conti

Department of Preventive Medicine, University of Southern California, Los Angeles, CA, U.S.A.

W. James Gauderman

Department of Preventive Medicine, University of Southern California, Los Angeles, CA, U.S.A.

Duncan C. Thomas

Department of Preventive Medicine, University of Southern California, Los Angeles, CA, U.S.A.

First published: 29 July 2010
Citations: 29

Abstract

Unraveling complex interactions has been a challenge in epidemiologic research. We introduce a pathway modeling framework that discovers plausible pathways from observational data, and allows estimation of both the net effect of the pathway and the types of interactions occurring among genetic or environmental risk factors. Each discovered pathway structure links combinations of observed variables through intermediate latent nodes to a final node, the outcome. Biologic knowledge can be readily applied in this framework as a prior on pathway structure to give preference to more biologically plausible models, thereby providing more precise estimation of Bayes factors for pathways of greatest interest by Markov Chain Monte Carlo (MCMC) methods.
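The abstract does not state how the Bayes factors are computed from the MCMC output. One common convention, assumed here, is the ratio of posterior odds to prior odds for a feature (an input, an interaction, or a topology), with the posterior probability estimated as the fraction of sampled structures that contain the feature. The function names and the representation of a sampled structure as a set of labels are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Hashable, Iterable, Set


def posterior_inclusion(samples: Iterable[Set[Hashable]], feature: Hashable) -> float:
    """Fraction of sampled structures that contain `feature`.

    Assumption: each MCMC sample is represented as a set of feature labels.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no samples supplied")
    hits = sum(1 for s in samples if feature in s)
    return hits / len(samples)


def bayes_factor(posterior_prob: float, prior_prob: float) -> float:
    """Posterior odds divided by prior odds (assumed convention)."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob < 1.0:
        raise ValueError("posterior_prob must lie in [0, 1)")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Toy illustration with made-up samples, not data from the paper.
toy_samples = [{"G1", "G2"}, {"G1"}, {"G1", "G3"}, {"G2"}]
toy_posterior = posterior_inclusion(toy_samples, "G1")
toy_bf = bayes_factor(toy_posterior, prior_prob=0.5)
```

Under this convention, a prior that favors biologically plausible structures raises the prior odds, so a feature needs correspondingly stronger posterior support to reach the same Bayes factor.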

Data were simulated for binary inputs of which only a subset was involved in different pathway topologies. Our algorithm was then used to recover the pathway from the simulated data. The posterior distributions of inputs, pairwise and higher-order interactions, and topologies were obtained by MCMC methods. The evidence in favor of a particular pathway or interaction was summarized using Bayes factors. Our method can correctly identify the risk factors and interactions involved in the simulated pathway. We apply our framework to an asthma case–control data set with polymorphisms in 12 genes. Copyright © 2010 John Wiley & Sons, Ltd.
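The simulation design described above can be sketched as follows. The abstract gives no topology, link function, or parameter values, so everything specific here is an assumption chosen for illustration: a hypothetical pathway in which the first two inputs combine through an AND latent node, that node combines with the third input through an OR node, and the final node sets disease risk through a logistic model. The remaining inputs are noise.

```python
import math
import random
from typing import List, Tuple


def simulate_pathway_data(
    n_subjects: int,
    n_inputs: int,
    seed: int = 0,
    baseline: float = -1.0,  # assumed log-odds when the pathway is inactive
    effect: float = 2.0,     # assumed log-odds increase when it is active
) -> List[Tuple[List[int], int]]:
    """Simulate binary inputs and a binary outcome from a toy pathway.

    Only inputs 0, 1, and 2 are causal; this topology is hypothetical.
    """
    if n_inputs < 3:
        raise ValueError("the toy pathway needs at least three inputs")
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        x = [rng.randint(0, 1) for _ in range(n_inputs)]
        latent = x[0] & x[1]      # intermediate latent node (AND)
        final = latent | x[2]     # node feeding the outcome (OR)
        risk = 1.0 / (1.0 + math.exp(-(baseline + effect * final)))
        y = 1 if rng.random() < risk else 0
        rows.append((x, y))
    return rows


data = simulate_pathway_data(n_subjects=500, n_inputs=10, seed=1)
```

A recovery method would then be judged by whether it assigns high posterior support to inputs 0, 1, and 2 and their interaction, and low support to the seven noise inputs.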
