Gaussian Process Based Bayesian Semiparametric Quantitative Trait Loci Interval Mapping
Hanwen Huang
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A.
Haibo Zhou
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A.
Fuxia Cheng
Department of Mathematics, Illinois State University, Normal, Illinois 61790, U.S.A.
Ina Hoeschele
Virginia Bioinformatics Institute and Department of Statistics, Virginia Tech, Blacksburg, Virginia 24061, U.S.A.
Corresponding Author
Fei Zou
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A.
email: [email protected]
Abstract
Summary. In linkage analysis, it is often necessary to include covariates such as age or weight to increase power or to avoid spurious findings. However, if a covariate term in the model is specified incorrectly (e.g., a quadratic term misspecified as a linear term), then including the covariate may reduce the power and accuracy of quantitative trait loci (QTL) identification. Furthermore, some covariates may interact with each other in a complicated fashion. We implement semiparametric models for single and multiple QTL mapping. Both mapping methods include an unspecified function of any covariate found or suspected to have an unknown, more-complex-than-linear relationship with the response variable. They also allow for interactions among different covariates. The analysis is performed in a Bayesian inference framework using Markov chain Monte Carlo. The advantages of our methods are demonstrated via extensive simulations and real data analysis.
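The misspecification issue described in the summary can be illustrated with a small simulation. The sketch below is not the authors' Bayesian MCMC procedure: it swaps in a simple partial-residual estimator that uses the posterior mean of a Gaussian process as the smoother, with fixed hyperparameters. The simulated backcross data, the quadratic covariate effect, and every function name and parameter value (`gp_smooth`, `length`, `signal`, `noise`, the QTL effect of 0.5) are assumptions chosen only for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gp_smooth(z, v, length=0.7, signal=2.0, noise=0.5):
    """Posterior mean at the inputs z of a GP with a squared-exponential kernel.

    Hyperparameters are fixed, hypothetical values; v is centered first.
    """
    n = len(z)
    mean = sum(v) / n
    k = [[signal ** 2 * math.exp(-0.5 * ((z[i] - z[j]) / length) ** 2)
          for j in range(n)] for i in range(n)]
    kn = [[k[i][j] + (noise ** 2 if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    alpha = solve(kn, [vi - mean for vi in v])
    return [mean + sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]


def ols(cols, y):
    """Ordinary least squares via the normal equations; cols are design columns."""
    p = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(p)]
    return solve(xtx, xty)


# Simulated data (assumed): genotype g in {0, 1}, covariate z with a quadratic effect.
random.seed(1)
n = 80
g = [float(random.randint(0, 1)) for _ in range(n)]
z = [random.uniform(-2.0, 2.0) for _ in range(n)]
y = [0.5 * gi + zi ** 2 + random.gauss(0.0, 0.5) for gi, zi in zip(g, z)]

# Misspecified model: the covariate enters linearly.
beta = ols([[1.0] * n, g, z], y)
rss_linear = sum((yi - beta[0] - beta[1] * gi - beta[2] * zi) ** 2
                 for yi, gi, zi in zip(y, g, z))

# Semiparametric alternative: remove a smooth function of z from both y and g,
# then regress the residuals on each other to estimate the QTL effect.
ry = [yi - fi for yi, fi in zip(y, gp_smooth(z, y))]
rg = [gi - fi for gi, fi in zip(g, gp_smooth(z, g))]
b_semi = sum(a * b for a, b in zip(rg, ry)) / sum(a * a for a in rg)
rss_semi = sum((a - b_semi * b) ** 2 for a, b in zip(ry, rg))

print("QTL effect, linear covariate:", beta[1])
print("QTL effect, GP-smoothed covariate:", b_semi)
print("Residual sum of squares (linear, semiparametric):", rss_linear, rss_semi)
```

Because the simulated covariate effect is symmetric about zero, a linear term absorbs almost none of it, so the unexplained variation stays in the residuals and inflates the uncertainty of the QTL effect estimate. A full Bayesian treatment would place priors on the kernel hyperparameters and sample them rather than fixing them as done here.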