Volume 37, Issue 8 pp. 759-767

Research Article

A General Framework for Association Tests With Multivariate Traits in Large-Scale Genomics Studies

Qianchuan He

Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

Christy L. Avery

Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America

Corresponding Author

Dan-Yu Lin

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America

Correspondence to: Dr. Dan-Yu Lin, Department of Biostatistics, University of North Carolina, McGavran-Greenberg Hall, CB #7420, Chapel Hill, NC 27599-7420, USA. E-mail: [email protected]

First published: 05 November 2013

https://doi.org/10.1002/gepi.21759

Citations: 35

ABSTRACT

Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score-type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome-wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta-analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed.

Introduction

Pleiotropy, the influence of one gene on multiple traits, is a widespread phenomenon in complex human diseases [Sivakumaran et al., 2011]. Recent years have seen a heightened interest in discovering genetic variants with pleiotropic effects [Gottesman et al., 2012; Lawson et al., 2011; Paaby and Rockman, 2012; Watanabe et al., 2000]. The joint analysis of multiple traits can increase statistical power by aggregating multiple weak effects and provide new biological insights by revealing pleiotropic variants [Amos and Laing, 1993; Jiang and Zeng, 1995].

The advent of large-scale genetic association studies, particularly genome-wide association studies (GWAS), poses tremendous challenges in analyzing multiple traits. First, a huge number of genetic variants must be tested, which may entail a considerable computational burden. The inclusion of covariates (e.g., ancestry variables to account for population stratification) may make the computation even more intensive. Second, complex diseases are characterized by a wide variety of traits, some of which are continuous (i.e., quantitative) and some of which are discrete. Third, it is desirable to combine results from multiple studies, some of which may consist of unrelated individuals and some of which may consist of families; the genetic variants and the traits may not be uniformly measured in all studies. Fourth, it is necessary to adjust for multiple testing, but the conventional Bonferroni correction may be overly conservative.

There exist several statistical methods for association analysis of multiple traits, but none of them addresses all the above issues. Ferreira and Purcell [2009] suggested canonical correlation analysis, which is computationally fast but does not accommodate covariates. Liu et al. [2009] suggested a Wald statistic based on generalized estimating equations (GEE) [Liang and Zeger, 1986] for the mixture of one continuous trait and one binary trait. Their method does not accommodate family data, and the Wald statistic requires fitting a regression model for each genetic variant, which can be time consuming. Yang et al. [2010] suggested a linear combination of univariate test statistics with data-dependent weights by estimating the weights from part of the data and calculating the test statistic from the remaining data. The P-values are assessed by permutation, which is computationally demanding. Maity et al. [2012] proposed a kernel machine method for joint analysis of multiple genetic variants, which is equivalent to testing the variance component in a multivariate linear mixed model. Recently, van der Sluis et al. [2013] proposed a method called “trait-based association test that uses extended Simes procedure” (TATES). The Simes procedure was originally designed to alleviate the conservativeness of the Bonferroni correction; the TATES extends the Simes procedure to the multivariate-trait analysis by harnessing the correlations among the traits.

In this paper, we provide a very general framework for association analysis of multiple traits, which simultaneously tackles all the aforementioned challenges. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. To enhance robustness, we relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without parametric modeling of the dependence among the traits or family members; we account for the dependence in constructing the test statistics by estimating the correlations empirically from the data. We develop score-type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We consider various types of multivariate test statistics and compare their power both theoretically and empirically. We provide a strategy to determine genome-wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. We demonstrate the usefulness of the new methods through extensive simulation studies and an application to five GWAS studies involving cardiovascular traits.

Methods

In this section, we present our general framework for association tests with multivariate traits. We first construct the marginal models and the corresponding score-type statistics. We then show how to combine those statistics to form multivariate test statistics. Finally, we discuss meta-analysis and genome-wide significance thresholds.

Calculating Score Statistics and Their Covariance Matrix

We consider a single study with a total of n unrelated individuals, K (potentially correlated) traits, and p covariates (including the unit component). For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0001$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0002$ , let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0003$ be the kth trait of the ith individual. For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0004$ , let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0005$ be the p-vector of covariates for the ith individual, and let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0006$ denote the number of minor alleles (or the imputed dosage) the ith individual carries at a particular test locus.

We assume that the marginal distribution of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0007$ is related to $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0008$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0009$ through a generalized linear model with mean $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0010$ and dispersion parameter $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0011$ , where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0012$ is a specific function, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0013$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0014$ are unknown regression parameters. We adopt natural link functions such that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0015$ for continuous traits and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0016$ for binary traits.

To accommodate missing data, we let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0017$ indicate, by the values 1 versus 0, whether $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0018$ is observed or missing, and let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0019$ indicate, by the values 1 versus 0, whether $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0020$ is observed or missing. It is assumed that the covariates have no missing values. (We recommend excluding covariates with substantial missingness and, for the remaining covariates, replacing missing values with the corresponding sample means.)

The score function for $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0021$ takes the form

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0022$

Thus, the score statistic for testing the null hypothesis that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0023$ is

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0024$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0025$ solves the equation

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0026$

and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0027$ is a sample estimator of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0028$ . For continuous traits,

for binary traits, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0030$ . Note that the construction of the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0031$ s makes full use of the available data by estimating $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0032$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0033$ from all individuals with nonmissing trait values and is more efficient than the traditional complete-case analysis.

By taking the Taylor series expansion of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0034$ at $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0035$ and applying the law of large numbers, we can show that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0036$ is asymptotically equivalent to the following sum of n independent terms:

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0037$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0038$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0039$ are the limits of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0040$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0041$ , respectively, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0042$ . Define the score vector

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0043$

It follows from the multivariate central limit theorem that U is asymptotically K-variate normal with mean 0 and with a covariance matrix that can be estimated by

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0044$

where

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0045$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0046$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0047$

and

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0048$

Note that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0049$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0050$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0051$ do not depend on the SNP genotypes and thus need to be calculated only once (before looping through all the SNPs). Note also that, given the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0052$ s, the calculations of U and V do not involve solving any equations. Thus, the implementation of the proposed score-type statistics is orders of magnitude faster than that of the conventional Wald statistics. In addition, the score-type statistics are numerically more stable and statistically more accurate than the Wald statistics, especially when the minor allele frequency (MAF) is low [Lin and Tang, 2011].
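The computational point above can be illustrated with a minimal sketch for the simplest setting: continuous traits, unrelated individuals, and no missing data. The code is an assumption-laden illustration rather than the authors' implementation. The function name `score_stats_continuous` and the arguments `Y`, `X`, and `g` are hypothetical, and the per-individual contribution used here (covariate-adjusted genotype times null-model residual, scaled by the residual variance) is one standard form of a robust score statistic that may differ in detail from the expressions in the paper.

```python
import numpy as np

def score_stats_continuous(Y, X, g):
    """Sketch of score-type statistics for K continuous traits (complete data).

    Y : (n, K) trait matrix; X : (n, p) covariates including the intercept;
    g : (n,) minor-allele counts or imputed dosages at one SNP.
    Returns the score vector U (K,), its empirical covariance V (K, K),
    and the per-individual contributions (n, K).
    """
    n, K = Y.shape
    # Null-model fit: regress every trait on the covariates only.
    # This step does not involve the SNP and can be done once per study.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef                       # (n, K) null residuals
    sigma2 = (resid ** 2).sum(axis=0) / n      # (K,) residual variances

    # Per-SNP step: adjust the genotype for the covariates (no iteration).
    g_coef, *_ = np.linalg.lstsq(X, g, rcond=None)
    g_adj = g - X @ g_coef

    # Assumed form of the individual contributions to the score vector.
    contrib = g_adj[:, None] * resid / sigma2  # (n, K)
    U = contrib.sum(axis=0)
    # Empirical covariance: captures the correlation among traits
    # without a parametric model for their dependence.
    V = contrib.T @ contrib
    return U, V, contrib
```

In a genome-wide scan, the null residuals and variances would be cached and only the per-SNP step repeated, which is where the speed advantage over refitting a regression model for every SNP comes from.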

We now extend the above results to family studies. Suppose that we have a total of n families, with $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0053$ members in the ith family. For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0054$ , $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0055$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0056$ , let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0057$ denote the kth trait for the jth member of the ith family, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0058$ denote the p-vector of covariates for the jth member of the ith family, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0059$ denote the number of minor alleles (or the imputed dosage) which the jth member of the ith family carries at a particular test locus. We assume that the marginal distribution of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0060$ is related to $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0061$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0062$ through the same marginal generalized linear regression model as in the case of unrelated individuals.

Let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0063$ indicate whether $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0064$ is observed or missing, and let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0065$ indicate whether $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0066$ is observed or missing. It is assumed that there are no missing values in the covariates. Under the independence working assumption [Liang and Zeger, 1986], the (pseudo-likelihood) score statistic for testing the null hypothesis that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0067$ is

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0068$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0069$ solves the equation

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0070$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0071$ for continuous traits, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0072$ for binary traits. Again, define $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0073$ . It follows from the above arguments for the case of unrelated individuals that U is asymptotically K-variate normal with mean 0 and a covariance matrix that can be estimated by $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0074$ , where

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0075$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0077$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0078$

Note that the relatedness of family members is accounted for through the empirical correlations of the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0079$ s.
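A brief sketch may help show how relatedness enters only through the covariance estimate. It assumes that member-level score contributions have already been computed under the independence working assumption; the function name `family_score_covariance` and its arguments are hypothetical. Contributions are summed within each family before the outer products are formed, so within-family correlation is absorbed empirically.

```python
import numpy as np

def family_score_covariance(member_contrib, family_id):
    """Sum member-level score contributions within families, then form U and V.

    member_contrib : (N, K) contributions, one row per family member.
    family_id      : (N,) family labels.
    """
    member_contrib = np.asarray(member_contrib, dtype=float)
    families, idx = np.unique(np.asarray(family_id), return_inverse=True)
    idx = np.asarray(idx).ravel()
    # Family-level contributions: one row per family.
    fam_sums = np.zeros((families.size, member_contrib.shape[1]))
    np.add.at(fam_sums, idx, member_contrib)
    U = fam_sums.sum(axis=0)
    # Outer products of family sums, not of individual rows.
    V = fam_sums.T @ fam_sums
    return U, V
```

For example, with two families whose members contribute `[[1, 0], [1, 0]]` and `[[0, 2], [0, 2]]`, the family sums are `[2, 0]` and `[0, 4]`, so the diagonal of V is `[4, 16]` rather than the `[2, 8]` that would result from treating the four members as independent.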

Performing Multivariate Association Tests

To test the global null hypothesis that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0080$ , we calculate the quadratic form

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0081$

which is asymptotically chi-squared with K degrees of freedom. This is a global test statistic that is consistent (i.e., its power tends to 1 as the sample size tends to ∞) against any alternative hypothesis.

To enhance power against alternative hypotheses under which genetic effects are similar among the K traits, we calculate a test statistic with one degree of freedom along the lines of O'Brien [1984]. Specifically, let Z be the standardized version of U and let R be the correlation matrix of U. That is, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0082$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0083$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0084$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0085$ . We then calculate

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0086$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0087$ . This test statistic is asymptotically standard normal.

The test statistic T maximizes the noncentrality parameter among all linear combinations of the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0088$ s. Note that the score test statistic $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0089$ is asymptotically equivalent to the Wald test statistic, i.e., the estimate of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0090$ divided by its standard error. Thus, T is optimal if the limits of the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0091$ s or the standardized $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0092$ s are the same. To detect alternative hypotheses under which the original $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0093$ s are the same, we define $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0094$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0095$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0096$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0097$ . Note that the limit of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0098$ is approximately $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0099$ . We then calculate

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0100$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0101$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0102$ . This test statistic is also asymptotically standard normal. When using this test statistic, it is important to use comparable scales for the K traits such that it is plausible for the $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0103$ to be equal. When using either T or $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0104$ , it is important to code the trait values in such a way that the genetic effects on the K traits are plausibly in the same direction.

If the effects of the SNP are similar among the K traits, then T and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0105$ will tend to be more powerful than Q. If the effects are very different, then Q will likely be more powerful than T and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0106$ . In the Appendix, we derive the asymptotic distributions of Q, T, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0107$ under alternative hypotheses for the important special case of two continuous traits.
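The three statistics can be sketched as follows, given a score vector U and covariance estimate V. This is an illustrative reading of the text and not the authors' code. Q is the quadratic form, and T is the O'Brien-type statistic built from the standardized scores. The third statistic, called `T_prime` here, is an assumption: it takes U_k / V_kk as an approximate effect estimate for trait k and combines these estimates with the same generalized least squares weighting. All function and variable names are hypothetical.

```python
import math
import numpy as np

def multivariate_tests(U, V):
    """Return (Q, T, T_prime) from a K-vector U and K x K covariance V."""
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    ones = np.ones(U.size)

    # Q: quadratic form, referred to chi-squared with K degrees of freedom.
    Q = float(U @ np.linalg.solve(V, U))

    # T: standardize U, then weight by the inverse correlation matrix.
    sd = np.sqrt(np.diag(V))
    Z = U / sd
    R = V / np.outer(sd, sd)
    Rinv_e = np.linalg.solve(R, ones)
    T = float(Rinv_e @ Z / math.sqrt(ones @ Rinv_e))

    # T_prime (assumed form): one-step effect estimates B_k = U_k / V_kk
    # with covariance Sigma = D^{-1} V D^{-1}, where D = diag(V).
    d = np.diag(V)
    B = U / d
    Sigma = V / np.outer(d, d)
    Sinv_e = np.linalg.solve(Sigma, ones)
    T_prime = float(Sinv_e @ B / math.sqrt(ones @ Sinv_e))
    return Q, T, T_prime

def two_sided_normal_p(z):
    """Two-sided P-value for an asymptotically standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With U = (2, 4) and V = diag(1, 4), the standardized scores are both 2, so Q = 8 and T = 4/√2 ≈ 2.83, whereas the assumed T_prime equals 6/√5 ≈ 2.68 because the approximate effect estimates (2 and 1) differ between the traits.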

Combining Results From Multiple Studies

We wish to combine results from L independent studies. For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0108$ , let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0109$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0110$ denote the score vector and its (estimated) covariance matrix from the lth study. Then the overall score vector is $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0111$ , and its covariance matrix is estimated by $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0112$ . Note that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0113$ is the (pseudo-likelihood) score statistic in the joint analysis of the individual-level data of the L studies (allowing nuisance parameters to be different among the studies). Thus, meta-analysis of score statistics is equivalent to the joint analysis of individual-level data. When there are multiple studies, K pertains to the total number of distinct traits, some of which may not be measured in certain studies. (For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0114$ , we may have four traits that are common between the two studies, two traits that are measured only in the first study, and three traits that are measured only in the second study. Then $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0115$ .) Given U and V, we can calculate Q, T, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0116$ in the same manner as in the case of a single study.
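The combination step amounts to summing study-level score vectors and covariance matrices after aligning them on a common set of K traits. The sketch below is a hypothetical helper, not the authors' software: each study supplies its score vector, its covariance matrix, and the indices of the traits it measured, and unmeasured traits simply contribute zeros.

```python
import numpy as np

def combine_studies(studies, K):
    """Sum study-level (U_l, V_l) over studies on a common K-trait index.

    studies : iterable of (U_l, V_l, trait_idx), where trait_idx lists the
              positions (0..K-1) of the traits measured in that study.
    """
    U = np.zeros(K)
    V = np.zeros((K, K))
    for U_l, V_l, trait_idx in studies:
        idx = np.asarray(trait_idx)
        U[idx] += np.asarray(U_l, dtype=float)
        # Place the study's covariance block at the matching rows/columns.
        V[np.ix_(idx, idx)] += np.asarray(V_l, dtype=float)
    return U, V

# The two-study example from the text: four shared traits (0-3), two traits
# measured only in study 1 (4-5), three only in study 2 (6-8), so K = 9.
# The numerical values are placeholders for illustration only.
study1 = (np.ones(6), np.eye(6), [0, 1, 2, 3, 4, 5])
study2 = (np.ones(7), 2 * np.eye(7), [0, 1, 2, 3, 6, 7, 8])
U_all, V_all = combine_studies([study1, study2], K=9)
```

Because only U and V are exchanged, studies with different designs (unrelated individuals or families) can be pooled without sharing individual-level data.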

Determining Genome-Wide Significance

Suppose that we have a total of m SNPs. For $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0117$ , let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0118$ be the value of Q for the jth SNP. If the critical value q₀ for the m test statistics satisfies

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0119$

then the family-wise error rate will be α. We estimate q₀ by Monte Carlo simulation. At each test locus, we calculate

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0121$ are independent standard normal random variables. Let $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0122$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0123$ be the values of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0124$ and V for the jth SNP. Define

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0125$

The joint distribution of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0126$ can be approximated by that of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0127$ [Lin, 2005]. Thus, we determine q₀ by the following equation

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0128$

We simulate the normal random sample $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0129$ 10,000 times while holding the observed data fixed and set q₀ to be the 10,000 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0130$ th largest value of the resulting $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0131$ 's. We may convert the critical value q₀ to the P-value threshold p₀ by referring q₀ to the chi-squared distribution with K degrees of freedom. We can determine the genome-wide significance thresholds for T, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0132$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0133$ $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0134$ in a similar manner.
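The Monte Carlo procedure can be sketched as below, under the assumption that the per-individual score contributions for all m SNPs are available as an array. The function name `monte_carlo_threshold`, the array layout, and the batching are illustrative choices, not part of the original method description. The essential feature is that the same normal multipliers are applied to every SNP within a replicate, so the simulated statistics inherit the LD-induced correlation of the observed ones.

```python
import numpy as np

def monte_carlo_threshold(contrib, alpha=0.05, n_rep=10_000, seed=0, batch=100):
    """Estimate the critical value q0 for the maximum of Q over m SNPs.

    contrib : (n, m, K) per-individual (or per-family) score contributions.
    """
    contrib = np.asarray(contrib, dtype=float)
    n, m, K = contrib.shape
    # Covariance estimate V_j for every SNP j, and its inverse.
    V = np.einsum('ijk,ijl->jkl', contrib, contrib)        # (m, K, K)
    V_inv = np.linalg.inv(V)

    rng = np.random.default_rng(seed)
    q_max = np.empty(n_rep)
    for start in range(0, n_rep, batch):
        b = min(batch, n_rep - start)
        # One set of standard normal multipliers per replicate, shared by all SNPs.
        N = rng.standard_normal((b, n))
        U_tilde = np.einsum('bi,ijk->bjk', N, contrib)     # (b, m, K)
        Q_tilde = np.einsum('bjk,jkl,bjl->bj', U_tilde, V_inv, U_tilde)
        q_max[start:start + b] = Q_tilde.max(axis=1)

    # q0 is the (n_rep * alpha)-th largest simulated maximum.
    k = max(int(round(n_rep * alpha)), 1)
    return float(np.sort(q_max)[-k])
```

When the SNPs are in strong LD, the simulated maxima are smaller than under independence, so the resulting threshold is less stringent than the Bonferroni threshold.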

Results

Simulation Studies

We conducted simulation studies to evaluate the performance of the proposed test statistics. We set G to be the number of minor alleles for a SNP with MAF of 0.4 and set X to be normal with mean 0.1G and unit variance. We generated two continuous traits under the bivariate linear model: $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0135$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0136$ where ε₁ and ε₂ are zero-mean normal with variances $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0137$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0138$ , respectively, and with correlation $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0139$ . We set α to 10⁻⁴. To evaluate the type I error, we simulated 10 million data sets under $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0140$ . To evaluate the power, we simulated 10,000 data sets under various combinations of γ₁ and γ₂. Each simulated data set consists of 1,000 unrelated individuals. In addition to the three multivariate test statistics, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0141$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0142$ , we considered two versions of univariate tests, Uni-B and Uni-corr, which adjust for multiple testing (between the traits) by adopting the Bonferroni correction (i.e., dividing the nominal significance level by the number of traits) and by accounting for the correlation between Z₁ and Z₂ (using the multivariate normal distribution of U), respectively. We also included the TATES method [van der Sluis et al., 2013].

The results are summarized in Table 1. The type I error for the TATES is inflated by about 12%. The type I errors for the other five tests are below the nominal significance level. The Q test has reasonable power against all 12 alternatives. As expected, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0143$ is more powerful than the other tests when γ₁ is close to γ₂, and T is more powerful than the others when $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0144$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0145$ are close to each other. The Uni-B is expected to have lower power than Uni-corr, but the two tests perform very similarly due to the relatively weak correlation between the two traits. The differences between the two tests become more pronounced as the correlation increases; see supplementary Table SI. The TATES has slightly higher power than Uni-B and Uni-corr but also has inflated type I error, especially when the correlation is high.

Table 1. Type I error and power of univariate versus multivariate tests for two continuous traits ( $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0146$ )

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0147$	$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0148$	Uni-B	Uni-corr	TATES	Q	T	$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0149$
(0, 0)	(0, 0)	8.6 × 10⁻⁵	8.7 × 10⁻⁵	1.12 × 10⁻⁴	9.0 × 10⁻⁵	9.0 × 10⁻⁵	8.8 × 10⁻⁵
(0.3, 0)	(0.21, 0)	0.6861	0.6865	0.714	0.854	0.095	0.003
(0.3, 0.1)	(0.21, 0.1)	0.6865	0.6869	0.715	0.638	0.487	0.187
(0.25, 0.18)	(0.18, 0.18)	0.5714	0.5723	0.606	0.594	0.700	0.641
(0.3, 0.25)	(0.21, 0.25)	0.9358	0.936	0.945	0.942	0.967	0.966
(0.2, 0.2)	(0.14, 0.2)	0.6222	0.6229	0.651	0.594	0.632	0.702
(0.2, 0.25)	(0.14, 0.25)	0.9054	0.9056	0.917	0.881	0.828	0.922
(0.25, 0.25)	(0.18, 0.25)	0.9128	0.913	0.925	0.906	0.917	0.947
(0, 0.25)	(0, 0.25)	0.9034	0.9036	0.915	0.977	0.197	0.701
(0, 0.3)	(0, 0.3)	0.9902	0.9903	0.992	0.999	0.402	0.914
(0.1, 0.25)	(0.07, 0.25)	0.9034	0.9036	0.915	0.907	0.523	0.841
(0.1, 0.3)	(0.07, 0.3)	0.9902	0.9903	0.992	0.994	0.742	0.968
(0.2, 0.3)	(0.14, 0.3)	0.9903	0.9904	0.992	0.987	0.940	0.990

We also considered the mixture of a binary trait and a continuous trait. We simulated the binary trait under the logistic regression model $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0156$ and simulated the continuous trait under the linear model $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0157$ , where ε is normal with mean 2Y₁ and unit variance. (The Pearson correlation between the two traits is about 0.65.) As shown in Table 2, Q tends to have higher power than the two univariate tests. As expected, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0158$ is more powerful than the other tests when $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0159$ , and T outperforms the others when the means of Z₁ and Z₂ are similar. Again, the TATES has higher power than Uni-B and Uni-corr but at the expense of inflated type I error.

Table 2. Type I error and power of univariate versus multivariate tests for one binary trait and one continuous trait

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0160$	$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0161$ *	Uni-B	Uni-corr	TATES	Q	T	$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0162$
(0, 0)	—	8.7 × 10⁻⁵	9.0 × 10⁻⁵	1.02 × 10⁻⁴	8.7 × 10⁻⁵	9.7 × 10⁻⁵	9.3 × 10⁻⁵
(0.3, 0)	(3.07, 2.09)	0.171	0.173	0.171	0.141	0.141	0.026
(0.3, 0.1)	(3.07, 3.65)	0.377	0.380	0.398	0.334	0.410	0.382
(0.25, 0.18)	(2.54, 4.52)	0.685	0.687	0.709	0.664	0.488	0.758
(0.3, 0.25)	(3.07, 5.88)	0.973	0.973	0.977	0.975	0.845	0.986
(0.2, 0.2)	(2.02, 4.49)	0.676	0.678	0.700	0.709	0.372	0.764
(0.2, 0.25)	(2.02, 5.24)	0.892	0.893	0.907	0.938	0.534	0.937
(0.25, 0.25)	(2.54, 5.56)	0.943	0.944	0.952	0.958	0.707	0.967
(0, 0.25)	(0.02, 4.02)	0.489	0.493	0.519	0.880	0.046	0.655
(0, 0.3)	(0.02, 4.79)	0.782	0.784	0.808	0.987	0.104	0.888
(0.1, 0.25)	(1.01, 4.62)	0.722	0.725	0.750	0.899	0.210	0.832
(0.1, 0.3)	(1.01, 5.37)	0.917	0.919	0.930	0.990	0.347	0.960
(0.2, 0.3)	(2.02, 5.97)	0.979	0.979	0.984	0.994	0.688	0.992

* $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0169$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0170$ are the sample means of Z₁ and Z₂, respectively.

We also considered four continuous traits with a compound-symmetry correlation structure for the error terms. The results are shown in supplementary Table SII. The basic conclusions remain the same. For the case of no covariates, we also included the MANOVA method as implemented in R. As shown in supplementary Table SIII, MANOVA has slightly higher type I error and power than the Q test. This is consistent with the general phenomenon that the likelihood ratio test is more liberal than the score test [Lin and Zeng, 2011]. Finally, we considered family studies with 250 families (two parents and two children in each family) and two continuous traits. As shown in supplementary Table SIV, the conclusions are similar to the case of unrelated subjects.

Cardiovascular Studies

We analyzed the GWAS data on the Caucasian samples from the Atherosclerosis Risk in Communities (ARIC) study, the Coronary Artery Risk Development in Young Adults (CARDIA) study, the Cardiovascular Health Study (CHS), the Multi-Ethnic Study of Atherosclerosis (MESA), and the Framingham Heart Study (FHS), the sample sizes being 9,068, 1,433, 3,892, 2,286, and 2,789, respectively. The FHS is a family study, and the others consist of unrelated individuals. Each individual was genotyped on 250,000 SNPs. We considered four cardiovascular traits: diabetes status, high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides; the first trait is binary whereas the other three are continuous. These traits are major players in the development of coronary artery diseases and metabolic syndrome [Grundy, 2012; Holmes et al., 1981]. We aimed to identify genetic factors underlying these traits. Since the HDL is “good” cholesterol, we used negative values of HDL in the analysis.

We performed single-SNP analysis with the following covariates: age, gender, study centers, and the top 10 principal components for ancestry. We calculated the score-type statistics and their covariance matrices for each study and then combined the results of the five studies. The (unadjusted) P-values of the univariate and multivariate tests are displayed in Figures 1 and 2, respectively. The genome-wide significance thresholds based on the Bonferroni correction and the Monte Carlo procedure are marked in both figures.

Figure 1. Univariate tests of the diabetes status and the LDL, HDL, and triglyceride levels in the ARIC, CARDIA, CHS, MESA, and FHS GWAS studies. Genome-wide significance thresholds based on the Bonferroni correction and the Monte Carlo procedure are shown in green and blue, respectively.

We first examine the results based on the Bonferroni correction. For the four traits, more than 10 regions are above the Bonferroni threshold in the Uni-B test (Fig. 1). Compared with Uni-B, the Q test identifies one new signal, located on chromosome (Chr) 9 (Fig. 2). This signal is an accumulation of weak/moderate signals for individual traits. The gene at this locus encodes a protein called ABCA1, which is involved in cellular cholesterol removal [Lawn et al., 1999]. This gene was previously found to be associated with metabolic syndrome [Avery et al., 2011]. Table 3 lists all the loci discovered by the Q test, together with the P-values from the univariate-trait analysis. The T and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0171$ tests did not identify any additional signals that achieve genome-wide significance, but the two tests yielded more extreme P-values for several SNPs than the univariate tests.

Table 3. P-values of the genetic loci identified by the Q test

				P-values of single-trait analysis
Chr	SNP	Gene	P-value of Q test	Diabetes	LDL	HDL	Trig
2	rs515135	APOB	6.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0172$	4.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0173$	3.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0174$	5.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0175$	3.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0176$
2	rs1260326	GCKR	2.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0177$	3.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0178$	3.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0179$	6.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0180$	8.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0181$
5	rs12916	HMGCR	1.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0182$	8.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0183$	3.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0184$	2.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0185$	9.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0186$
6	rs10455872	LPA	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0187$	9.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0188$	8.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0189$	1.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0190$	8.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0191$
7	rs7777102	MLXIPL	2.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0192$	4.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0193$	8.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0194$	3.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0195$	1.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0196$
8	rs1011685	LPL	5.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0197$	5.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0198$	7.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0199$	2.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0200$	4.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0201$
8	rs2954021	TRIB1	5.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0202$	6.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0203$	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0204$	2.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0205$	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0206$
9	rs2575876	ABCA1	9.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0207$	2.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0208$	1.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0209$	2.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0210$	4.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0211$
10	rs7903146	TCF7L2	8.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0212$	5.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0213$	8.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0214$	8.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0215$	6.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0216$
11	rs174538	FEN1	3.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0217$	1.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0218$	2.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0219$	3.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0220$	6.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0221$
11	rs964184	ZNF259	2.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0222$	8.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0223$	2.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0224$	3.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0225$	3.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0226$
15	rs1077835	LIPC	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0227$	2.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0228$	2.9 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0229$	5.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0230$	3.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0231$
16	rs247616	CETP	6.7 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0232$	7.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0233$	5.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0234$	3.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0235$	2.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0236$
18	rs4121823	LIPG	8.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0237$	5.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0238$	7.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0239$	2.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0240$	8.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0241$
19	rs6511720	LDLR	5.8 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0242$	8.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0243$	3.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0244$	6.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0245$	6.0 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0246$
19	rs10401969	SUGP1	2.4 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0247$	5.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0248$	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0249$	3.5 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0250$	1.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0251$
19	rs445925	APOC1	2.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0252$	8.6 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0253$	2.2 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0254$	6.3 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0255$	1.1 $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0256$

Not surprisingly, the Monte Carlo procedure yielded less stringent genome-wide significance thresholds than the Bonferroni correction for all tests. For the univariate test on HDL, one SNP on Chr20 becomes significant by the Monte Carlo criterion. For the T test, one SNP (rs5752792) on Chr22 is above the Monte Carlo threshold. This SNP resides near the gene HSCB, which is mainly expressed in liver, muscle, and heart [Sun et al., 2003] and is involved in the biogenesis of an elementary metabolic function unit [Rouault and Tong, 2008]. The expression pattern and biological function of HSCB strongly suggest that this gene is pleiotropic.

We have provided a very general and flexible approach to association testing with multivariate traits. An earlier version of this approach (focusing on the Q test for continuous traits) was recently used to successfully identify genes associated with metabolic syndrome [Avery et al., 2011]. The new application presented in this paper further demonstrates the usefulness of the proposed approach. It took only several hours to calculate all the P-values shown in Figures 1 and 2. We have posted our software online at http://dlin.web.unc.edu/software.

When the number of traits is very large, we recommend reducing the dimension through principal component analysis [Avery et al., 2011]. Although we have focused on main effects of genetic variants, our approach can be easily modified to test gene–environment interactions. It can also be extended to perform burden tests on rare variants [Lin and Tang, 2011].

Univariate-trait analysis and multivariate-trait analysis are complementary to each other. The former is easier to implement and can be used to rapidly screen a large number of genetic variants. The multivariate-trait analysis provides a useful tool to uncover pleiotropic variants that have weak or moderate effects on individual traits. This is particularly important for dissecting the genetic basis of complex diseases, as most of the genetic variants with strong effects and high MAFs might have already been identified.

There is no uniformly most powerful test for analyzing multivariate traits. If the effects of a genetic variant are similar across the traits, then T and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0257$ are generally preferable. If the effects are considerably different or even in opposite directions, then Q is preferable. The theoretical results of the Appendix offer useful insights into the relative power of the three test statistics and can be used to determine the power and sample size for future studies.
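The trade-off described above can be illustrated with a small simulation. The sketch below is not the paper's own calculation: it assumes that the score vector for two traits is bivariate normal with a known covariance matrix, and it compares the quadratic-form statistic with an O'Brien-type weighted sum, 1'V^{-1}U / sqrt(1'V^{-1}1), which may differ in detail from the T statistic defined in the Methods. The means, correlation, and function names are hypothetical.

```python
import math
import numpy as np

ALPHA = 0.05
Q_CRIT = -2.0 * math.log(ALPHA)  # exact chi-square (2 df) critical value
Z_CRIT = 1.959964                # two-sided normal critical value at 0.05

def q_stats(U, V):
    """Row-wise quadratic form U' V^{-1} U."""
    return np.einsum("ij,ij->i", U, np.linalg.solve(V, U.T).T)

def weighted_sum_stats(U, V):
    """Row-wise 1' V^{-1} U / sqrt(1' V^{-1} 1), an O'Brien-type statistic."""
    w = np.linalg.solve(V, np.ones(V.shape[0]))  # w = V^{-1} 1
    return (U @ w) / math.sqrt(w.sum())          # w.sum() = 1' V^{-1} 1

def empirical_power(mu, V, n_rep=20000, seed=0):
    """Rejection rates of the two tests when U ~ N(mu, V)."""
    rng = np.random.default_rng(seed)
    U = rng.multivariate_normal(mu, V, size=n_rep)
    power_q = float(np.mean(q_stats(U, V) > Q_CRIT))
    power_t = float(np.mean(np.abs(weighted_sum_stats(U, V)) > Z_CRIT))
    return power_q, power_t

# Standardized scores for two traits with correlation 0.5 (assumed values).
V = np.array([[1.0, 0.5], [0.5, 1.0]])
same = empirical_power([2.0, 2.0], V)
opposite = empirical_power([2.0, -2.0], V)
print("same direction     (Q, weighted sum):", same)
print("opposite direction (Q, weighted sum):", opposite)
```

Under these assumed values, the weighted sum should reject more often when the two effects share a sign, whereas the quadratic form retains power when the effects oppose each other and the weighted sum has a mean of zero.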

For family data, we adopted marginal models with an independence working correlation matrix. A more efficient approach would be a random-effects model that exploits the family relationships. We adopted marginal models instead of random-effects models for several reasons. First, the association tests under marginal models are more robust to model misspecification. Second, it is much faster to fit marginal models than random-effects models. Third, marginal models can easily handle mixtures of continuous and binary traits.
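One way to picture the marginal approach is a cluster-robust covariance estimate for the score vector: individual score contributions are summed within each family before outer products are taken, so that within-family dependence is absorbed without being modeled. The sketch below is a simplified illustration under strong assumptions (continuous traits only, an intercept-only null model, no covariates, and simulated data); the function name `family_score_and_cov` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def family_score_and_cov(genotype, traits, family_id):
    """Score vector and family-clustered covariance under an independence
    working correlation (intercept-only null model, continuous traits)."""
    g = genotype - genotype.mean()          # centered genotype
    resid = traits - traits.mean(axis=0)    # residuals under the null
    contrib = g[:, None] * resid            # (n, K) individual contributions
    U = contrib.sum(axis=0)
    _, fam_index = np.unique(family_id, return_inverse=True)
    fam_sums = np.zeros((fam_index.max() + 1, traits.shape[1]))
    np.add.at(fam_sums, fam_index, contrib)  # sum contributions within family
    V = fam_sums.T @ fam_sums                # sum of family-level outer products
    return U, V

# Simulated example: 50 families of size 4, two traits sharing a family effect.
# Genotypes are drawn independently here, which real relatives would not be.
rng = np.random.default_rng(1)
n_fam, fam_size, n_traits = 50, 4, 2
family_id = np.repeat(np.arange(n_fam), fam_size)
genotype = rng.integers(0, 3, size=n_fam * fam_size).astype(float)
shared = np.repeat(rng.standard_normal((n_fam, n_traits)), fam_size, axis=0)
traits = shared + rng.standard_normal((n_fam * fam_size, n_traits))

U, V = family_score_and_cov(genotype, traits, family_id)
Q = float(U @ np.linalg.solve(V, U))
print("Q statistic with family-clustered covariance:", Q)
```

Any common scale factor in the score contributions, such as a residual variance, cancels in the quadratic form, which is why it is omitted here.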

Adjustment for multiple testing is an important issue in genetic association analysis. The Monte Carlo procedure considered in this paper accounts for the correlations among the test statistics and is thus less conservative than the conventional Bonferroni correction. Some existing methods, such as TATES, may yield inflated type I error. We have focused on determining the genome-wide significance threshold rather than calculating individual adjusted P-values. The former requires only several thousand Monte Carlo samples, whereas the latter would entail millions of Monte Carlo samples to estimate extremely small P-values. If the number of SNPs is small, the joint distribution of the test statistics can be evaluated through numerical integration [Conneely and Boehnke, 2007, 2010].
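The idea that correlated tests call for a less stringent threshold than Bonferroni can be sketched with a toy simulation. This is a simplified stand-in and not the Monte Carlo procedure used in the paper: it assumes that the null test statistics for neighboring SNPs are standard normal with an autoregressive correlation (a crude proxy for LD), simulates the maximum absolute statistic, and takes its upper quantile as the threshold. The number of SNPs, the correlation, and the function names are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def simulate_max_abs_z(n_snps, rho, n_rep, seed=0):
    """Maximum |Z| over n_snps null statistics with corr(Z_i, Z_j) = rho**|i-j|."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_rep)          # first SNP, one value per replicate
    max_abs = np.abs(z)
    scale = (1.0 - rho ** 2) ** 0.5
    for _ in range(n_snps - 1):
        # AR(1) update keeps unit variance and lag-one correlation rho.
        z = rho * z + scale * rng.standard_normal(n_rep)
        max_abs = np.maximum(max_abs, np.abs(z))
    return max_abs

def z_to_two_sided_p(z):
    """Two-sided normal P-value for an absolute statistic z."""
    return 2.0 * (1.0 - NormalDist().cdf(z))

ALPHA, N_SNPS = 0.05, 200
max_abs = simulate_max_abs_z(N_SNPS, rho=0.95, n_rep=5000)
mc_z = float(np.quantile(max_abs, 1.0 - ALPHA))          # Monte Carlo threshold
bonf_z = NormalDist().inv_cdf(1.0 - ALPHA / (2 * N_SNPS))  # Bonferroni threshold

print("Monte Carlo P-value threshold:", z_to_two_sided_p(mc_z))
print("Bonferroni  P-value threshold:", z_to_two_sided_p(bonf_z))
```

Only the upper quantile of the maximum statistic is needed for the threshold, which is why a few thousand replicates can suffice, whereas estimating very small adjusted P-values for individual SNPs would require far more.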

Acknowledgments

This research was supported by the National Institutes of Health grants R01 CA082659, U01 HG004803, and R00-HL-098458. We thank two reviewers for their helpful comments.

Appendix

Asymptotic Distributions of Q, T, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0258$ for Two Quantitative Traits

We consider a study of unrelated individuals and two quantitative traits satisfying the bivariate linear model:

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0259$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0260$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0261$ are bivariate zero-mean normal with covariance matrix

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0262$

In the absence of missing values, the score statistic for testing $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0263$ takes the form

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0264$

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0265$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0266$ are the least-squares estimators of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0267$ and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0268$ . Simple algebraic manipulation yields

Assume that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0270$ is in the order of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0271$ . By the multivariate central limit theorem and the law of large numbers, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0272$ is approximately bivariate normal with mean

and covariance matrix

where $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0275$ .

It follows from the above result that Q is approximately chi-squared with 2 degrees of freedom and with noncentrality parameter

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0276$

In addition, T is approximately normal with mean

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0277$

and unit variance, and $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0278$ is approximately normal with mean

and unit variance.

In the special case of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0280$ ,

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0281$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0282$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0283$

Clearly, $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0284$ . It can be shown that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0285$ , where the equality holds if and only if $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0286$ (assuming that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0287$ ).

In the special case of $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0288$ ,

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0289$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0290$

$urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0291$

Note that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0292$ . It can be shown that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0293$ , where the equality holds if and only if $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0294$ (assuming that $urn:x-wiley:07410395:media:gepi21759:gepi21759-math-0295$ ).

Supporting Information

References

Amos CI, Laing AE. 1993. A comparison of univariate and multivariate tests for genetic linkage. Genet Epidemiol 10: 671–676.
10.1002/gepi.1370100657
Avery CL, He Q, North KE, Ambite JL, Boerwinkle E, Fornage M, Hindorff LA, Kooperberg C, Meigs JB, Pankow JS and others. 2011. A phenomics-based strategy identifies loci on APOC1, BRAP, and PLCG1 associated with metabolic syndrome phenotype domains. PLoS Genet 7: e1002322.
10.1371/journal.pgen.1002322
Conneely KN, Boehnke M. 2007. So many correlated tests, so little time! Rapid adjustment of p-values for multiple correlated tests. Am J Hum Genet 81: 1158–1168.
10.1086/522036
Conneely KN, Boehnke M. 2010. Meta-analysis of genetic association studies and adjustment for multiple testing of correlated SNPs and traits. Genet Epidemiol 34: 739–746.
10.1002/gepi.20538
Ferreira M, Purcell S. 2009. A multivariate test of association. Bioinformatics 25: 132–133.
10.1093/bioinformatics/btn563
Gottesman O, Drill E, Lotay V, Bottinger E, Peter I. 2012. Can genetic pleiotropy replicate common clinical constellations for cardiovascular disease and risk? PLoS ONE 7: e46419.
10.1371/journal.pone.0046419
Grundy SM. 2012. Pre-diabetes, metabolic syndrome, and cardiovascular risk. J Am Coll Cardiol 59: 635–643.
10.1016/j.jacc.2011.08.080
Holmes DR, Elveback LR, Frye RL, Kottke BA, Ellefson RD. 1981. Association of risk factor variables and coronary artery disease documented with angiography. Circulation 63: 293–299.
10.1161/01.CIR.63.2.293
Jiang C, Zeng ZB. 1995. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127.
10.1093/genetics/140.3.1111
Lawn RM, Wade DP, Garvin MR, Wang X, Schwartz K, Porter JG, Seilhamer JJ, Vaughan AM, Oram JF. 1999. The Tangier disease gene product ABC1 controls the cellular apolipoprotein-mediated lipid removal pathway. J Clin Invest 104: R25–R31.
10.1172/JCI8119
Lawson HA, Cady JE, Partridge C, Wolf JB, Semenkovich CF, Cheverud JM. 2011. Genetic effects at pleiotropic loci are context-dependent with consequences for the maintenance of genetic variation in populations. PLoS Genet 7: e1002256.
10.1371/journal.pgen.1002256
Liang KY, Zeger SL. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73: 13–22.
10.1093/biomet/73.1.13
Lin DY. 2005. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics 21: 781–787.
10.1093/bioinformatics/bti053
Lin DY, Tang ZZ. 2011. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet 89: 354–367.
10.1016/j.ajhg.2011.07.015
Liu J, Pei Y, Papasian CJ, Deng HW. 2009. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol 33: 217–227.
10.1002/gepi.20372
Maity A, Sullivan PF, Tzeng JY. 2012. Multivariate phenotype association analysis by marker-set kernel machine regression. Genet Epidemiol 36: 686–695.
10.1002/gepi.21663
O'Brien PC. 1984. Procedures for comparing samples with multiple endpoints. Biometrics 40: 1079–1087.
10.2307/2531158
Paaby AB, Rockman MV. 2012. The many faces of pleiotropy. Trends Genet 29: 66–73.
10.1016/j.tig.2012.10.010
Rouault TA, Tong WH. 2008. Iron-sulfur cluster biogenesis and human disease. Trends Genet 24: 398–407.
10.1016/j.tig.2008.05.008
Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, Rudan I, McKeigue P, Wilson JF, Campbell H. 2011. Abundant pleiotropy in human complex disease and traits. Am J Hum Genet 89: 607–618.
10.1016/j.ajhg.2011.10.004
Sun G, Gargus JJ, Ta DT, Vickery LE. 2003. Identification of a novel candidate gene in the iron-sulfur pathway implicated in ataxia-susceptibility: human gene encoding HscB, a J-type co-chaperone. J Hum Genet 48: 415–419.
10.1007/s10038-003-0048-9
van der Sluis S, Posthuma D, Dolan CV. 2013. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet 9: e1003235.
10.1371/journal.pgen.1003235
Watanabe RM, Ghosh S, Langefeld CD, Valle TT, Hauser ER, Magnuson VL, Mohlke KL, Silander K, Ally DS, Chines P and others. 2000. The Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics (FUSION) Study. II. An autosomal genome scan for diabetes-related quantitative-trait loci. Am J Hum Genet 67: 1186–1200.
10.1016/S0002-9297(07)62949-8
Yang Q, Wu H, Guo CY, Fox CS. 2010. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol 34: 444–454.
10.1002/gepi.20497

Volume 37, Issue 8, December 2013, Pages 759-767
