Volume 2014, Issue 1 432805

Research Article

Open Access

Exact Inference for the Dispersion Matrix

Corresponding Author

Alan D. Hutson

[email protected]

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, 3435 Main Street, Buffalo, NY 14214-3000, USA buffalo.edu

Gregory E. Wilding

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, 3435 Main Street, Buffalo, NY 14214-3000, USA buffalo.edu

Jihnhee Yu

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, 3435 Main Street, Buffalo, NY 14214-3000, USA buffalo.edu

Albert Vexler

Department of Biostatistics, University at Buffalo, 706 Kimball Tower, 3435 Main Street, Buffalo, NY 14214-3000, USA buffalo.edu

First published: 14 September 2014

https://doi.org/10.1155/2014/432805

Citations: 1

Academic Editor: Lynn Kuo

Abstract

We develop a novel exact permutation test for prespecified correlation structures, such as compound symmetry or spherical structures, under standard assumptions. The key feature of this note is the distribution-free nature of our procedures, which frees us from the standard and sometimes unrealistic multivariate normality assumption commonly required by other methods.

1. Introduction

Let (X₁, X₂, …, X_n) be an iid p-dimensional multivariate sample from an absolutely continuous distribution F with p × p dispersion matrix of X as

()

Inference about the dispersion matrix Σ takes the general form

$$H_0 : \Sigma = \Sigma_0 \quad \text{versus} \quad H_1 : \Sigma \neq \Sigma_0, \tag{2}$$

where we assume that Σ₀ is specified in a particular manner, for example, a block diagonal matrix or a spherical type structure or simply an unstructured form.

In general, research and testing methods of this form assume an underlying multivariate normal distribution with associated exact and approximate tests; for a thorough overview and history of this testing problem, see Seber [1] and the references therein. In practice, it is rare that the multivariate normality assumption holds. Hence, we were motivated to develop an exact permutation approach to this problem. To the best of our knowledge, no exact permutation tests have been developed or explored, with the exception of the very special case of p = 2 dimensions and testing H₀ : ρ₁₂ = 0; for example, see Good [2]. Martin [3] provides a bootstrap algorithm for testing H₀ : ρ₁₂ = ρ_12,0, which can be shown asymptotically to have the appropriate type I error rate. Unfortunately, the bootstrap approach given by Martin [3], which first standardizes the variables and rotates the data so as to transform the problem to the setting of testing H₀ : ρ₁₂ = 0, does not work in the permutation setting. The permutation test for the case H₀ : ρ₁₂ = 0 follows by permuting the second column of the n × 2 data matrix (X₁, X₂) and calculating the test statistic $r$, where $r$ refers to the standard sample Pearson correlation coefficient, over all permutations. This can be done directly via a computationally expensive algorithm or via the more widely used Monte Carlo techniques. With respect to the Monte Carlo methods, we generate B random permutations of the data and denote the permuted value of the test statistic by $r_i^{*}$. Then the one-sided P value for the alternative H₁ : ρ₁₂ > 0 is given as $\sum_{i=1}^{B} I(r_i^{*} \ge r)/B$, where the index i corresponds to a given permutation and I denotes the indicator function. Alternative approaches found in software packages such as SAS PROC FREQ (SAS version 9.3, Cary, NC) utilize hypergeometric probabilities, similar to how Fisher’s exact test is carried out, by treating the fixed data as discrete.
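The Monte Carlo version of the p = 2 permutation test described above can be sketched in a few lines of Python. This is only an illustration: the function name `perm_test_corr`, the default B, and the use of a "greater than or equal to" comparison are assumptions made here, not details confirmed by the text.

```python
import numpy as np

def perm_test_corr(x1, x2, B=2000, seed=0):
    """One-sided Monte Carlo permutation P value for H1: rho_12 > 0.

    Hypothetical helper; the >= comparison is an assumed convention.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    r_obs = np.corrcoef(x1, x2)[0, 1]      # observed Pearson correlation
    count = 0
    for _ in range(B):
        # Permute the second column only, keeping the first column fixed.
        r_perm = np.corrcoef(x1, rng.permutation(x2))[0, 1]
        count += r_perm >= r_obs
    return r_obs, count / B

# Illustrative data with positive dependence between the two columns.
rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=30)
r, p = perm_test_corr(x1, x2)
```

Complete enumeration of all n! permutations would replace the random draws with an exhaustive loop, which is feasible only for very small n.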

In general, permutation testing is most often used for comparing two groups in the context of location differences or other features of distributions such as scale measures. Most of the theoretical work, for example on type I error control, has been done in this setting. For a technical treatment of permutation testing, see Romano [4], who examines the type I error control of permutation tests under exchangeability versus nonexchangeability conditions. In order to ensure true bounded type I error control in the permutation testing setting, either the null hypothesis has to be specified in such a way that exchangeability holds by definition under H₀, or some design feature such as randomization or matching needs to be employed. Commenges [5] studies the more general transformation approach used to preserve exchangeability. Also see Zhang [6], Huang et al. [7], and Janssen and Pauls [8] with respect to the inflation of type I error rates when comparing means in the two-sample setting, along with other types of comparisons. Very little has been accomplished in terms of permutation testing for hypotheses based on correlation structure, and this paper represents one of the only investigations of this type to date.

In Section 2 we develop the general p-dimensional exact tests given a prespecified covariance structure. Special cases include testing for sphericity, compound symmetry, and block diagonality, to name a few. This presentation is followed by a simulation study in Section 3. We then apply our method in Section 4 to an example involving repeated measures mice weight data.

2. Exact Tests for Covariance Structures in p Dimensions

The focus of the work in this setting is with respect to two-sided alternatives. In certain instances a subset of these tests with one-sided alternative structure may be constructed. Those tests will not be included as part of this discussion due to the specificity of their applications.

2.1. Unequal Variance Setting

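Now let (X₁, X₂, …, X_n) be an iid p-dimensional multivariate sample from an absolutely continuous distribution F with the first two finite central moments corresponding to each component of X given as $E(X_i) = \mu_i$ and $\operatorname{Var}(X_i) = \sigma_i^2$, i = 1,2, …, p. Let Corr(X_i, X_j) = ρ_ij, i = 1,2, …, p, j = 1,2, …, p, (i ≠ j). Furthermore, denote the p × p dispersion matrix of X by

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1p}\sigma_1\sigma_p \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho_{2p}\sigma_2\sigma_p \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p}\sigma_1\sigma_p & \rho_{2p}\sigma_2\sigma_p & \cdots & \sigma_p^2 \end{pmatrix}, \tag{3}$$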

where Σ is defined to be a p × p positive definite matrix. We represent the Cholesky decomposition of the p × p matrix Σ as

$$\Sigma = A'A, \tag{4}$$

such that $(A')^{-1}$ is defined. The Cholesky decomposition is a key component of the permutation test we propose; however, it is not unique to the problem; that is, other decomposition methods may yield similar results and alternative solutions. From a practical standpoint, the Cholesky decomposition is built into several statistical software packages, making our methodology feasible for a larger group of practitioners.
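As a rough illustration of how a Cholesky factor decorrelates the data, the following Python sketch assumes the factorization is written as Σ₀ = A₀′A₀ with A₀ upper triangular and that the rows of the data matrix are transformed by right-multiplication with the inverse of A₀. The helper name `whiten` and the example matrix are hypothetical. Note that NumPy returns the lower-triangular factor, so a transpose is needed under this convention.

```python
import numpy as np

def whiten(X, Sigma0):
    """Return Z = X A0^{-1}, assuming Sigma0 = A0' A0 with A0 upper triangular."""
    L = np.linalg.cholesky(Sigma0)   # NumPy gives lower-triangular L, Sigma0 = L L'
    # With A0 = L', Z A0 = X is equivalent to L Z' = X'; solve instead of inverting.
    return np.linalg.solve(L, np.asarray(X, dtype=float).T).T

# Hypothetical 2 x 2 hypothesized dispersion matrix (positive definite).
Sigma0 = np.array([[4.0, 1.2],
                   [1.2, 1.0]])
X = np.random.default_rng(0).normal(size=(6, 2))
Z = whiten(X, Sigma0)

# If the rows of X truly had dispersion Sigma0, the rows of Z would have
# dispersion A0'^{-1} Sigma0 A0^{-1}, which is the identity matrix.
A0 = np.linalg.cholesky(Sigma0).T
A0_inv = np.linalg.inv(A0)
check = A0_inv.T @ Sigma0 @ A0_inv
```

Solving the triangular system rather than forming the explicit inverse is a numerical convenience only; both give the same transformed data up to rounding.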

Now our more general hypothesis of interest takes the form

$$H_0 : \Sigma = \Sigma_0 \quad \text{versus} \quad H_1 : \Sigma \neq \Sigma_0, \tag{5}$$

where Σ₀ is the hypothesized value of Σ at (3).

Test Statistic. Let the p × p matrix $A_0^{-1}$ denote the transpose of $(A')^{-1}$ with the hypothesized values from (5) as its elements. Let the n × p matrix $Z = XA_0^{-1}$ denote the data matrix following transformation. Then the dispersion matrix corresponding to the n × p matrix Z will be a diagonal matrix, such that Corr(Z_i, Z_j) = 0 for all i ≠ j, if and only if H₀ at (5) holds true. Under these conditions testing H₀ at (5) is equivalent to the test:

()

where the off-diagonal elements of

are equal to 0 under H₀ at (6).

An exact α-level permutation test of H₀ can be defined for (6) by considering the permutation of each column of Z and employing the Pearson correlation coefficient for each combination of columns. Towards this end let us denote the

with the corresponding Pearson estimator by

. The test statistic of interest in the two-sided case with respect to detecting departures from H₀ at (6) is defined as

()

The exactness of the test in terms of the type I error control follows from a straightforward generalization of the form of the dispersion matrix for the 2 × 2 case, where

()

In the 2 × 2 case the off-diagonal elements of the dispersion matrix are given as

()

An examination of the covariance term corresponding to

at (9) clearly indicates that it has the value of 0 if and only if H₀ at (8) is true. When testing H₀ at (8) the Pearson correlation estimate between the transformed variates through

, Z₁, and Z₂ serves to appropriately detect departures from H₀. Within the permutation testing framework provides an exact α-level test; that is, the covariance of Z₁ and Z₂ is 0 if and only if the correlation of Z₁ and Z₂ is 0.

We resort to a Monte Carlo approximation in order to obtain the P value for testing H₀ defined at (5). The steps for performing the Monte Carlo approximation with respect to estimating the P value are as follows.

(1) Define H₀ at (5).
(2) Obtain the new random variates Z by applying the transformation to the observed data X.
(3) Calculate T(Z) at (7).
(4) Permute each column of Z independently such that we have the permuted n × p matrix denoted by Z^*.
(5) Calculate T(Z^*) by applying T at (7) to the permuted values.
(6) Repeat steps (4) and (5) B times.
(7) Calculate the Monte Carlo estimated permutation P value as $\sum_{i=1}^{B} I_{(T(Z_i^{*}) \ge T(Z))}/B$, where I_(·) denotes the indicator function.
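The steps above can be sketched in Python as follows. Because the exact form of the statistic at (7) is not reproduced here, the sketch uses the sum of squared pairwise Pearson correlations as a stand-in; that choice, the function names `stat_T` and `dispersion_perm_test`, and the "greater than or equal to" comparison are all assumptions for illustration.

```python
import numpy as np

def stat_T(Z):
    """Assumed stand-in for (7): sum of squared pairwise Pearson correlations."""
    R = np.corrcoef(Z, rowvar=False)
    return float(np.sum(R[np.triu_indices_from(R, k=1)] ** 2))

def dispersion_perm_test(X, Sigma0, B=2000, seed=0):
    """Monte Carlo permutation P value for H0: Sigma = Sigma0 (illustrative)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    L = np.linalg.cholesky(Sigma0)            # Sigma0 = L L', so A0 = L'
    Z = np.linalg.solve(L, X.T).T             # step (2): Z = X A0^{-1}
    t_obs = stat_T(Z)                         # step (3)
    hits = 0
    for _ in range(B):                        # step (6)
        # Step (4): permute every column independently of the others.
        Zs = np.column_stack([rng.permutation(col) for col in Z.T])
        hits += stat_T(Zs) >= t_obs           # step (5)
    return t_obs, hits / B                    # step (7)

# Illustrative use: data generated with strong correlation, tested against
# a hypothesized identity dispersion matrix.
rng = np.random.default_rng(3)
Sigma_true = np.array([[1.0, 0.9, 0.9],
                       [0.9, 1.0, 0.9],
                       [0.9, 0.9, 1.0]])
X = rng.normal(size=(40, 3)) @ np.linalg.cholesky(Sigma_true).T
t_obs, p_value = dispersion_perm_test(X, np.eye(3), B=500)
```

Any other statistic that aggregates the pairwise correlations of the transformed columns could be substituted for `stat_T` without changing the structure of the loop.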

2.2. Nontransformation Special Cases

In certain special cases we can test specific forms of hypothesis (5) using our permutation approach without specifying a specific subset of the σ_i²’s or ρ_ij’s. One obvious special case relative to testing hypothesis (5) is the test for a diagonal dispersion matrix versus a nondiagonal dispersion matrix, such that under H₀ all ρ_ij = 0. Historical tests of this form have relied on assuming p-variate normality; for example, see Mudholkar et al. [9] for a description of a likelihood ratio approximation to this test. In this instance we have

$$H_0 : \Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2) \quad \text{versus} \quad H_1 : \Sigma \neq \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2), \tag{10}$$

with unspecified σ_i²’s under H₀.

In this case there is no transformation of the data required. An exact α-level permutation test of H₀ can be defined simply by considering the permutation of each column of X and employing the Pearson correlation coefficient for each combination of columns. Towards this end denote the

with the corresponding Pearson estimator by

. The test statistic of interest with respect to detecting departures from the diagonal structure is defined as

()

The Monte Carlo estimated permutation P value is calculated similarly as before, where

, where I_(·) denotes the indicator function.

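Another special case where we can have a set of unspecified σ_i²’s or ρ_ij’s arises when we are interested in testing for a block diagonal dispersion matrix structure, such that under H₀ at (5) we now have

$$\Sigma_0 = \begin{pmatrix} \Sigma_{11} & 0 & \cdots & 0 \\ 0 & \Sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_{bb} \end{pmatrix}, \tag{12}$$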

where the partitioned q_j × q_j matrices are given as

()

The dispersion matrices Σ_jj may have different dimensions q_j < p (∑ q_j = p), j = 1,2, …, b, with unspecified σ_i²’s and ρ_ij’s under H₀.

As in the test for a diagonal dispersion matrix above, there is no transformation of the data required. An exact α-level permutation test of H₀ can again be defined simply by considering the permutation of each column of X and employing the Pearson correlation coefficient for each combination of columns. The test statistic of interest with respect to detecting departures from the block diagonal structure is a slight modification of the test statistic at (11) defined as

()

where the “off-block” correlation elements at (12) satisfy ρ_0,ij = 0 under H₀ and I_(·) denotes the indicator function. The Monte Carlo estimated permutation P value is calculated similarly as before, where

and I_(·) denotes the indicator function.
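A minimal Python sketch of the block diagonal test follows. It restricts the statistic to the "off-block" pairs through an indicator mask, as the text describes, but the squared-correlation form of the statistic is an assumption, as are the helper names `off_block_mask`, `stat_off_block`, and `block_diag_perm_test`.

```python
import numpy as np

def off_block_mask(block_sizes):
    """Boolean p x p mask: True where pair (i, j) lies outside every diagonal block."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    return labels[:, None] != labels[None, :]

def stat_off_block(X, mask):
    """Assumed statistic: sum of squared Pearson correlations over off-block pairs."""
    R = np.corrcoef(X, rowvar=False)
    return float(np.sum(R[np.triu(mask, k=1)] ** 2))

def block_diag_perm_test(X, block_sizes, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mask = off_block_mask(block_sizes)
    t_obs = stat_off_block(X, mask)
    hits = 0
    for _ in range(B):
        # No transformation: permute each column of X independently.
        Xs = np.column_stack([rng.permutation(col) for col in X.T])
        hits += stat_off_block(Xs, mask) >= t_obs
    return t_obs, hits / B

# Illustrative data: blocks of size 2 and 1, with column 2 tied to column 0,
# which violates the hypothesized block diagonal structure.
rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=30)
t_obs, p_value = block_diag_perm_test(X, [2, 1], B=500)
```

Setting every block size to 1 makes all off-diagonal pairs "off-block", which recovers the diagonal test of the previous special case under the same assumed statistic.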

2.3. Equal Variance Setting

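For the equal variance p-dimensional case we have

$$H_0 : \Sigma = \Sigma_0 \quad \text{versus} \quad H_1 : \Sigma \neq \Sigma_0, \tag{15}$$

where we now define the p × p dispersion matrix under H₀ as

$$\Sigma_0 = \sigma^2\Gamma_0, \tag{16}$$

where Σ₀ is defined to be a p × p positive definite matrix, Var(X_i) = σ², i = 1,2, …, p, under H₀ and the p × p correlation matrix Γ₀ is given by

$$\Gamma_0 = \begin{pmatrix} 1 & \rho_{0,12} & \cdots & \rho_{0,1p} \\ \rho_{0,12} & 1 & \cdots & \rho_{0,2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{0,1p} & \rho_{0,2p} & \cdots & 1 \end{pmatrix}. \tag{17}$$

The Cholesky decomposition of the matrix Γ₀ is

$$\Gamma_0 = A_0'A_0. \tag{18}$$

In order to test the hypothesis at (15) we utilize the transformation $Z = XA_0^{-1}$. Note that the test of H₀ at (15) will be sensitive to departures from Γ₀ and to unequal marginal variances. Furthermore, explicit values for σ² do not need to be specified within this hypothesis testing framework.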

The steps for performing the Monte Carlo approximation with respect to estimating the P value are as follows.

(1) Define H₀ at (15).
(2) Obtain the new random variates Z by applying the transformation to the observed data X.
(3) Calculate T(Z) at (7).
(4) Permute each column of Z independently such that we have the permuted n × p matrix denoted by Z^*.
(5) Calculate T(Z^*).
(6) Repeat steps (4) and (5) B times.
(7) Calculate the Monte Carlo estimated permutation P value as $\sum_{i=1}^{B} I_{(T(Z_i^{*}) \ge T(Z))}/B$, where I_(·) denotes the indicator function.
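One way to see why σ² need not be specified in the equal variance setting is that rescaling the hypothesized matrix only rescales the transformed data, and Pearson correlations are invariant to scale. The short Python check below illustrates this; the helper name `pearson_after_transform` and the value 7.3 are arbitrary choices made for the illustration.

```python
import numpy as np

def pearson_after_transform(X, M):
    """Pairwise Pearson correlations of the columns of Z = X A^{-1}, where M = A'A."""
    L = np.linalg.cholesky(M)                 # M = L L', so A = L'
    Z = np.linalg.solve(L, np.asarray(X, dtype=float).T).T
    R = np.corrcoef(Z, rowvar=False)
    return R[np.triu_indices_from(R, k=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))

# Compound symmetric correlation matrix with rho_0 = 0.5.
Gamma0 = np.array([[1.0, 0.5, 0.5],
                   [0.5, 1.0, 0.5],
                   [0.5, 0.5, 1.0]])

r_unit = pearson_after_transform(X, Gamma0)          # sigma^2 = 1
r_scaled = pearson_after_transform(X, 7.3 * Gamma0)  # sigma^2 = 7.3 (arbitrary)
same = bool(np.allclose(r_unit, r_scaled))
```

Since any statistic built from these correlations is unchanged by the choice of σ², the transformation can be carried out with Γ₀ alone.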

A special case relative to testing hypothesis (15) is the test that the dispersion matrix is diagonal and all σ_i² = σ², i = 1,2, …, p.

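Other special cases of the test at (15) may be of interest and written in the form

$$H_0 : \Sigma = \sigma^2\Gamma_0(\rho_0) \quad \text{versus} \quad H_1 : \Sigma \neq \sigma^2\Gamma_0(\rho_0). \tag{19}$$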

Examples of specific dispersion structures of importance corresponding to the test at (19) include

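(1) sphericity:

$$\Gamma_0 = I_p;$$

(2) compound symmetry:

$$\Gamma_0(\rho_0) = \begin{pmatrix} 1 & \rho_0 & \cdots & \rho_0 \\ \rho_0 & 1 & \cdots & \rho_0 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_0 & \rho_0 & \cdots & 1 \end{pmatrix};$$

(3) first-order autoregressive:

$$\Gamma_0(\rho_0) = \begin{pmatrix} 1 & \rho_0 & \cdots & \rho_0^{p-1} \\ \rho_0 & 1 & \cdots & \rho_0^{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_0^{p-1} & \rho_0^{p-2} & \cdots & 1 \end{pmatrix};$$

(4) spatial power:

$$\Gamma_0(\rho_0) = \left(\rho_0^{d_{ij}}\right)_{i,j = 1, \ldots, p},$$

where d_ij denotes the distance between measurements i and j.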

Several other well-known spatial dispersion matrices similar to the spatial power matrix presented above fit within this same framework and will not be presented here.
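The named correlation structures are straightforward to construct numerically. The Python sketch below builds them and checks positive definiteness, which is required for the Cholesky decomposition to exist. The function names are hypothetical, and the spatial power form assumes the common convention that the (i, j) entry is ρ raised to the distance between measurement locations.

```python
import numpy as np

def compound_symmetry(p, rho):
    """Ones on the diagonal, rho everywhere else."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar1(p, rho):
    """First-order autoregressive: entry (i, j) is rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def spatial_power(coords, rho):
    """Assumed convention: entry (i, j) is rho ** d_ij, d_ij = |coords_i - coords_j|."""
    d = np.abs(np.subtract.outer(coords, coords))
    return rho ** d

def is_positive_definite(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

cs = compound_symmetry(5, 0.5)
ar = ar1(5, 0.5)
sp = spatial_power(np.array([0.0, 1.0, 3.0]), 0.5)
spherical = ar1(5, 0.0)          # rho = 0 reduces both CS and AR(1) to the identity

# Compound symmetry is positive definite only for -1/(p - 1) < rho < 1.
ok = is_positive_definite(compound_symmetry(5, 0.5))
not_ok = is_positive_definite(compound_symmetry(5, -0.3))
```

The positive definiteness check is what restricts the admissible range of ρ for a given structure and dimension.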

3. Simulation Study

In this section we examine the test at (19), where we specify ρ₀ and the form of the correlation structure Γ at (17), for example, compound symmetry. Our simulation study for the p × p case utilizes a p-variate standardized multivariate normal distribution with p = 5, along with a special case mixing the marginal distributions across normal, exponential, and uniform forms. Differing location and scale do not change the general conclusions. We set the null value of ρ₀ = 0, 0.5, 0.9 under a compound symmetry assumption and a first-order autoregressive assumption, where Γ takes the forms:

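(1) compound symmetry:

$$\Gamma_0(\rho_0) = \begin{pmatrix} 1 & \rho_0 & \rho_0 & \rho_0 & \rho_0 \\ \rho_0 & 1 & \rho_0 & \rho_0 & \rho_0 \\ \rho_0 & \rho_0 & 1 & \rho_0 & \rho_0 \\ \rho_0 & \rho_0 & \rho_0 & 1 & \rho_0 \\ \rho_0 & \rho_0 & \rho_0 & \rho_0 & 1 \end{pmatrix};$$

(2) first-order autoregressive:

$$\Gamma_0(\rho_0) = \begin{pmatrix} 1 & \rho_0 & \rho_0^2 & \rho_0^3 & \rho_0^4 \\ \rho_0 & 1 & \rho_0 & \rho_0^2 & \rho_0^3 \\ \rho_0^2 & \rho_0 & 1 & \rho_0 & \rho_0^2 \\ \rho_0^3 & \rho_0^2 & \rho_0 & 1 & \rho_0 \\ \rho_0^4 & \rho_0^3 & \rho_0^2 & \rho_0 & 1 \end{pmatrix}.$$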

Note that the special case ρ₀ = 0 is the same for both covariance structures and is only presented once. It should also be noted that, under the assumption of multivariate normality, testing H₀ : Σ = σ²Γ₀(0) under the compound symmetry or first-order autoregressive structure is a special case (equal variance assumption) of the well-known test for “complete independence”; for example, see Mudholkar et al. [9]. Under nonnormality we are essentially testing the “complete uncorrelated” case. In this special case the methods presented here are the first exact methods developed for tackling this particular hypothesis. For large sample theory on similar results, see Jiang [10] and Xiao and Wu [11].

For our simulation study we used 1000 replicates at each of n = 10, 20, 30, 40 and set α = 0.05. The covariance structure was the same under H₀ and H₁ for this set of simulations. The results are contained in Figures 1, 2, 3, 4, and 5. As anticipated, we see appropriate type I error control and monotone power functions increasing in either direction about the null value for ρ. The range of ρ under the alternative was dictated by the constraint that Γ₀(ρ₀) be positive definite.

Figure 1. Power for testing for compound symmetry covariance as a function of ρ given ρ₀ = 0.

For the sake of example we modified our simulation and took ρ₀ = 0.5 with marginals given by X₁, X₂ ~ N(0,1), uniform marginals for X₃ and X₄, and X₅ ~ exp(1, −1), with Γ₀(ρ₀) assumed to be compound symmetric under H₀ and under H₁. The results are shown in Figure 6. They do not differ dramatically from those in Figure 2, which assume multivariate normality, thus illustrating the flexibility and nonparametric nature of our methodology.

As an additional result we studied the power when ρ is correctly specified under H₀ but Γ₀(ρ₀) differs in structure. For this study we set ρ₀ = 0.5, 0.9 with Γ₀(ρ₀) set to compound symmetry under H₀ and to the first-order autoregressive structure under H₁. In other words, what is the power to detect a correlation structure different from the null structure given that ρ₀ is the true correlation? At α = 0.05 and n = 10, 20, 30, 40, the power to detect a different correlation structure under the alternative was 0.245, 0.539, 0.761, and 0.894 for ρ₀ = 0.5 and 0.520, 0.858, 0.951, and 0.998 for ρ₀ = 0.9.
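A simulation of this kind can be sketched in Python as below. The sketch estimates a rejection rate by repeatedly generating multivariate normal data under a chosen true structure and testing a chosen null structure. The test statistic (sum of squared pairwise correlations), the function names, and the small default numbers of replicates and permutations are assumptions chosen to keep the example fast, so the output is not expected to reproduce the power values reported above.

```python
import numpy as np

def compound_symmetry(p, rho):
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar1(p, rho):
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def stat_T(Z):
    # Assumed statistic: sum of squared pairwise Pearson correlations.
    R = np.corrcoef(Z, rowvar=False)
    return float(np.sum(R[np.triu_indices_from(R, k=1)] ** 2))

def perm_p_value(X, Gamma0, B, rng):
    L = np.linalg.cholesky(Gamma0)            # Gamma0 = L L'
    Z = np.linalg.solve(L, X.T).T             # transformed data
    t_obs = stat_T(Z)
    hits = 0
    for _ in range(B):
        Zs = np.column_stack([rng.permutation(col) for col in Z.T])
        hits += stat_T(Zs) >= t_obs
    return hits / B

def rejection_rate(Gamma_true, Gamma0, n, reps=100, B=199, alpha=0.05, seed=0):
    """Fraction of simulated data sets for which H0: Gamma = Gamma0 is rejected."""
    rng = np.random.default_rng(seed)
    L_true = np.linalg.cholesky(Gamma_true)
    p = Gamma_true.shape[0]
    rejections = 0
    for _ in range(reps):
        # Rows of X are multivariate normal with correlation matrix Gamma_true.
        X = rng.normal(size=(n, p)) @ L_true.T
        rejections += perm_p_value(X, Gamma0, B, rng) <= alpha
    return rejections / reps

# Structure misspecification: AR(1) data tested against a compound symmetry null.
power_estimate = rejection_rate(ar1(5, 0.5), compound_symmetry(5, 0.5),
                                n=20, reps=50, B=199)
```

Passing the same matrix for `Gamma_true` and `Gamma0` gives an estimate of the type I error rate instead of power.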

4. Example

As an illustration of our method we will use phenotypic weight data from n = 16 mice as contained in Table 1 from a recent unpublished study conducted within Roswell Park Cancer Institute. The estimated correlation matrix is provided in Table 2. The respective sample variance estimates were

, and

. For example, suppose we were interested in testing

$$H_0 : \Sigma = \sigma^2\Gamma_0(0.95) \quad \text{versus} \quad H_1 : \Sigma \neq \sigma^2\Gamma_0(0.95),$$

for Γ(·) having either the compound symmetry structure or the first-order autoregressive structure as defined in (23). The test corresponding to the above hypothesis under a compound symmetry correlation structure yielded a Monte Carlo estimated P value <0.0001 (B = 10,000), while the test under the first-order autoregressive correlation structure yielded a Monte Carlo estimated P value = 0.002 (B = 10,000). For this example, this provides some measure of evidence that the correlation structure does not fit the compound symmetry structure and that the first-order autoregressive structure assuming ρ = 0.95 may be more appropriate. Similarly, the test for diagonality under the equal variances assumption (sphericity), which does not assume a value for ρ₀, yielded a Monte Carlo estimated P value <0.0001 (B = 10,000). Note that we may be rejecting H₀ at some specified level α under at least one of three scenarios: unequal marginal variances, ρ ≠ ρ₀, or Γ ≠ Γ₀.

Table 1. Mice weights (grams).

Mouse	Day 0	Day 1	Day 2	Day 3	Day 4
1	21.4	21.0	21.0	21.3	21.5
2	18.4	18.4	18.0	18.1	17.8
3	17.8	17.6	17.4	17.2	17.1
4	18.2	18.5	17.6	18.0	17.5
5	20.0	19.4	19.1	19.6	18.9
6	20.2	19.3	19.3	19.3	18.6
7	16.5	16.3	16.6	16.4	16.3
8	17.6	17.4	17.5	17.9	17.6
9	19.8	20.4	20.1	20.6	20.3
10	22.0	22.3	21.3	21.9	20.7
11	17.5	17.4	17.5	17.5	17.3
12	20.3	20.2	19.8	20.4	19.7
13	16.3	16.2	16.2	15.9	15.7
14	18.0	17.2	17.2	17.0	16.4
15	20.3	19.9	19.0	18.9	18.1
16	21.4	21.2	20.7	21.1	20.7

Table 2. Estimated correlation matrix for mouse weight example data.

	Day 0	Day 1	Day 2	Day 3	Day 4
Day 0	1.000	0.977	0.974	0.959	0.921
Day 1	0.977	1.000	0.982	0.979	0.945
Day 2	0.974	0.982	1.000	0.993	0.979
Day 3	0.959	0.979	0.993	1.000	0.983
Day 4	0.921	0.945	0.979	0.983	1.000

Given our overall P value of 0.002 from above, we may wish to examine in further detail what is driving us to reject H₀. In this case we can examine specific submatrices of the dispersion matrix of X. For this example we could test H₀ : Σ = σ²Γ₀(0.95) using days 0, 1, and 2 only, or any other combination of days such that the appropriate correlation substructure is extracted from the original hypothesized values for Γ. For our example subtest we get P = 0.32, indicating no strong evidence against a first-order autoregressive “substructure” with equal variances and ρ₀ = 0.95. If we add day 3, our P value = 0.04, indicating that either the correlation structure may be misspecified at this point or the variances differ. Note that further work relative to the multiple comparison problem of subtests and their relative correlation is needed. This is simply an exploratory approach to this issue relative to the example at hand.
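Extracting the hypothesized substructure for a subset of days amounts to selecting matching rows and columns of Γ₀. A small Python sketch, with the hypothetical helper names `ar1` and `sub_structure`, is:

```python
import numpy as np

def ar1(p, rho):
    """First-order autoregressive correlation matrix: entry (i, j) is rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sub_structure(Gamma0, keep):
    """Rows and columns of Gamma0 for the retained measurement occasions."""
    keep = np.asarray(keep)
    return Gamma0[np.ix_(keep, keep)]

Gamma0 = ar1(5, 0.95)                        # hypothesized structure for days 0..4

Gamma_012 = sub_structure(Gamma0, [0, 1, 2])       # days 0, 1, and 2 only
Gamma_0123 = sub_structure(Gamma0, [0, 1, 2, 3])   # adding day 3
Gamma_024 = sub_structure(Gamma0, [0, 2, 4])       # non-adjacent days keep their lags
```

The same index list would be used to select the corresponding columns of the data matrix before running the subtest.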

5. Concluding Remarks

In this note we provided a method for exact testing around specific covariance structures. We employed the Cholesky decomposition for this purpose. It was noted by a reviewer that other decomposition methodologies may lead to extensions of this methodology, which we will consider in terms of future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is supported by the NIH Grant 1R03DE020851-01A1, the National Institute of Dental and Craniofacial Research. The authors wish to thank the reviewers for their time and effort.

References

1 Seber G. A. F., Multivariate Observations, 1984, John Wiley & Sons, New York, NY, USA, https://doi.org/10.1002/9780470316641, MR746474.
2 Good P., Robustness of Pearson correlation, Interstat. (2009) 15, no. 5, 1–6.
3 Martin M. A., Bootstrap hypothesis testing for some common statistical problems: a critical evaluation of size and power properties, Computational Statistics & Data Analysis. (2007) 51, no. 12, 6321–6342, https://doi.org/10.1016/j.csda.2007.01.020, MR2408597, 2-s2.0-34547159345.
4 Romano J. P., On the behavior of randomization tests without a group invariance assumption, Journal of the American Statistical Association. (1990) 85, no. 411, 686–692, https://doi.org/10.1080/01621459.1990.10474928, MR1138350.
5 Commenges D., Transformations which preserve exchangeability and application to permutation tests, Journal of Nonparametric Statistics. (2003) 15, no. 2, 171–185, https://doi.org/10.1080/1048525031000089310, MR1981459, ZBL1054.62055, 2-s2.0-0037832215.
6 Zhang S. P., The split sample permutation t-tests, Journal of Statistical Planning and Inference. (2009) 139, no. 10, 3512–3524, https://doi.org/10.1016/j.jspi.2009.04.004, MR2549099, 2-s2.0-67650001639.
7 Huang Y., Xu H., Calian V., and Hsu J. C., To permute or not to permute, Bioinformatics. (2006) 22, no. 18, 2244–2248, https://doi.org/10.1093/bioinformatics/btl383, 2-s2.0-33748698241.
8 Janssen A. and Pauls T., A Monte Carlo comparison of studentized bootstrap and permutation tests for heteroscedastic two-sample problems, Computational Statistics. (2005) 20, no. 3, 369–383, https://doi.org/10.1007/BF02741303, MR2242115, ZBL1091.62034, 2-s2.0-33746037217.
9 Mudholkar G. S., Trivedi M. C., and Lin C. T., An approximation to the distribution of the likelihood ratio statistics for testing complete independence, Technometrics. (1982) 24, no. 2, 139–143, https://doi.org/10.1080/00401706.1982.10487736, 2-s2.0-0020125611.
10 Jiang T., The asymptotic distributions of the largest entries of sample correlation matrices, The Annals of Applied Probability. (2004) 14, no. 2, 865–880, https://doi.org/10.1214/105051604000000143, MR2052906, ZBL1047.60014, 2-s2.0-11244330494.
11 Xiao H. and Wu W. B., Asymptotic theory for maximum deviations of sample covariance matrix estimates, Stochastic Processes and their Applications. (2013) 123, no. 7, 2899–2920, https://doi.org/10.1016/j.spa.2013.03.012, MR3054550, ZBL1284.62122, 2-s2.0-84885041889.
