Volume 72, Issue 1 pp. 4-13
Original Article
Open Access

Analytic posteriors for Pearson's correlation coefficient

Alexander Ly (corresponding author), Maarten Marsman, and Eric-Jan Wagenmakers

Department of Psychological Methods, University of Amsterdam, PO Box 15906, Amsterdam, 1001 NK The Netherlands

First published: 05 July 2017

Abstract

Pearson's correlation is one of the most common measures of linear dependence. Recently, Bernardo (11th International Workshop on Objective Bayes Methodology, 2015) introduced a flexible class of priors to study this measure in a Bayesian setting. For this large class of priors, we show that the (marginal) posterior for Pearson's correlation coefficient and all of the posterior moments are analytic. Our results are available in the open-source software package JASP.

1 Introduction

Pearson's product–moment correlation coefficient ρ is a measure of the linear dependency between two random variables. Its sampled version, commonly denoted by r, has been well studied by the founders of modern statistics such as Galton, Pearson, and Fisher. Based on geometrical insights, FISHER (1915, 1921) was able to derive the exact sampling distribution of r and established that this sampling distribution converges to a normal distribution as the sample size increases. Fisher's study of the correlation has led to the discovery of variance-stabilizing transformations, sufficiency (FISHER, 1920), and, arguably, the maximum likelihood estimator (FISHER, 1922; STIGLER, 2007). Similar efforts were made in Bayesian statistics, which focus on inferring the unknown ρ from the data that were actually observed. This type of analysis requires the statistician to (i) choose a prior on the parameters, and thus also on ρ, and to (ii) calculate the posterior. Here we derive analytic posteriors for ρ given a large class of priors that include the recommendations of JEFFREYS (1961), LINDLEY (1965), BAYARRI (1981), and, more recently, BERGER and SUN (2008) and BERGER et al. (2015). Jeffreys's work on the correlation coefficient can also be found in the second edition of his book (JEFFREYS, 1961), originally published in 1948; see ROBERT et al. (2009) for a modern re-read of Jeffreys's work. An earlier attempt at a Bayesian analysis of the correlation coefficient can be found in JEFFREYS (1935). Before presenting the results, we first introduce the notation and recall the likelihood for the problem at hand.

2 Notation and result

Let $(X_1, X_2)$ have a bivariate normal distribution with mean $\mu = (\mu_1, \mu_2)$ and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of $X_1$ and $X_2$, and where ρ is
$$\rho = \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_1 \sigma_2}. \tag{1}$$
Pearson's correlation coefficient ρ measures the linear association between X1 and X2. In brief, the model is parametrized by the five unknowns θ=(μ1,μ2,σ1,σ2,ρ).
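Bivariate normal data consisting of n pairs of observations $(x_{1i}, x_{2i})$ can be sufficiently summarized as $y = (n, \bar{x}_1, \bar{x}_2, s_1, s_2, r)$, where
$$r = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{n s_1 s_2}$$
is the sample correlation coefficient, $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ji}$ the sample mean, and $s_j^2 = \frac{1}{n}\sum_{i=1}^{n} (x_{ji} - \bar{x}_j)^2$ the average sums of squares. The bivariate normal model implies that the observations y are functionally related to the parameters by the following likelihood function:
$$f(y \mid \theta) = \left(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}\right)^{-n} \exp\left\{-\frac{n}{2(1-\rho^2)}\left[\frac{(\bar{x}_1-\mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(\bar{x}_1-\mu_1)(\bar{x}_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(\bar{x}_2-\mu_2)^2}{\sigma_2^2}\right]\right\} \exp\left\{-\frac{n}{2(1-\rho^2)}\left[\frac{s_1^2}{\sigma_1^2} - 2\rho\,\frac{r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right]\right\}. \tag{2}$$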
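The claim that the data enter the likelihood only through the summary statistics can be checked numerically. The sketch below assumes the standard bivariate normal density and the divisor-n definitions of the sample variances; the function names `summarize`, `loglik_from_summary`, and `loglik_direct` are illustrative and not taken from the article.

```python
import math

def summarize(x1, x2):
    """Reduce paired data to (n, mean1, mean2, s1, s2, r); s_j^2 uses divisor n."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    r = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)
    return n, m1, m2, s1, s2, r

def loglik_from_summary(y, theta):
    """Bivariate normal log-likelihood written in terms of the summary y only."""
    n, m1, m2, s1, s2, r = y
    mu1, mu2, sig1, sig2, rho = theta
    quad = ((s1 ** 2 + (m1 - mu1) ** 2) / sig1 ** 2
            - 2 * rho * (r * s1 * s2 + (m1 - mu1) * (m2 - mu2)) / (sig1 * sig2)
            + (s2 ** 2 + (m2 - mu2) ** 2) / sig2 ** 2)
    log_norm = math.log(2 * math.pi * sig1 * sig2 * math.sqrt(1 - rho ** 2))
    return -n * log_norm - n * quad / (2 * (1 - rho ** 2))

def loglik_direct(x1, x2, theta):
    """Same log-likelihood, summed observation by observation."""
    mu1, mu2, sig1, sig2, rho = theta
    log_norm = math.log(2 * math.pi * sig1 * sig2 * math.sqrt(1 - rho ** 2))
    total = 0.0
    for a, b in zip(x1, x2):
        z1, z2 = (a - mu1) / sig1, (b - mu2) / sig2
        total += -log_norm - (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (2 * (1 - rho ** 2))
    return total

# Made-up data, for illustration only.
x1 = [1.2, 0.4, 2.3, 1.9, 0.7]
x2 = [0.9, 0.1, 1.8, 2.2, 1.1]
theta = (1.0, 1.0, 0.8, 0.9, 0.3)  # (mu1, mu2, sigma1, sigma2, rho)
y = summarize(x1, x2)
```

The two log-likelihoods should agree up to floating-point error for any admissible θ, which is what sufficiency of y amounts to in practice.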
For inference we use the following class of priors:
$$\pi_\eta(\theta) \propto (1-\rho^2)^{\alpha-1}\,(1+\rho^2)^{\beta/2}\,\sigma_1^{\gamma-1}\,\sigma_2^{\delta-1}, \tag{3}$$
where η denotes the hyperparameters, that is, η=(α,β,γ,δ). This class of priors is inspired by the one that José Bernardo (2015) used in his talk on reference priors for the bivariate normal distribution at the ‘11th International Workshop on Objective Bayes Methodology in honor of Susie Bayarri’. This class of priors contains certain recommended priors as special cases.
If we set α=1, β=γ=δ=0 in Equation 3, we retrieve the prior that Jeffreys recommended for both estimation and testing (JEFFREYS, 1961, pp. 174–179 and 289–292). This recommendation is not the prior derived from Jeffreys's rule based on the Fisher information (e.g., LY et al., 2017), as discussed in BERGER and SUN (2008). With α=1, β=γ=δ=0, and thus a uniform prior on ρ, Jeffreys showed that the marginal posterior for ρ is approximately proportional to h^a(n,r|ρ), where
urn:x-wiley:stan:media:stan12111:stan12111-math-0012
represents the ρ-dependent part of the likelihood Equation 2 with θ0=(μ1,μ2,σ1,σ2) integrated out. For n large enough, the function h^a is a good approximation to the true reduced likelihood hγ,δ given below.

If we set α=β=γ=δ=0 in Equation 3, we retrieve Lindley's reference prior for ρ. LINDLEY (1965, pp. 214–221) established that the posterior of $\tanh^{-1}(\rho)$ is asymptotically normal with mean $\tanh^{-1}(r)$ and variance $n^{-1}$, which relates the Bayesian method of inference for ρ to that of Fisher. In Lindley's (1965, p. 216) derivation, it is explicitly stated that the likelihood with θ0 integrated out cannot be expressed in terms of elementary functions. In his analysis, Lindley approximates the true reduced likelihood hγ,δ with the same h^a that Jeffreys used before. BAYARRI (1981) furthermore showed that with the choice γ=δ=0, the marginalization paradox (DAWID et al., 1973) is avoided.
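Lindley's asymptotic result lends itself to a quick approximate credible interval. The sketch below assumes that the transformed quantity is Fisher's z, that is, tanh⁻¹(ρ) is treated as normal with mean tanh⁻¹(r) and variance 1/n; the function name `lindley_interval` is illustrative.

```python
import math

def lindley_interval(n, r, z_quantile=1.959963984540054):
    """Approximate central credible interval for rho.

    Treats atanh(rho) as normal with mean atanh(r) and variance 1/n,
    then maps the interval end points back through tanh.
    The default quantile corresponds to a 95% interval.
    """
    centre = math.atanh(r)
    half_width = z_quantile / math.sqrt(n)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)

lower, upper = lindley_interval(n=10, r=0.6)
```

Because tanh maps the real line into (−1, 1), the interval always respects the range of ρ, but it is only as good as the normal approximation, which is crude for small n.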

In their overview, BERGER and SUN (2008) showed that for certain a,b with α=b/2−1, β=0, γ=a−2, and δ=b−1, the priors in Equation 3 correspond to a subclass of the generalized Wishart distribution. Furthermore, a right-Haar prior (e.g., SUN and BERGER, 2007) is retrieved when we set α=β=0, γ=−1, δ=1 in Equation 3. This right-Haar prior then has a posterior that can be constructed through simulations, that is, by simulating from a standard normal distribution and two chi-squared distributions (BERGER and SUN, 2008, Table 1). This constructive posterior also corresponds to the fiducial distribution for ρ (e.g., FRASER, 1961; HANNIG et al., 2006). Another interesting case is given by α=0, β=1, γ=δ=0, which corresponds to the one-at-a-time reference prior for σ1 and σ2; see also JEFFREYS (1961, p. 187).
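The special cases discussed above can be collected in a small lookup table. The sketch below assumes that the prior kernel in Equation 3 has the form (1−ρ²)^(α−1) (1+ρ²)^(β/2) σ1^(γ−1) σ2^(δ−1), which is consistent with the hyperparameter values quoted in the text; the dictionary keys and function names are illustrative.

```python
import math

# Hyperparameters eta = (alpha, beta, gamma, delta) for the named special cases.
PRIORS = {
    "jeffreys_recommendation": dict(alpha=1.0, beta=0.0, gamma=0.0, delta=0.0),
    "lindley_reference": dict(alpha=0.0, beta=0.0, gamma=0.0, delta=0.0),
    "right_haar": dict(alpha=0.0, beta=0.0, gamma=-1.0, delta=1.0),
    "one_at_a_time_reference": dict(alpha=0.0, beta=1.0, gamma=0.0, delta=0.0),
}

def generalized_wishart(a, b):
    """Hyperparameters of the generalized Wishart subclass indexed by (a, b)."""
    return dict(alpha=b / 2 - 1, beta=0.0, gamma=a - 2, delta=b - 1)

def log_prior_kernel(rho, sigma1, sigma2, alpha, beta, gamma, delta):
    """Log of the unnormalized prior kernel (assumed form, see lead-in)."""
    return ((alpha - 1) * math.log1p(-rho * rho)
            + (beta / 2) * math.log1p(rho * rho)
            + (gamma - 1) * math.log(sigma1)
            + (delta - 1) * math.log(sigma2))

# Example: Jeffreys's recommendation is flat in rho for fixed sigma1, sigma2.
flat_a = log_prior_kernel(0.3, 2.0, 3.0, **PRIORS["jeffreys_recommendation"])
flat_b = log_prior_kernel(-0.8, 2.0, 3.0, **PRIORS["jeffreys_recommendation"])
```

Keeping the hyperparameters in one place makes it easy to pass the same η to any routine that evaluates the posterior of ρ.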

The analytic posteriors for ρ follow directly from exact knowledge of the reduced likelihood hγ,δ(n,r|ρ), rather than its approximation used in previous work. We give full details, because we did not encounter this derivation in earlier work.

Theorem 1 (The reduced likelihood hγ,δ(n,r|ρ)). If |r|<1, n>γ+1, and n>δ+1, then the likelihood f(y|θ) times the prior Equation 3 with θ0=(μ1,μ2,σ1,σ2) integrated out is a function fγ,δ that factors as

urn:x-wiley:stan:media:stan12111:stan12111-math-0015(4)
The first factor is the marginal likelihood with ρ fixed at zero, which depends on neither r nor ρ, that is,
urn:x-wiley:stan:media:stan12111:stan12111-math-0016(5)
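where urn:x-wiley:stan:media:stan12111:stan12111-math-0017. We refer to the second factor as the reduced likelihood, a function of ρ which is given by a sum of an even function and an odd function, that is, hγ,δ=Aγ,δ+Bγ,δ, where
$$A_{\gamma,\delta}(n, r \mid \rho) = (1-\rho^2)^{\frac{n-\gamma-\delta-1}{2}}\; {}_2F_1\!\left(\tfrac{n-\gamma-1}{2}, \tfrac{n-\delta-1}{2}; \tfrac{1}{2}; r^2\rho^2\right), \tag{6}$$
$$B_{\gamma,\delta}(n, r \mid \rho) = 2 r \rho\, (1-\rho^2)^{\frac{n-\gamma-\delta-1}{2}}\; W_{\gamma,\delta}(n)\; {}_2F_1\!\left(\tfrac{n-\gamma}{2}, \tfrac{n-\delta}{2}; \tfrac{3}{2}; r^2\rho^2\right), \tag{7}$$
where $W_{\gamma,\delta}(n) = \dfrac{\Gamma\!\left(\frac{n-\gamma}{2}\right)\Gamma\!\left(\frac{n-\delta}{2}\right)}{\Gamma\!\left(\frac{n-\gamma-1}{2}\right)\Gamma\!\left(\frac{n-\delta-1}{2}\right)}$ and where ${}_2F_1$ denotes Gauss' hypergeometric function.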
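The even and odd parts of the reduced likelihood are straightforward to evaluate. The sketch below assumes the forms A = (1−ρ²)^((n−γ−δ−1)/2) ₂F₁((n−γ−1)/2, (n−δ−1)/2; 1/2; r²ρ²) and B = 2rρ (1−ρ²)^((n−γ−δ−1)/2) W ₂F₁((n−γ)/2, (n−δ)/2; 3/2; r²ρ²), with W the ratio of gamma functions; the helper `hyp2f1` is a plain power-series evaluation written for this example, not a library call.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100000):
    """Gauss hypergeometric series; adequate here because 0 <= z = (r*rho)^2 < 1."""
    term, total = 1.0, 1.0
    for m in range(max_terms):
        term *= (a + m) * (b + m) / ((c + m) * (m + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def log_W(n, gamma, delta):
    """Log of the ratio of gamma functions appearing in the odd part."""
    return (math.lgamma((n - gamma) / 2) + math.lgamma((n - delta) / 2)
            - math.lgamma((n - gamma - 1) / 2) - math.lgamma((n - delta - 1) / 2))

def reduced_likelihood(rho, n, r, gamma=0.0, delta=0.0):
    """Return (A, B): the even and odd parts of h = A + B as functions of rho."""
    base = (1 - rho * rho) ** ((n - gamma - delta - 1) / 2)
    z = (r * rho) ** 2
    even = base * hyp2f1((n - gamma - 1) / 2, (n - delta - 1) / 2, 0.5, z)
    odd = (2 * r * rho * base * math.exp(log_W(n, gamma, delta))
           * hyp2f1((n - gamma) / 2, (n - delta) / 2, 1.5, z))
    return even, odd

A_pos, B_pos = reduced_likelihood(0.5, n=10, r=0.6)
A_neg, B_neg = reduced_likelihood(-0.5, n=10, r=0.6)
```

At ρ=0 the function returns A=1 and B=0, so that h equals one there, in line with the first factor being the marginal likelihood at ρ=0.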

Proof. To derive fγ,δ(y|ρ), we have to perform three integrals: (i) with respect to urn:x-wiley:stan:media:stan12111:stan12111-math-0021, (ii) urn:x-wiley:stan:media:stan12111:stan12111-math-0022, and (iii) urn:x-wiley:stan:media:stan12111:stan12111-math-0023.

  1. The integral with respect to urn:x-wiley:stan:media:stan12111:stan12111-math-0024 yields
    urn:x-wiley:stan:media:stan12111:stan12111-math-0025(8)
    where we abbreviated urn:x-wiley:stan:media:stan12111:stan12111-math-0026. The factor pγ,δ(y0) follows directly by setting ρ to zero in Equation 8 and two independent gamma integrals with respect to σ1 and σ2 resulting in Equation 5. These gamma integrals cannot be used when ρ is not zero. For fγ,δ(y|ρ), which is a function of ρ, we use results from special functions theory.
  2. For the second integral, we collect only that part of Equation 8 that involves σ1 into a function g, that is,
    urn:x-wiley:stan:media:stan12111:stan12111-math-0027
    The assumption n>γ+1 and the substitution urn:x-wiley:stan:media:stan12111:stan12111-math-0028 allow us to solve this integral using Lemma A.1, which we distilled from the Bateman manuscript project (ERDÉLYI et al., 1954), with urn:x-wiley:stan:media:stan12111:stan12111-math-0029 and c=n−γ−1. This yields
    urn:x-wiley:stan:media:stan12111:stan12111-math-0030(9)
    where
    urn:x-wiley:stan:media:stan12111:stan12111-math-0031(10)
    urn:x-wiley:stan:media:stan12111:stan12111-math-0032(11)
    and where 1F1 denotes the confluent hypergeometric function. The functions Åγ and B¨γ are the even and odd solutions of Weber's differential equation in the variable urn:x-wiley:stan:media:stan12111:stan12111-math-0033, respectively.
  3. With urn:x-wiley:stan:media:stan12111:stan12111-math-0034, we see that fγ,δ(y|ρ) follows from integrating σ2 out of the following expression:
    urn:x-wiley:stan:media:stan12111:stan12111-math-0035
    where
    urn:x-wiley:stan:media:stan12111:stan12111-math-0036(12)
    Hence, the last integral with respect to σ2 only involves the functions k and l in Equation 12. The assumption n>δ+1 and the substitution urn:x-wiley:stan:media:stan12111:stan12111-math-0037, thus, urn:x-wiley:stan:media:stan12111:stan12111-math-0038 allow us to solve this integral using Equation (7.621.4) from GRADSHTEYN AND RYZHIK (2007, p. 822) with urn:x-wiley:stan:media:stan12111:stan12111-math-0039. This yields
    urn:x-wiley:stan:media:stan12111:stan12111-math-0040
    After we combine the results, we see that urn:x-wiley:stan:media:stan12111:stan12111-math-0041, where
    urn:x-wiley:stan:media:stan12111:stan12111-math-0042
    Hence, fγ,δ(y|ρ) is of the asserted form. Note that urn:x-wiley:stan:media:stan12111:stan12111-math-0043 is even, while urn:x-wiley:stan:media:stan12111:stan12111-math-0044 is an odd function of ρ.

This main theorem confirms Lindley's insights; hγ,δ(n,r|ρ) is indeed not expressible in terms of elementary functions, and the prior on ρ is updated by the data only through its sampled version r and the sample size n. As a result, the marginal likelihood for data y then factors into pη(y)=pγ,δ(y0)pα,β(n,r;γ,δ), where urn:x-wiley:stan:media:stan12111:stan12111-math-0045 is the normalizing constant of the marginal posterior of ρ. More importantly, the fact that the reduced likelihood is the sum of an even function and an odd function allows us to fully characterize the posterior distribution of ρ for the priors Equation 3 in terms of its moments. These moments are easily computed, as the prior πα,β(ρ) itself is symmetric around zero. Furthermore, the prior πα,β(ρ) can be normalized as
urn:x-wiley:stan:media:stan12111:stan12111-math-0046(13)
where urn:x-wiley:stan:media:stan12111:stan12111-math-0047 denotes the beta function. The case with β=0 is also known as the (symmetric) stretched beta distribution on (−1,1) and leads to Lindley's reference prior when we ignore the normalization constant, that is, urn:x-wiley:stan:media:stan12111:stan12111-math-0048, and, subsequently, let urn:x-wiley:stan:media:stan12111:stan12111-math-0049.

Corollary 1 (Characterization of the marginal posteriors of ρ). If n>γ+δ−2α+1, then the main theorem implies that the marginal likelihood with all the parameters integrated out factors as pη(y)=pγ,δ(y0)pα,β(n,r;γ,δ), where

urn:x-wiley:stan:media:stan12111:stan12111-math-0050(14)
defines the normalizing constant of the marginal posterior for ρ. Observe that the integral involving Bγ,δ is zero, because Bγ,δ is odd on (−1,1) while the prior πα,β(ρ) is even. More generally, the kth posterior moment of ρ is
urn:x-wiley:stan:media:stan12111:stan12111-math-0051(15)
These posterior moments define the series
urn:x-wiley:stan:media:stan12111:stan12111-math-0052(16)
where urn:x-wiley:stan:media:stan12111:stan12111-math-0053 is the normalization constant of the prior Equation 13, Wγ,δ(n) is the ratio of gamma functions as defined under Equation 7, and urn:x-wiley:stan:media:stan12111:stan12111-math-0054 refers to the Pochhammer symbol for rising factorials. The terms ak,m and bk,m are
urn:x-wiley:stan:media:stan12111:stan12111-math-0055
The series defined in Equation 16 are hypergeometric when β is a non-negative integer.

Proof. The series E(ρ^k|n,r) result from term-wise integration of the hypergeometric functions in Aγ,δ and Bγ,δ. The assumption n>γ+δ−2α+1 and the substitution x=ρ² allow us to solve these integrals using Equation (3.197.8) in GRADSHTEYN and RYZHIK (2007, p. 317) with their urn:x-wiley:stan:media:stan12111:stan12111-math-0056 and urn:x-wiley:stan:media:stan12111:stan12111-math-0057 when k is even, while we use urn:x-wiley:stan:media:stan12111:stan12111-math-0058 when k is odd. A direct application of the ratio test shows that the series converge when |r|<1.
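The role of the even and odd parts in the posterior moments can be illustrated with simple quadrature instead of the analytic series. The sketch below assumes the prior kernel (1−ρ²)^(α−1)(1+ρ²)^(β/2) and the forms of A and B stated in its comments; all function names are illustrative, and the midpoint rule is chosen for brevity rather than accuracy near the end points when α<1.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100000):
    """Gauss hypergeometric series for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for m in range(max_terms):
        term *= (a + m) * (b + m) / ((c + m) * (m + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def even_odd_parts(rho, n, r, gamma, delta):
    """Assumed forms: A uses 2F1(.,.;1/2;.), B uses 2F1(.,.;3/2;.) times 2*r*rho*W."""
    base = (1 - rho * rho) ** ((n - gamma - delta - 1) / 2)
    z = (r * rho) ** 2
    log_w = (math.lgamma((n - gamma) / 2) + math.lgamma((n - delta) / 2)
             - math.lgamma((n - gamma - 1) / 2) - math.lgamma((n - delta - 1) / 2))
    even = base * hyp2f1((n - gamma - 1) / 2, (n - delta - 1) / 2, 0.5, z)
    odd = 2 * r * rho * base * math.exp(log_w) * hyp2f1((n - gamma) / 2, (n - delta) / 2, 1.5, z)
    return even, odd

def weighted_integrals(k, n, r, alpha=1.0, beta=0.0, gamma=0.0, delta=0.0, grid=4000):
    """Midpoint-rule integrals of rho^k * prior kernel * A and of rho^k * prior kernel * B."""
    step = 2.0 / grid
    int_even = int_odd = 0.0
    for i in range(grid):
        rho = -1.0 + (i + 0.5) * step
        prior = (1 - rho * rho) ** (alpha - 1) * (1 + rho * rho) ** (beta / 2)
        even, odd = even_odd_parts(rho, n, r, gamma, delta)
        int_even += rho ** k * prior * even * step
        int_odd += rho ** k * prior * odd * step
    return int_even, int_odd

def posterior_moment(k, n, r, **hyper):
    """k-th posterior moment of rho as a ratio of integrals (the prior constant cancels)."""
    num = sum(weighted_integrals(k, n, r, **hyper))
    den = sum(weighted_integrals(0, n, r, **hyper))
    return num / den

norm_even, norm_odd = weighted_integrals(0, n=10, r=0.6)
post_mean = posterior_moment(1, n=10, r=0.6)
```

On a grid that is symmetric around zero the odd contribution to the normalizing constant cancels up to rounding, whereas for odd k only the B term contributes to the numerator.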

3 Analytic posteriors for the case β=0

For most of the priors discussed earlier, we have β=0, which leads to the following simplification of the posterior.

Corollary 2 (Characterization of the marginal posteriors of ρ when β=0). If n>γ+δ−2α+1 and |r|<1, then the marginal posterior for ρ is

urn:x-wiley:stan:media:stan12111:stan12111-math-0059(17)
where pα(n,r;γ,δ) refers to the normalizing constant of the (marginal) posterior of ρ, which is given by
urn:x-wiley:stan:media:stan12111:stan12111-math-0060
More generally, when β=0, the kth posterior moment is
urn:x-wiley:stan:media:stan12111:stan12111-math-0061
when k is even, and
urn:x-wiley:stan:media:stan12111:stan12111-math-0062
when k is odd.

Proof. The assumption n>γ+δ−2α+1 and the substitution x=ρ² allow us to use Equation (7.513.12) in GRADSHTEYN and RYZHIK (2007, p. 814) with urn:x-wiley:stan:media:stan12111:stan12111-math-0063 and urn:x-wiley:stan:media:stan12111:stan12111-math-0064 when k is even, while we use urn:x-wiley:stan:media:stan12111:stan12111-math-0065 when k is odd. The normalizing constant of the posterior pα(n,r;γ,δ) is a special case with k=0.

The marginal posteriors for ρ updated from the generalized Wishart prior, the right-Haar prior, and Jeffreys's recommendation then follow from a direct substitution of the values for α, γ, and δ as discussed under Equation 3. Lindley's reference posterior for ρ is given by
urn:x-wiley:stan:media:stan12111:stan12111-math-0066
which follows from Equation 17 by setting γ=δ=0 and, subsequently, letting urn:x-wiley:stan:media:stan12111:stan12111-math-0067.

Lastly, for those who wish to sample from the posterior distribution, we suggest the use of an independence-chain Metropolis algorithm (TIERNEY, 1994) using Lindley's normal approximation of the posterior of $\tanh^{-1}(\rho)$ as the proposal. This method could be used when Pearson's correlation is embedded within a hierarchical model, as the posterior for ρ will then be a full conditional distribution within a Gibbs sampler. For α=1, β=γ=δ=0, n=10 observations and r=0.6, the acceptance rate of the independence-chain Metropolis algorithm was already well above 75%, suggesting fast convergence of the Markov chain. For larger n, the acceptance rate increases further. The R code for the independence-chain Metropolis algorithm can be found on the first author's home page. In addition, this analysis is also implemented in the open-source software package JASP (https://jasp-stats.org/).
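An independence-chain Metropolis sampler of this kind can be sketched in a few lines. The version below is a Python illustration, not the authors' R code: it assumes β=0, the prior kernel (1−ρ²)^(α−1), the even and odd parts of the reduced likelihood stated in its comments, and a normal proposal on the tanh⁻¹ scale with mean tanh⁻¹(r) and variance 1/n. All function names are hypothetical.

```python
import math
import random

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100000):
    """Gauss hypergeometric series for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for m in range(max_terms):
        term *= (a + m) * (b + m) / ((c + m) * (m + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def log_posterior_kernel(rho, n, r, alpha=1.0, gamma=0.0, delta=0.0):
    """Log of (1-rho^2)^(alpha-1) * (A + B), with A and B in the assumed form."""
    z = (r * rho) ** 2
    log_w = (math.lgamma((n - gamma) / 2) + math.lgamma((n - delta) / 2)
             - math.lgamma((n - gamma - 1) / 2) - math.lgamma((n - delta - 1) / 2))
    even = hyp2f1((n - gamma - 1) / 2, (n - delta - 1) / 2, 0.5, z)
    odd = 2 * r * rho * math.exp(log_w) * hyp2f1((n - gamma) / 2, (n - delta) / 2, 1.5, z)
    power = alpha - 1 + (n - gamma - delta - 1) / 2
    return power * math.log1p(-rho * rho) + math.log(even + odd)

def independence_metropolis(n, r, draws=3000, seed=1, **hyper):
    """Sample rho with a fixed normal proposal on the atanh scale."""
    rng = random.Random(seed)
    centre, sd = math.atanh(r), 1.0 / math.sqrt(n)

    def log_weight(z):
        rho = math.tanh(z)
        if 1 - rho * rho <= 0.0:
            return -math.inf  # numerically on the boundary: never accepted
        # Target on the z scale includes the Jacobian d(rho)/dz = 1 - rho^2.
        log_target = log_posterior_kernel(rho, n, r, **hyper) + math.log1p(-rho * rho)
        log_proposal = -0.5 * ((z - centre) / sd) ** 2
        return log_target - log_proposal

    z_current = centre
    lw_current = log_weight(z_current)
    samples, accepted = [], 0
    for _ in range(draws):
        z_new = rng.gauss(centre, sd)
        lw_new = log_weight(z_new)
        # Independence sampler: accept with probability min(1, w_new / w_current).
        if rng.random() < math.exp(min(0.0, lw_new - lw_current)):
            z_current, lw_current = z_new, lw_new
            accepted += 1
        samples.append(math.tanh(z_current))
    return samples, accepted / draws

samples, acceptance_rate = independence_metropolis(n=10, r=0.6)
```

Because the proposal does not depend on the current state, the acceptance rate directly reflects how closely the normal approximation matches the posterior on the transformed scale.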

Acknowledgements

This work was supported by the starting grant ‘Bayes or Bust’ awarded by the European Research Council (grant number 283876). The authors thank Christian Robert, Fabian Dablander, Tom Koornwinder, and an anonymous reviewer for helpful comments that improved an earlier version of this manuscript.

    Appendix A: A Lemma distilled from the Bateman project

    Lemma A.1. For a,c>0, the following equality holds:

    urn:x-wiley:stan:media:stan12111:stan12111-math-0069(A.1)
    that is, the integral is solved by the functions
    urn:x-wiley:stan:media:stan12111:stan12111-math-0070(A.2)
    which define the even and odd solutions to Weber's differential equation in the variable urn:x-wiley:stan:media:stan12111:stan12111-math-0071, respectively.

    Proof. By ERDÉLYI et al. (1954, p. 313, Equation (13)), we note that

    urn:x-wiley:stan:media:stan12111:stan12111-math-0072(A.3)
    where Dλ(z) is WHITTAKER'S (1902) parabolic cylinder function (ABRAMOWITZ and STEGUN, 1992). By virtue of Equation (4) on p. 117 of ERDÉLYI et al. (1981), we can decompose Dλ(z) into a sum of an even function and an odd function. Substituting this decomposition for Dλ(z) into Equation A.3 and applying the duplication formula of the gamma function yields the statement.

    • We thank an anonymous reviewer for clarifying how Jeffreys derived this approximation.
