Volume 36, Issue 11 pp. 4272-4286
Research Article

G-protein genomic association with normal variation in gray matter density

Jiayu Chen

The Mind Research Network, Albuquerque, New Mexico

Vince D. Calhoun

The Mind Research Network, Albuquerque, New Mexico

Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, New Mexico

Alejandro Arias-Vasquez

Department of Human Genetics, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Department of Cognitive Neuroscience, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Department of Psychiatry, Radboud University Nijmegen Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Marcel P. Zwiers

Centre for Cognitive Neuroimaging, Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Kimm van Hulzen

Department of Human Genetics, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Guillén Fernández

Department of Cognitive Neuroscience, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Simon E. Fisher

Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

Centre for Neuroscience, Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Barbara Franke

Department of Human Genetics, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Department of Psychiatry, Radboud University Nijmegen Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Jessica A. Turner

The Mind Research Network, Albuquerque, New Mexico

Psychology Department, Georgia State University, Atlanta, Georgia

Neuroscience Institute, Georgia State University, Atlanta, Georgia

Jingyu Liu

Corresponding Author

The Mind Research Network, Albuquerque, New Mexico

Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, New Mexico

Correspondence to: Jiayu Chen, The Mind Research Network 1101 Yale Blvd. NE. Albuquerque, NM, USA 87106-3834. E-mail: [email protected]
First published: 07 August 2015
Citations: 14

Abstract

While detecting genetic variations underlying brain structures helps reveal mechanisms of neural disorders, high data dimensionality poses a major challenge for imaging genomic association studies. In this work, we present the application of a recently proposed approach, parallel independent component analysis with reference (pICA-R), to investigate genomic factors potentially regulating gray matter variation in a healthy population. This approach simultaneously assesses many variables for an aggregate effect and helps to elicit particular features in the data. We applied pICA-R to analyze gray matter density (GMD) images (274,131 voxels) in conjunction with single nucleotide polymorphism (SNP) data (666,019 markers) collected from 1,256 healthy individuals of the Brain Imaging Genetics (BIG) study. Guided by a genetic reference derived from the gene GNA14, pICA-R identified a significant SNP-GMD association (r = −0.16, P = 2.34 × 10⁻⁸), implying that subjects with specific genotypes have lower localized GMD. The identified components were then projected to an independent dataset from the Mind Clinical Imaging Consortium (MCIC) including 89 healthy individuals, and the obtained loadings again yielded a significant SNP-GMD association (r = −0.25, P = 0.02). The imaging component reflected GMD variations in frontal, precuneus, and cingulate regions. The SNP component was enriched in genes with neuronal functions, including synaptic plasticity, axon guidance, molecular signal transduction via PKA and CREB, highlighting the GRM1, PRKCH, GNA12, and CAMK2B genes. Collectively, our findings suggest that GNA12 and GNA14 play a key role in the genetic architecture underlying normal GMD variation in frontal and parietal regions. Hum Brain Mapp 36:4272–4286, 2015. © 2015 Wiley Periodicals, Inc.

INTRODUCTION

Studying associations between genetic variables and imaging traits, a maturing field known as imaging genetics, holds the promise to help better understand the genetic underpinnings of cognition and reveal biological mechanisms of mental disorders [Thompson et al., 2014]. Structural magnetic resonance imaging (sMRI) provides a noninvasive approach to study the morphology of the living brain. Quantitative measures are derived from T1-weighted MRI images using computational methods such as voxel-based morphometry (VBM) [Ashburner and Friston, 2005] or cortical surface reconstruction (FreeSurfer) [Fischl and Dale, 2000] to depict various structural features, including brain volume, gray matter volume, cortical thickness, and surface area. These features can then be compared at the subject level for their functional implications. Under this strategy, biomarkers have been consistently identified from both healthy and diseased human brains, characterizing neural development, ageing, Alzheimer's disease, schizophrenia, and so on [Ellison-Wright et al., 2008; Frisoni et al., 2010; Giedd and Rapoport, 2010; Raz and Rodrigue, 2006]. More importantly, many attributes of brain structure, including both global and regional traits, are confirmed to be genetically influenced in twin studies, with heritability estimated around 40–90% [Peper et al., 2007; Thompson et al., 2001; Winkler et al., 2010].

Heterotrimeric guanine nucleotide-binding proteins (G-proteins) serve as molecular switches in intracellular signaling cascades, where they sense signals from cell-surface receptors that are activated by extracellular stimuli and transduce signals to downstream effectors [Oldham and Hamm, 2008]. It has been documented that activated G-proteins directly interact with a variety of effector proteins, including phosphodiesterase E, phospholipase D, phospholipase C, inducible nitric oxide synthase, calcium channels, and the G protein-regulated inducer of neurite outgrowth 1 and 2 [Cabrera-Vera et al., 2003]. In particular, G-proteins are richly expressed in the brain and involved in cortical development, neuronal growth, and neuronal signaling [Bromberg et al., 2008; Hamm, 1998; Offermanns, 2001]. For instance, there is evidence that G protein alpha 12 and alpha 13 can mediate growth cone collapse and neurite retraction [Nurnberg et al., 2008], while deficiency of the G-protein α-subunits causes localized overmigration of neurons in the developing cerebral and cerebellar cortices [Moers et al., 2008]. Knockout of G protein beta 5 impairs brain development and causes multiple neurologic abnormalities in mice [Zhang et al., 2011]. Neural expression of G protein-coupled receptor 3, 6, and 12 has been implicated in up-regulating cyclic AMP (cAMP) levels in neurons and stimulating neurite outgrowth [Tanaka et al., 2007].

Given their roles in neural development, G-proteins are promising candidates for imaging genetic association studies. More recently, Chavarria-Siles et al. [2013] investigated 502 single nucleotide polymorphisms (SNPs) in 25 G-protein genes for their associations with brain-wide gray matter volume using a mass univariate model. They identified seven SNPs that were significantly associated with gray matter volume variation in different brain regions, including the medial frontal cortex. While this work provides direct evidence that G-protein SNPs are related to brain structure, the univariate analysis is not able to assess the aggregate effects of multiple variants. In addition, considering that G-proteins are key cell signaling molecules and affect multiple biological processes, their effect on neurobiological conditions is likely not isolated but involves a large network. Therefore, a multivariate study is strongly encouraged to search for G-protein involved genetic components that may better delineate the basis of the gray matter variation.

In the present work, we applied a semi-blind multivariate approach, parallel independent component analysis with reference (pICA-R) [Chen et al., 2013], to explore G-protein involved genomic factors underlying brain structure in a large homogeneous cohort of 1,256 healthy Caucasians. Brain-wide gray matter density (GMD) images were analyzed in conjunction with genome-wide SNP data. G-proteins identified in the work by Chavarria-Siles et al. [2013] served as references to guide the analysis but the method still allows for other genetic variants to be revealed as well.

MATERIALS AND METHODS

Participants

BIG

A large cohort was used for discovery to increase the statistical power. This step was performed using the Brain Imaging Genetics (BIG) dataset, an ongoing effort being conducted at the Radboud University Nijmegen together with the Max Planck Institute for Psycholinguistics (Nijmegen, the Netherlands) [Bralten et al., 2011; Cousijn et al., 2012]. The regional medical ethics committee approved the study and all subjects provided written informed consent. Specifically in this work, a total of 1,256 healthy Caucasians were included in the investigation, including 617 males (age: 23.28 ± 4.11 years) and 843 females (age: 22.67 ± 3.61 years) for whom both neuroimaging and genotype data were collected [Guadalupe et al., 2014]. Subjects are typically highly educated and free of neurological or psychiatric history according to self-report. Note that subjects might overlap between the present study and the study by Chavarria-Siles et al., which employed 532 healthy BIG subjects.

MCIC

In the validation step, we used the subjects from the Mind Clinical Imaging Consortium (MCIC) study [Gollub et al., 2013], a collaborative effort of four research teams from University of New Mexico-Mind Research Network, Massachusetts General Hospital, University of Minnesota, and University of Iowa. The institutional review board at each site approved the study and all subjects provided written informed consent. Out of 255 subjects, 89 were healthy Caucasians, including 51 males (age: 31.41 ± 10.39 years) and 38 females (age: 33.61 ± 11.50 years). These subjects were employed to validate the results obtained from the BIG data. All healthy subjects were screened to ensure that they were free of any medical, neurological, or psychiatric illnesses, including any history of substance abuse.

Neuroimaging

BIG

Structural images were acquired at the Donders Centre for Cognitive Neuroimaging (Nijmegen, The Netherlands) using different scanners, i.e., 1.5 T Siemens Avanto and Sonata, as well as 3.0 T Siemens Trio and TIM Trio. Transmitting and receiving coils also differed across subjects. A standard sagittal T1-weighted three-dimensional magnetization prepared rapid gradient echo (MP-RAGE) sequence was employed, while some variations were observed in repetition, inversion, and echo time, as well as pixel bandwidth and flip angle. The use of parallel imaging with an acceleration factor of 2 was also included. Table 1 summarizes the settings used in BIG sMRI scans.

Table 1. Summary of BIG scanning parameters
Scanning parameter | Variations across subjects
Station name | avanto (462), sonata (160), trio (52), triotim (582)
Sequence name | *tfl3d1 (13), *tfl3d1_ns (983), spc3d1rr282ns (5), tfl3d1 (1), tfl3d1_ns (254)
Repetition time | 1660 (3), 1960 (13), 2250 (539), 2300 (615), 2730 (81), 3200 (5)
Echo time | 2.02 (3), 2.86 (1), 2.92 (22), 2.94 (1), 2.95 (462), 2.96 (183), 2.99 (14), 3.03 (348), 3.04 (1), 3.08 (1), 3.11 (1), 3.13 (1), 3.55 (1), 3.68 (148), 3.93 (51), 4.43 (7), 4.58 (3), 401 (5), 5.59 (3)
Inversion time | 1000 (81), 1100 (627), 750 (3), 850 (539), 900 (1), null (5)
Magnetic field strength | 1.494 (101), 1.5 (521), 2.89362 (52), 3 (582)
Number of phase encoding steps | 176 (3), 196 (5), 253 (5), 255 (565), 256 (677), 320 (1)
Pixel bandwidth | 130 (636), 140 (611), 240 (1), 260 (3), 751 (5)
Transmitting coil | body (1068), cp_head (49), txrx_head (139)
Flip angle | 120 (5), 15 (539), 7 (81), 8 (630), 9 (1)
Tcoil ID/receiving coil | 32ch_head (215), 8ch_head (573), body (1), cp_headarray (256), headmatrix (70), null (2), txrx_head (139)
  • Each scanning setting is followed by the number of subjects that have been scanned using this setting.

MCIC

The MCIC structural images were coronal T1-weighted MRIs collected at multiple sites. Table 2 lists the settings used in the scans. It can be seen that scanners differed among 1.5 T Siemens Sonata and GE Signa, as well as 3.0 T Siemens Trio. Closely matched acquisition sequences were used. However, repetition time and pulse sequence varied somewhat.

Table 2. Summary of MCIC scanning parameters
Site | M021 (51) | M552 (92) | M554 (47) | M871 (44)
Scanner | Siemens Avanto | GE Signa | Siemens Trio | Siemens Sonata
Scanning sequence | GR | RM | IR\GR | GR
Sequence name | *fl3d1_ns | N/A | *tfl3d1_ns (19), tfl3d1_ns (28) | *fl3d1_ns (35), fl3d1_ns (9)
Slice thickness (mm) | 1.5 | 1.5 (20), 1.6 (38), 1.7 (31), 1.8 (3) | 1.5 | 1.5
TR/TE (ms) | 12/4.76 | 20/6 | 2530/3.81 | 12/4.76
Number of averages | 1 | 1 (2), 2 (90) | 1 | 1
Magnetic field strength (T) | 1.494 | 1.5 | 2.8936 | 1.494
Number of phase encoding steps | 256 | N/A | 256 | 288
Percent phase field of view | 100 | 100 | 100 | 100
Pixel bandwidth | 160 | 122 | 180 | 110
Receiving coil | cp_head | 8ch_head | 8ch_head | 8ch_head
Acquisition matrix | 0 256 256 0 | 0 256 256 0 | 0 256 256 0 | 0 256 256 0
Flip angle | 20 | 30 | 7 | 20
Pixel spacing | 0.625 0.625 (11), 0.70313 0.70313 (40) | 0.625 0.625 (32), 0.66406 0.66406 (16), 0.70313 0.70313 (44) | 0.625 0.625 | 0.625 0.625 (42), 0.70313 0.70313 (2)
  • Each site is followed by the number of subjects that have been scanned using this setting, which also applies if a scanning setting varies within a site.

Preprocessing

The T1-weighted sMRI data were preprocessed at the Mind Research Network with Statistical Parametric Mapping 5 (SPM5, http://www.fil.ion.ucl.ac.uk/spm) using unified segmentation [Ashburner and Friston, 2005] in which image registration, bias correction, and tissue classification are performed using a single integrated algorithm. In this way, brains were segmented into gray matter, white matter, and cerebrospinal fluid and nonlinearly transformed into the ICBM152 standard space without Jacobian modulation. The resulting GMD images were re-sliced to 2 × 2 × 2 mm, resulting in 91 × 109 × 91 voxels. In the subsequent quality check, we excluded outliers, defined as subjects whose correlation between the individual image and the across-subject average image was more than 4 standard deviations below the mean correlation. Based on this criterion, four subjects were excluded from the BIG data and no outliers were identified for the MCIC data. A mask was then generated (mean GMD > 0) to include only the segmented gray matter voxels, resulting in a total of 298,707 voxels for the BIG data and 292,998 voxels for the MCIC data. A voxelwise linear regression was performed to remove age and sex effects, so that the subsequent analysis would not capture associations mainly driven by these factors. Furthermore, considering that the images were acquired with various scanning platforms, we employed the source-based-morphometry (SBM) approach [Xu et al., 2009] to investigate and eliminate the image variability introduced by scanning settings [Chen et al., 2014b]. Specifically, the data were decomposed into a linear combination of underlying sources using independent component analysis (ICA) [Amari, 1998; Bell and Sejnowski, 1995]. The loadings of each component were then assessed for associations with available scanning parameters. Components significantly affected by scanning settings were then identified and eliminated from the original data. Following this, nine scanning-related components were removed for the BIG data (Table 1) and eight scanning-related components were removed for the MCIC data (Table 2). After correction, we did not observe any significant scanning effects. Finally, the corrected data were smoothed with an 8-mm full width at half-maximum Gaussian kernel and thresholded at mean GMD > 0.1 (resulting in 274,131 voxels) for the subsequent association analysis.
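The quality check and the covariate regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used in the study: the function names `flag_outliers` and `regress_out` are hypothetical, images are assumed to be stored as a subject × voxel NumPy array, and the regression is assumed to subtract only the fitted age and sex terms while keeping each voxel's mean level.

```python
import numpy as np

def flag_outliers(images, n_sd=4.0):
    """Return a boolean mask of subjects whose correlation with the
    across-subject mean image is more than n_sd SDs below the mean correlation."""
    mean_image = images.mean(axis=0)
    corrs = np.array([np.corrcoef(img, mean_image)[0, 1] for img in images])
    return corrs < corrs.mean() - n_sd * corrs.std()

def regress_out(images, covariates):
    """Remove the fitted linear covariate effects (e.g. age, sex) at every voxel."""
    covariates = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, images, rcond=None)
    # Subtract only the covariate part so that each voxel keeps its mean level.
    return images - design[:, 1:] @ beta[1:]
```

After `regress_out`, the residual images are orthogonal to the mean-centered covariates, which is the property the subsequent association analysis relies on.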

Genotyping

BIG

Saliva samples were collected from participants for DNA extraction. Genotyping was then conducted using the Affymetrix GeneChip SNP 6.0 array spanning more than 906,600 SNPs. The call rate threshold was set to 90%.

MCIC

DNA was extracted from blood samples. Genotyping for all participants was performed using the Illumina Infinium HumanOmni1-Quad assay spanning 1,140,419 SNP loci. BeadStudio was used to make the final genotype calls.

Imputation

To maximize the number of overlapping SNPs between the two datasets, we imputed the MCIC data up to 5M SNPs using the MACH/Minimac pipeline [Howie et al., 2012] leveraging a large reference panel of the 1000 Genomes data [Altshuler et al., 2012], as described in the ENIGMA protocol (http://enigma.ini.usc.edu/). We further excluded those imputed SNPs whose estimates of the squared correlations between imputed genotypes and true unobserved genotypes (rsq) were lower than 0.3, as recommended by the developer of the tool. To be consistent with the BIG data, the MCIC genotype data were obtained based on the continuous imputation data using the Genome-wide Complex Trait Analysis (GCTA) tool [Yang et al., 2011].

Data cleaning

PLINK software [Purcell et al., 2007] was used to perform a series of standard quality control procedures. SNPs and subjects were first examined for a genotyping rate threshold of 90%; SNPs were excluded if they showed deviation from Hardy–Weinberg Equilibrium with a threshold of 10⁻⁶ or if they failed to be missing at random with a threshold of 10⁻¹⁰; minor allele frequency cut-off was set to 0.01. Potential relatives were excluded if the Identity-By-Descent (IBD) was higher than 0.1875. For the BIG data, we first replaced the missing genotypes using haplotype genotypes of high linkage disequilibrium (LD) loci (correlation > 0.85). After the above procedures, missing genotypes were still observed in 134,607 out of 671,782 autosomal SNPs with missing ratios no greater than 0.05. We then replaced the remaining missing genotypes using the major alleles of individual loci. Discrete numbers were then assigned to the categorical genotypes: 0 for no minor allele, 1 for one minor allele, and 2 for two minor alleles. Finally, 666,019 common loci were genotyped or imputed in both the BIG and MCIC data. These loci were included for the association and validation analysis.
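To make the genotype coding concrete, the sketch below codes a single biallelic locus as minor-allele counts, applies the minor allele frequency cut-off, and treats missing calls as homozygous for the major allele. It is an illustration only: the function name `code_locus` and the two-letter genotype strings are assumptions introduced here, and the study itself performed these steps with PLINK rather than custom code.

```python
from collections import Counter

def code_locus(calls, maf_cutoff=0.01):
    """Code one biallelic locus as minor-allele counts (0, 1, 2).

    calls: list of two-letter genotype strings such as "AG", or None if missing.
    Returns None when the locus fails the minor allele frequency cut-off.
    """
    observed = [c for c in calls if c is not None]
    counts = Counter("".join(observed))
    if len(counts) < 2:
        return None  # monomorphic locus: minor allele frequency is 0
    (_, _), (minor, n_minor) = counts.most_common(2)
    maf = n_minor / (2 * len(observed))
    if maf < maf_cutoff:
        return None
    # A missing call is replaced by the major-allele homozygote, i.e. coded 0.
    return [0 if c is None else c.count(minor) for c in calls]
```

For example, the calls `["AA", "AG", "GG", None, "AA", "AA"]` have G as the minor allele and are coded as `[0, 1, 2, 0, 0, 0]`.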

Association Analysis

pICA-R [Chen et al., 2013] was employed to identify relationships between hidden factors with a particular attribute within two data modalities, as illustrated in Figure 1. In this work, pICA-R estimates underlying components for the GMD images (Modality 1) and the SNP data (Modality 2) independently and in parallel. The data decomposition builds upon the regular infomax ICA framework [Bell and Sejnowski, 1995], which identifies sets of co-varying variables that are independent from each other and organizes them into different components, as described in Eq. 1, where X represents the observed data matrix (subject × voxel/SNP); S, A, and W denote the component, loading/mixing, and unmixing matrix, respectively. The subscript d runs from 1 to 2, denoting the data modality.
$$X_d = A_d S_d, \quad S_d = W_d X_d, \quad A_d = W_d^{-1}, \quad d = 1, 2 \qquad (1)$$
Figure 1. Flow chart of pICA-R. W1 and W2 denote the unmixing matrices of the two modalities, respectively. F1, F2, and F3 represent the objective functions based on which unmixing matrices are updated.

Mathematically, pICA-R iteratively updates the unmixing matrices W1 and W2 to gradually optimize the objective functions F1, F2, and F3, as described in Eqs. 2 and 3. F1 is the objective function of the regular infomax for Modality 1, where independence among components is achieved by maximizing the entropy H. fy(Y) is the probability density function of the sigmoid function Y. E is the expected value and W0 is the bias vector. In contrast, F2 is the objective function for Modality 2, where an additional Euclidean distance metric is imposed to extract maximally independent components, one of which also closely resembles the reference r. To avoid false positives, the constraint weight λ is adaptively adjusted so that the distance metric is not over-emphasized. The inter-modality correlation function F3 is designed to maximize the correlations computed over the columns of the loading matrices A1 and A2, capturing connections between pairs of inter-modality components.
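$$F_1 = \max\{H(Y_1)\} = \max\{-E[\ln f_y(Y_1)]\}, \quad Y_1 = \frac{1}{1 + e^{-U_1}}, \quad U_1 = W_1 X_1 + W_0$$
$$F_2 = \max\{\lambda \cdot H(Y_2) - (1 - \lambda) \cdot \lVert r - S_{2k} \rVert_2^2\} \qquad (2)$$
$$F_3 = \max\Big\{\sum_{i,j} \mathrm{Corr}(A_{1i}, A_{2j})^2\Big\} \qquad (3)$$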
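As a rough illustration of the inter-modality term F3, the sketch below computes the Pearson correlation between every column of one loading matrix and every column of the other, and then picks the most strongly linked pair. The names `cross_correlations` and `strongest_pair` are hypothetical, and ranking pairs by squared correlation is an assumption made here; pICA-R optimizes the inter-modality correlation during the decomposition rather than computing it once afterwards.

```python
import numpy as np

def cross_correlations(a1, a2):
    """Pearson correlations between every column of a1 and every column of a2.

    a1: subject x components (modality 1); a2: subject x components (modality 2).
    """
    z1 = (a1 - a1.mean(axis=0)) / a1.std(axis=0)
    z2 = (a2 - a2.mean(axis=0)) / a2.std(axis=0)
    return z1.T @ z2 / a1.shape[0]

def strongest_pair(corr):
    """Indices (i, j) of the component pair with the largest squared correlation."""
    i, j = np.unravel_index(np.argmax(corr ** 2), corr.shape)
    return int(i), int(j)
```

With 10 sMRI and 11 SNP components, as estimated in this study, `cross_correlations` would return a 10 × 11 matrix of candidate component pairs.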

The reference r is a binary vector with the same number of loci as the genomic data, where the selected referential loci are set to “1” and the rest are “0”s. This binary reference serves as a mask such that the distance between the component and reference vector is optimized particularly for the referential loci only. This design enables a semi-blind decomposition, where the presumed causal loci are constrained to be highlighted in the resulting component while the remaining loci are allowed to show their own importance driven by the data. For a full description of the mathematical details of pICA-R, we refer readers to the original publication [Chen et al., 2013].
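The masking role of the binary reference can be sketched as below: only loci where the reference equals 1 contribute to the distance, so the remaining loci are left unconstrained. The function name `masked_distance_sq` is hypothetical, and taking the absolute value of the component weights (because the sign of an ICA component is arbitrary) is an assumption made for this sketch; the exact scaling used in pICA-R is given in Chen et al. [2013].

```python
def masked_distance_sq(component, reference):
    """Squared Euclidean distance between |component| and the binary
    reference, evaluated over the referential loci (reference == 1) only."""
    return sum(
        (abs(weight) - ref) ** 2
        for weight, ref in zip(component, reference)
        if ref == 1
    )
```

A large weight at a non-referential locus therefore leaves the distance unchanged, which is what allows other loci to emerge from the data.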

Specifically in this work, the genetic references were derived from genes encoding heterotrimeric G-proteins. We leveraged the results from an imaging genetics study reporting associations between gray matter volume variations and SNPs in G-protein coding genes [Chavarria-Siles et al., 2013]. Among the seven genes identified in that previous work, GNA15, GNAO1, and GNB5 covered fewer than 10 SNPs in our data and were therefore not tested as references, given that simulation results suggested an optimal reference of 20 true causal loci [Chen et al., 2013]. For the remaining four genes, GNG2, GNAQ, GNA14, and GNAL, we identified the corresponding SNPs in each gene and grouped neighboring SNPs with moderate correlations (r² > 0.2, [Ripke et al., 2011]) into a cluster. The largest cluster within each gene was then used as the reference. The reason is that the SNPs in one gene do not necessarily all contribute to a single component, whereas pICA-R is designed to constrain a single component with the reference. SNPs in one LD cluster are more likely to contribute to the same component and serve as good candidates for reference, given that a marker allele in LD with the causal variant should show (by proxy) an association with the trait of interest [Stranger et al., 2011]. In this way, we tested four references hosted by GNG2, GNAQ, GNA14, and GNAL.
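One way to picture the reference construction is the sketch below, which groups position-ordered SNPs into runs of adjacent loci with r² above the threshold and returns the largest run. The text does not spell out the clustering rule, so the adjacency-based grouping, the function names `pearson_r2` and `largest_ld_cluster`, and the list-of-lists input format are all assumptions made for illustration.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant genotype vector carries no LD information
    return sxy * sxy / (sxx * syy)

def largest_ld_cluster(genotypes, threshold=0.2):
    """genotypes: per-SNP genotype lists (coded 0/1/2), ordered by position.

    Returns the indices of the largest run of adjacent SNPs whose
    consecutive r^2 values all exceed the threshold.
    """
    if not genotypes:
        return []
    clusters, current = [], [0]
    for i in range(1, len(genotypes)):
        if pearson_r2(genotypes[i - 1], genotypes[i]) > threshold:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    return max(clusters, key=len)
```

The returned indices could then be set to 1 in the binary reference vector, with all other loci set to 0.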

Like infomax ICA, pICA-R requires estimating the number of components before data decomposition. Minimum description length (MDL) [Rissanen, 1978] is commonly used for the imaging modality to select the component number yielding the most efficient representation of the original data. However, in the SNP data, most genetic factors account for small amounts of variance, except for those related to population structure. Thus, MDL, which selects the component number such that the major variance in the data is retained, is much less applicable. Instead, we chose to estimate the component number based on components’ consistency, to obtain the most stable data decomposition [Chen et al., 2012].

Validation

The genetic references were implicated in the work by Chavarria-Siles et al. [2013] which used a portion of the current BIG data. Unfortunately, we did not have a detailed list of the subjects used in that work and could not investigate whether our finding would hold in the non-overlapping portion of the BIG data. Instead, we used the independent MCIC data to assess the validity of the imaging genetic associations identified in the BIG data. As pICA-R's performance would significantly degrade given the MCIC sample size of 89 and 666,019 SNPs [Chen et al., 2013], we chose to project the sMRI and SNP components identified in the BIG data to the MCIC data to obtain the new loading coefficients, as described in Eq. 4:
$$A_{MCIC,d} = \tilde{X}_{MCIC,d} \, \tilde{S}_{BIG,d}^{+} \qquad (4)$$
where the subscript “d” runs from 1 to 2, denoting the sMRI and SNP modality, respectively. $\tilde{S}_{BIG,d}^{+}$ represents the pseudo-inverse of the component matrix extracted from the BIG data. $\tilde{S}_{BIG,d}$ and $\tilde{X}_{MCIC,d}$ denote, respectively, the submatrices of $S_{BIG,d}$ and the MCIC observed data, corresponding to the top voxels or SNPs presenting relatively stronger correlations with the loadings. $A_{MCIC,d}$ represents the loading matrix estimated through projecting the top voxels or SNPs. We expect that the loadings obtained in this way should more accurately reflect the effects of the most important markers. Finally, the correlations between the projected loadings were calculated to assess whether the sMRI-SNP association identified in the BIG data remained significant in the MCIC data.
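The projection step can be sketched with a pseudo-inverse as follows. This is a minimal illustration, assuming the components are stored as a component × variable array and the new data as a subject × variable array; the function name `project_loadings` and the argument `top_idx` (the indices of the selected top voxels or SNPs) are hypothetical, and how the top variables are chosen is left to the caller.

```python
import numpy as np

def project_loadings(x_new, s_ref, top_idx):
    """Estimate loadings for new subjects from previously extracted components.

    x_new:   subject x variable data from the new cohort.
    s_ref:   component x variable matrix from the discovery cohort.
    top_idx: indices of the top voxels or SNPs used for the projection.
    """
    s_top = s_ref[:, top_idx]   # submatrix of the components
    x_top = x_new[:, top_idx]   # matching submatrix of the new data
    return x_top @ np.linalg.pinv(s_top)
```

The correlation between the projected sMRI and SNP loadings of the component pair of interest can then be tested in the new cohort.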

Analyzing Components

To understand the influences on cognition, the identified SNP and sMRI components were further investigated for associations with available phenotypic data, which was the reversal learning score collected for 599 out of 1,256 BIG subjects. In addition, top contributing voxels of the identified sMRI component were mapped to the Talairach atlas [Lancaster et al., 1997, 2000] for the involved brain regions. Meanwhile, top contributing SNPs were annotated and the hosting genes were sent for Ingenuity Pathway analysis (IPA, http://www.ingenuity.com/) to reveal the genetic architecture.

RESULTS

The number of components was estimated to be 10 and 11 for the sMRI and SNP data, respectively. Among all the tested references, the one derived from GNA14 elicited a significant sMRI-SNP association (r = −0.16, P = 2.27 × 10⁻⁸). This passed the conservative Bonferroni threshold of 1.14 × 10⁻⁴, which corrected for all four tested genetic references and all combinations of extracted components (10 sMRI × 11 SNP) in each run. The reference comprised 22 out of 83 SNPs from GNA14, as listed in Supporting Information Table S1. These SNPs are in moderate LD, presenting a mean r-square of 0.54, and distanced by an average of 2,557 base pairs. The sMRI-SNP association remained significant, exhibiting a correlation of −0.16 (P = 2.34 × 10⁻⁸) when the SNP component was controlled for age and sex, as shown in Figure 2a. In addition, the SNP-sMRI association remained significant (P = 3.30 × 10⁻⁷) when the top three principal components of the SNP data were further included as covariates [Ripke et al., 2011], indicating that the observed association was not likely attributable to ancestral background. The associated sMRI component was thresholded at |Z| > 2 for top voxels. The number of top SNPs was determined based on the absolute values of the component z-scores using a more conservative linear fitting approach (for details see Supporting Information Figure S1), yielding a total of 2,000 top SNPs corresponding to |Z| > 3.37. We then projected the top voxels and SNPs to the MCIC data, and the resulting loadings again yielded a significant correlation of −0.25 (P = 0.02) when controlling for age and sex, as shown in Figure 2b. Notably, the projected association was robust to the threshold selection, remaining significant with a number of top SNPs ranging from 1,000 to 2,500. No association was observed between the identified sMRI/SNP component and the reversal learning score.

Figure 2. Scatter plots of (a) the single nucleotide polymorphism (SNP) and sMRI loadings identified in the BIG data; and (b) the SNP and sMRI loadings obtained from the MCIC data through projection.
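The Bonferroni threshold reported above can be reproduced with a few lines of arithmetic. The family-wise significance level of 0.05 is not stated explicitly in the text and is assumed here because it recovers the reported value; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_references, n_img_components, n_snp_components):
    """Per-test threshold when every reference and every component pair is tested."""
    n_tests = n_references * n_img_components * n_snp_components
    return alpha / n_tests

# 4 references x 10 sMRI components x 11 SNP components = 440 tests
threshold = bonferroni_threshold(0.05, 4, 10, 11)
print(f"{threshold:.2e}")  # 1.14e-04
```

The discovery P value of 2.27 × 10⁻⁸ lies several orders of magnitude below this threshold.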

The top contributing voxels of the sMRI component were further mapped to nearest gray matter to obtain the Talairach atlas labels [Lancaster et al., 1997, 2000]. Figure 3 and Table 3 show the spatial map and the mapped regions, respectively. It can be seen that the identified brain network comprised superior and medial frontal gyri, precuneus, and cingulate gyrus. For the SNP component, Supporting Information Table S2 provides a summary of the top contributing SNPs, including chromosome, base pair position, corresponding gene, minor allele, and component z-score. From the GNA14 reference, five SNPs were identified as top contributors to the associated SNP component, including rs10869927_G (“G” for the minor allele), rs10781441_T, rs12684903_A, rs2889774_T, and rs7047853_G. Figure 4 shows a Manhattan plot of weights of loci for the identified SNP component, where clusters are marked if they span more than five SNPs with maximum normalized weights greater than 5. Out of the top 2,000 SNPs, 803 are intragenic and mapped to 272 unique genes, including another G-protein, GNA12, and two G-protein-coupled receptors, GPR113 and GPR133. IPA revealed that these 272 genes were significantly enriched in a number of canonical pathways, including synaptic long-term potentiation (LTP) and depression (LTD), axonal guidance, protein kinase A (PKA), CREB, and nNOS signaling, as listed in Table 4. In particular, all the pathways highlighted in bold remained significantly enriched when the number of top SNPs was adjusted from 1,000 to 2,500, indicating a relatively consistent genetic architecture. We also explored through IPA potential interactions between selected genes of interest, including GNA12, GNA14, GRM1, PRKCH, CAMK2B, and PDIA3. The constructed network is illustrated in Supporting Information Figure S2.

Figure 3. Spatial map of the identified sMRI component (|Z| > 2). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 4. Manhattan plot for the identified single nucleotide polymorphism (SNP) component. The horizontal line indicates the threshold at |Z| > 3.37 for selection of top contributing SNPs. Clusters are marked if they span more than 5 SNPs with maximum normalized weights greater than 5. The GNA14 cluster is also marked. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 3. Talairach labels of identified brain regions (|Z| > 2)
Brain region | Brodmann area | L/R volume (cm³) | L/R random effects, max Z (x, y, z)
Medial frontal gyrus | 6, 8, 32, 9, 10 | 13.9/8.7 | 6.53 (0,16,47)/5.79 (2,−6,65)
Superior frontal gyrus | 6, 8, 9, 10 | 12.3/9.6 | 7.15 (0,14,53)/6.01 (2,14,53)
Precuneus | 7, 19 | 8.0/6.0 | 6.44 (0,−47,63)/6.29 (2,−47,63)
Paracentral lobule | 5, 4, 6, 31, 7 | 5.2/4.1 | 7.01 (0,−41,65)/6.63 (2,−43,63)
Cingulate gyrus | 24, 32, 31 | 4.0/2.0 | 5.59 (0,2,48)/4.79 (2,18,43)
Postcentral gyrus | 5, 7, 4, 3, 2 | 3.5/3.5 | 6.75 (−2,−41,65)/6.81 (2,−41,65)
Middle frontal gyrus | 6, 8 | 2.7/2.9 | 2.88 (−28,1,61)/3.38 (28,5,59)

Table 4. Pathway analysis on the identified SNP component

| Ingenuity canonical pathways | Molecules | P-value |
| --- | --- | --- |
| Neuropathic pain signaling in dorsal horn neurons | NTRK2, KCNQ2, GRM1, PDIA3, KCNQ3, PRKCH, CAMK2B | 6.76E−04 |
| Inositol pyrophosphates biosynthesis | IP6K3, PPIP5K1 | 4.37E−03 |
| Histidine degradation III | UROC1, AMDHD1 | 5.75E−03 |
| Histidine degradation VI | UROC1, AMDHD1 | 7.41E−03 |
| Synaptic long-term potentiation | GRM1, PDIA3, PRKCH, GNA14, ADCY8, CAMK2B | 7.59E−03 |
| Protein kinase A signaling | EPM2A, PTPRK, PTPRD, PTPRJ, PDIA3, PDE8B, PRKCH, ADCY8, EYA2, PTPRT, NTN1, CAMK2B | 8.51E−03 |
| Axonal guidance signaling | PDIA3, GNA12, GNA14, ADAMTS2, NTN1, SRGAP3, NTRK2, NTRK3, EFNA5, PAK7, PRKCH, WNT5A, GLIS1 | 1.05E−02 |
| CREB signaling in neurons | GRM1, PDIA3, GNA12, PRKCH, GNA14, ADCY8, CAMK2B | 1.32E−02 |
| Synaptic long-term depression | GRM1, PDIA3, GNA12, PLA2R1, PRKCH, GNA14 | 1.62E−02 |
| Sulfate activation for sulfonation | PAPSS2 | 2.95E−02 |
| Formaldehyde oxidation II (glutathione-dependent) | ESD | 2.95E−02 |
| Glutamine degradation I | GLS | 2.95E−02 |
| nNOS signaling in neurons | DLG2, PRKCH, NOS1AP | 3.09E−02 |
| Endothelin-1 signaling | PDIA3, GNA12, PLA2R1, PRKCH, GNA14, ADCY8 | 3.89E−02 |
| GNRH signaling | PAK7, PRKCH, GNA14, ADCY8, CAMK2B | 4.07E−02 |
| RhoGDI signaling | GNA12, ARHGAP12, CDH18, PAK7, GNA14, CDH13 | 4.37E−02 |
| Methionine salvage II (Mammalian) | BHMT2 | 4.37E−02 |
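
Pathway P-values of this kind are typically obtained from an over-representation test. The sketch below uses a generic right-tailed hypergeometric test as an assumption; the exact statistic and reference gene set used by IPA are not described in this section. Only the count of 272 selected genes comes from the text, while the background size, pathway size, and overlap are hypothetical.

```python
from math import comb


def enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Right-tailed hypergeometric P(X >= n_overlap).

    X is the number of pathway genes found when n_selected genes are drawn
    without replacement from n_background genes, n_pathway of which belong
    to the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# and 6 pathway genes among the 272 selected genes.
p_toy = enrichment_p(20000, 100, 272, 6)
```

Repeating such a test for gene lists derived from different numbers of top SNPs (for example 1,000 to 2,500) would be one way to examine the stability of enrichment described in the text.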

DISCUSSION

In this work, we applied pICA-R to explore the genomic basis of brain structure in a homogeneous cohort of healthy Caucasians. G-proteins arose as potential references given that they are implicated in neural development. A previous univariate study [Chavarria-Siles et al., 2013] provided more direct evidence for associations between G-protein SNPs and regional gray matter variations. We leveraged these results to derive four genetic references, which were then investigated in the semi-blind multivariate framework. In the BIG dataset, which included 1,256 subjects, a GNA14 reference elicited a significant SNP-sMRI association (r = −0.16, P = 2.34 × 10⁻⁸), while the other three tested references did not yield any significant finding. As discussed by Chen et al. [2013], references derived from univariate models do not always yield significant findings in multivariate analyses. This might be due to various factors. One possibility is that the reference SNPs have heterogeneous effects and contribute to different components. Alternatively, the related network as a whole may not be well represented in this specific dataset.
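
The relationship between a correlation coefficient, the sample size, and its significance can be sketched as follows. This is an illustration only: it assumes a simple Pearson correlation over all 1,256 subjects with no covariate adjustment, and it replaces the t distribution with a normal approximation, which is reasonable at this sample size. Because the reported r is rounded to two decimals and the exact test is not specified here, the result should match the reported P only in order of magnitude.

```python
import math


def correlation_t_and_p(r, n):
    """Return the t statistic for a Pearson r and an approximate two-sided P."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    # Normal approximation to the t distribution with n - 2 degrees of freedom.
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# r and n as reported in the text; the test choice is an assumption.
t_stat, p_value = correlation_t_and_p(-0.16, 1256)
```

A correlation of modest magnitude can therefore be highly significant in a sample of this size, which is why the effect size and the P-value are best read together.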

Using a semi-blind multivariate approach, we identified extended SNP and sMRI components compared to the previous work by Chavarria-Siles et al. [2013]. The identified brain network consisted of the medial frontal gyrus, superior frontal gyrus, precuneus, and cingulate regions. Notably, the medial frontal cortex was also identified by Chavarria-Siles et al., who employed 532 BIG subjects and used a different method. Regarding the genetic modality, five reference SNPs from GNA14 were, as expected, identified as top contributors to the associated SNP component. The SNP highlighted in Chavarria-Siles et al. (rs4745639) was not included in the reference; however, it showed moderate LD with rs2889774 and rs10869927 (r² of 0.15 and 0.14, respectively), and all three contributed with negative component weights. The elicited SNP component was enriched in pathways related to neuronal functions, as listed in Table 4. More importantly, the identified association was further replicated in the independent MCIC dataset. Compared to univariate analyses, pICA-R assesses multiple variables for their aggregate effect. For the SNP modality, pICA-R is well suited to model polygenicity and to capture the additive effect of multiple SNPs with moderate individual effects [Chen et al., 2013; Polderman et al., 2015]. For the sMRI modality, pICA-R extracts patterns of variation shared by regional and distant voxels [Xu et al., 2009]. Imaging genetic associations are then evaluated between the loadings of the SNP and sMRI components. Thus, our results delineate a genomic basis underlying a proportion of the GMD variation in the highlighted frontal and parietal regions.

The spatial map of the identified sMRI component clearly highlighted the midline of the brain, which is frequently implicated in neural development, ageing, and cognition [Frangou et al., 2004; Lyuksyutova et al., 2003; Shaw et al., 2006; Sowell et al., 2003]. The prefrontal cortex is known to be involved in complex cognitive behavior and decision making [Koechlin and Hyafil, 2007; Koechlin et al., 2003]. The precuneus plays a key role in highly integrated tasks, including episodic memory retrieval, self-referential processes, and consciousness [Laureys et al., 2004; Lundstrom et al., 2005; Ochsner et al., 2004]. The cingulate cortex is an integral part of the limbic system and is active in a variety of cognitive functions such as emotion, learning, and memory [Bush et al., 2000]. More interestingly, these regions are also among those showing relatively significant structural variation across different age and intelligence groups. A VBM study demonstrated positive correlations between IQ and GMD in voxel clusters distributed along the brain midline, including frontal cortex, cingulate, and precuneus [Frangou et al., 2004]. In the superior and medial frontal gyri, the trajectories of age-related change in cortical thickness were found to differ between the superior and the high/average intelligence groups [Shaw et al., 2006]. In addition, axons extend on either side of the midline (anterior–posterior axis) to form longitudinal tracts, and the midline cells are crucial for guiding axon outgrowth [Lyuksyutova et al., 2003]. These observations suggest that the identified brain network captures regional GMD co-variations that can play a role in cognitive abilities and might reflect inter-individual differences in brain development.

The SNP component was overrepresented in a number of pathways communicating with each other and with G-proteins. Among those shown in Table 4, LTP and LTD are two forms of synaptic plasticity that affect signal transmission between neurons and have been widely studied to understand the mechanisms underlying learning and memory [Martin et al., 2000; Neves et al., 2008]. It is believed that glutamate receptors are major triggers for the induction of LTP and LTD, where PKA, protein kinase C (PKC), and calcium/calmodulin-dependent protein kinase II (CaMKII) can all play a role [Collingridge et al., 2004]. CREB is mainly activated by cAMP through PKA [Delghandi et al., 2005]. In particular, a behavioral experience can elicit synaptic activities which activate CREB, inducing expression of molecules that contribute to consolidating changes in synaptic strength [Benito and Barco, 2010]. Axon guidance relies strongly on a number of cue molecules, among which netrins are the best understood. It has been demonstrated that netrins retain the function of attracting axons toward the brain midline while also repelling some axons, and responses to the guidance cue netrin-1 (NTN1) are sensitive to levels of cAMP or PKA activity [Dickson, 2002]. Neuronal nitric oxide synthase (nNOS) is a biosynthetic enzyme functioning in several types of synaptic plasticity, including LTP and LTD [Bon and Garthwaite, 2003; Nelson et al., 1995]. The phosphorylation of nNOS is regulated by kinases and phosphatases such as PKA, PKC, and CaMKII [Zhou and Zhu, 2009]. It is noteworthy that G-proteins are involved in the signaling cascades of all these cellular machineries, including glutamate, cAMP, PKA, PKC, CREB, and CaMKII [Neves et al., 2002].

Among the enriched pathways, overlaps of genes were observed, including GRM1, PRKCH, GNA12, GNA14, and CAMK2B. GRM1 encodes a metabotropic, G protein-coupled receptor for glutamate, the major excitatory neurotransmitter in the central nervous system [Hermans and Challiss, 2001]. A total of 10 SNPs from GRM1 were identified as top contributors to the component in our analysis. Nine of these 10 SNPs contributed positively (see Supporting Information Table S2 for details). Note that the SNP-sMRI correlation was negative, indicating that subjects with higher loadings on the SNP component carried relatively lower loadings on the sMRI component. Also, the highlighted brain regions presented positive component weights, indicating that subjects with higher loadings on the sMRI component had higher regional GMD. Thus, for the SNPs presenting positive component weights, our finding suggests that, overall, subjects carrying more minor alleles at these loci showed reduced GMD in the highlighted brain regions. Only one of the 10 top SNPs in GRM1 (rs854144_T) contributed with a negative weight, indicating that more minor alleles were found in subjects exhibiting increased regional GMD. PRKCH is another molecule of great importance, as it encodes a PKC subtype. PKCs phosphorylate a wide range of protein targets involved in brain functioning. For instance, phosphorylation of GluR1 by PKC has been demonstrated to be critical for LTP expression [Boehm et al., 2006]. In our analysis, a single SNP, rs1957902_G, was identified from PRKCH, presenting a negative weight. GNA12 and GNA14 fall into the G-protein subfamilies G12 and Gq, respectively. GNA12 has been indicated to negatively regulate cell adhesion through interacting with cadherin [Meigs et al., 2002]. GNA14 is among the genes found to exhibit developmental expression variations in the cortex of a marmoset animal model, and it has been linked to axonal guidance [Sasaki et al., in press]. In the present study, rs2258960_T and rs2644311_T were identified from GNA12 and presented positive weights. For GNA14, rs10869927_G and rs2889774_T showed negative weights, while rs10781441_T, rs12684903_A, and rs7047853_G showed positive weights. CAMK2B encodes a Ca2+/calmodulin-dependent protein kinase involved in calcium signaling. The expression of this gene has been found to be disrupted in Alzheimer's disease [Antonell et al., 2013; Liang et al., 2008]. Three SNPs, rs12702075_A, rs10281178_C, and rs3934888_G, were identified from CAMK2B, all exhibiting negative weights.
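
The sign reasoning used above to read off the direction of an effect can be made explicit with a short sketch. It assumes that the direction follows from the product of three signs: the SNP component weight, the correlation between SNP and sMRI loadings, and the regional sMRI component weight. The function name and the unit magnitudes are hypothetical; only the signs and the reported correlation of −0.16 come from the text.

```python
def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def gmd_direction_per_minor_allele(snp_weight, loading_correlation, voxel_weight):
    """Qualitative direction of regional GMD associated with more minor alleles."""
    combined = _sign(snp_weight) * _sign(loading_correlation) * _sign(voxel_weight)
    if combined > 0:
        return "increased"
    if combined < 0:
        return "reduced"
    return "undetermined"


# Positive SNP weight, negative loading correlation, positive regional weight.
positive_weight_case = gmd_direction_per_minor_allele(1.0, -0.16, 1.0)
# Negative SNP weight with the same correlation and regional weight.
negative_weight_case = gmd_direction_per_minor_allele(-1.0, -0.16, 1.0)
```

The first case corresponds to the nine positively weighted GRM1 SNPs (reduced GMD with more minor alleles) and the second to rs854144_T (increased GMD), matching the interpretation given in the text.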

Both the imaging and genetic findings from our analysis link strongly to neural development, indicating that the observed GMD variations might be traced back to developmental processes under the control of genetic factors. It is widely acknowledged that genetic components underlie anatomic variations in healthy and diseased human brains [Andersen, 2003; Giedd, 2005; Sotelo, 2004]. Direct associations between genetic variants and quantitative structural measures have also been demonstrated in large-scale studies [Bis et al., 2012; Stein et al., 2012]. Compared with previous works using univariate approaches, our results delineate relationships between particular genetic pathways and regional structural variations, emphasizing that these G-protein-related PKA, CREB, and nNOS mechanisms play a critical role in brain development, which might ultimately lead to GMD differences. On the other hand, it should be noted that brain structure is the final manifestation of a complex interplay between multiple genetic and environmental factors. Our analysis captures only part of this picture.

The current study benefited from a large homogeneous discovery sample, which enhances statistical power. In addition, the employed multivariate approach assessed a group of variables whose aggregate effects would be more prominent than those of individual variables. Leveraging prior knowledge further improved the chance of pinpointing factors of interest in a large complex dataset. Most importantly, the imaging genetic association identified from the discovery sample was replicated in an independent cohort, indicating a low possibility of false positive results. On the other hand, one major limitation of this study lies in the different scanning platforms employed for imaging data collection, which can be a potential confound. To address this issue, we carefully explored all the scanning settings and, based on a linear model, eliminated from the imaging data the variance most likely induced by scanning discrepancies. Nonlinear scanning effects, of which we have no knowledge at present, might still exist. However, such effects were not expected to contribute significantly to the identified imaging genetic association, given the linear decomposition of ICA. Another limitation is that it is not yet clear from the available data how the identified components are related to behavior. The identified local GMD variations have been implicated in intelligence studies; however, we did not collect IQ data in this project. Instead, we investigated whether this imaging trait was correlated with the available reversal learning scores and found no significant associations. Further work is needed to better understand how the genetic and GMD variations might lead to differences in behavior. In addition, it should be noted that SNPs in high LD would exhibit comparable effects in our analysis. Therefore, SNPs might be identified because they tag true causal variants. However, the identified genes and pathways should be unchanged. Finally, following the design of pICA-R, we only tested the largest cluster as a genetic reference for each candidate gene in this work, ignoring other smaller clusters which might also be biologically informative. This could be tackled with an extended approach, parallel ICA with multiple references [Chen et al., 2014a], which explores potential convergence of functional influences among genes. We plan to conduct such an analysis in a future study.
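
The linear correction for scanning discrepancies mentioned above can be illustrated in its simplest form. The sketch assumes that the only nuisance variable is a categorical scanner label, in which case regressing on an intercept plus scanner indicators reduces to removing each scanner's mean. The actual model and settings used in the study are not given in this section, and the function name, labels, and values below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def remove_scanner_means(values, scanner_labels):
    """Remove per-scanner mean offsets from one voxel's values across subjects.

    Equivalent to taking residuals from a linear model with scanner indicator
    variables, then adding back the grand mean to preserve the overall scale.
    """
    by_scanner = defaultdict(list)
    for value, label in zip(values, scanner_labels):
        by_scanner[label].append(value)
    scanner_means = {label: fmean(vals) for label, vals in by_scanner.items()}
    grand_mean = fmean(values)
    return [
        value - scanner_means[label] + grand_mean
        for value, label in zip(values, scanner_labels)
    ]


# Hypothetical GMD values for six subjects scanned on two platforms.
toy_values = [0.50, 0.52, 0.54, 0.60, 0.62, 0.64]
toy_labels = ["A", "A", "A", "B", "B", "B"]
adjusted = remove_scanner_means(toy_values, toy_labels)
```

After adjustment the two platforms share the same mean while within-platform differences between subjects are preserved. As noted in the text, a correction of this linear kind would not remove nonlinear scanner effects.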

In summary, we performed a guided exploration in the present study, in which a reference derived from the GNA14 gene elicited a genetic component significantly associated with an sMRI component in a semi-blind multivariate analysis. The associated genetic component highlighted several neural signaling pathways which appear to interact with each other and are active in synaptic plasticity and axonal guidance. Meanwhile, the identified brain network comprised regions recruited in various cognitive processes and robustly implicated in brain maturation and intelligence. Collectively, our study suggests a key role of G-proteins in the genetic architecture underlying normal GMD variations in frontal and parietal regions. We speculate that the observed GMD variations partially result from differential genetic modulation of brain development, though future longitudinal studies are needed to dissect the genetic contribution to trajectories of anatomic changes in developing brains.

ACKNOWLEDGMENTS

This work makes use of the BIG (Brain Imaging Genetics) database, first established in Nijmegen, The Netherlands, in 2007. This resource is now part of Cognomics (www.cognomics.nl), a joint initiative by researchers of the Donders Centre for Cognitive Neuroimaging, the Human Genetics and Cognitive Neuroscience departments of the Radboud University Nijmegen Medical Centre, and the Max Planck Institute for Psycholinguistics in Nijmegen. The Cognomics Initiative is supported by the participating departments and centres and by external grants, i.e., the Biobanking and Biomolecular Resources Research Infrastructure (Netherlands) (BBMRI-NL), the Hersenstichting Nederland, and the Netherlands Organisation for Scientific Research (NWO). The authors thank all persons who kindly participated in the BIG research. The Board of the Cognomics Initiative consists of Barbara Franke, Simon Fisher, Guillén Fernández, Peter Hagoort, Han G. Brunner, Jan Buitelaar, Hans van Bokhoven, and David Norris. The authors would also like to thank the University of Iowa Hospital, Massachusetts General Hospital, the University of Minnesota, the University of New Mexico, and the Mind Research Network staff for their efforts in data collection, preprocessing, and analyses.
