Protein-Altering Variants' Analysis in Autism Subgroups Uncovers Early Brain-Expressed Gene Modules Relevant to Autism Pathophysiology
Funding: This work was supported by a grant from the Italian Ministry of Health to AC (Ricerca Corrente 2025). APC funded by Bibliosan.
Gaia Scaccabarozzi and Luca Fumagalli contributed equally to this work.
Uberto Pozzoli and Alessandro Crippa jointly supervised this work.
ABSTRACT
Understanding the functional implications of genes' variants in autism heterogeneity is challenging. Gene set analysis examines the cumulative effect of multiple functionally converging genes. Here we explored whether a multi-step analysis could identify gene sets with different loads of protein-altering variants (PAVs) between two subgroups of autistic children. After subdividing our sample (n = 71, 3–12 years) based on higher (> 80; n = 43) and lower (≤ 80; n = 28) intelligence quotient (IQ), a gene set variant enrichment analysis identified gene sets with significantly different incidence of PAVs between the two subgroups of autistic children. Significant gene sets were then clustered into modules of genes. Their brain expression was investigated according to the BrainSpan Atlas of the Developing Human Brain. Next, we extended each module by selecting the genes that were spatio-temporally co-expressed in the developing brain and physically interacting with those in modules. Last, we explored the incidence of autism susceptibility genes within original and extended modules. Our analysis identified 38 significant gene sets (FDR, q < 0.05). They clustered in four modules involved in ion cell communication, neurocognition, gastrointestinal function, and immune system. Those modules were highly expressed in specific brain structures across development. Spatio-temporal brain co-expression and physical interactions identified extended genes' clusters with over-represented autism susceptibility genes. Overall, our unbiased approach identified modules of genes functionally relevant to autism pathophysiology, possibly implicating them in phenotypic variability across subgroups. The findings also suggest that autism diversity likely originates from multiple interacting pathways. Future research could leverage this approach to identify genetic pathways relevant to autism subtyping.
Summary
- This study investigates how genetic differences contribute to the diversity of autism by examining gene variants in two groups of autistic children with different IQ levels.
- By using an unbiased approach, we found 38 gene sets with significant differences in possibly damaging variants between the subgroups.
- These gene sets were grouped into four key modules related to brain cell communication, neurocognition, gut function, and the immune system.
- The modules were strongly expressed in certain brain regions and stages of development, and interact with genes including known autism susceptibility genes.
- This approach offers valuable insights into how genetic factors may lead to different autism subtypes and could guide future research into autism's underlying biological mechanisms.
1 Introduction
Autism is a neurodevelopmental condition characterized by marked heterogeneity across multiple levels of analysis. Its phenotypic presentation can differ drastically depending on the degree of both core and co-occurring features, such as language abilities (Naigles et al. 2017), intellectual profile (Wolff et al. 2022), motor coordination difficulties (Craig et al. 2021; Wang, Petrulla, et al. 2022), and adaptive functioning (Mandelli et al. 2023; in this respect, see also the LIMA approach, Mandelli et al. 2024). Critical heterogeneity has also been found at deeper levels of investigation, including neurophysiology (Geschwind and Levitt 2007; Di Martino et al. 2014) and genetics (An and Claudianos 2016; Havdahl et al. 2021). Along with those aspects, it is also important to consider chronogeneity, that is, the variation in autism manifestation over time and, notably, over development (Georgiades et al. 2017). Despite this multi-level heterogeneity, the role of genetic factors in autism etiology is unquestioned. The heritability of the condition is high, with estimates ranging between 64% and 91% (Steffenburg et al. 1989; Le Couteur et al. 1996; Taniai et al. 2008; Lichtenstein et al. 2010; Hallmayer et al. 2011; Nordenbæk et al. 2014; Colvert et al. 2015; De Rubeis et al. 2014). Over recent years, hundreds of de novo and rare inherited variants have been identified as major contributors to individual autism risk (Iossifov et al. 2014; Sanders et al. 2015; Yuen et al. 2017; Ruzzo et al. 2019; Doan et al. 2019; Satterstrom et al. 2020; Choi and An 2021). The expression of those autism-associated genes has been reported to be preeminent at prenatal and early postnatal stages, substantiating the early onset of the condition (Willsey et al. 2013; Courchesne et al. 2019).
However, although with smaller effect in comparison to that of de novo and rare inherited variants, several studies have demonstrated that heritability in autism is largely due to common variants, that is, to the genetic variations that are commonly present in the general population (Gaugler et al. 2014; de la Torre-Ubieta et al. 2016; Grove et al. 2019). Thus, as for other neurodevelopmental conditions, there is considerable evidence indicating the high heritability of autism and its polygenic nature, with both rare and common variants having a critical role (Cirnigliaro et al. 2023).
In the last 10 years, a body of research has leveraged genetic heterogeneity to parse autism phenotypic presentations (Jeste and Geschwind 2014; Chawner et al. 2021; Warrier et al. 2022; Di Giovanni et al. 2023; Bertelsen et al. 2021). Nevertheless, given the heterogeneity of the condition, the identification of the functional implications of genes related to autism requires large-scale studies, mostly involving thousands of participants.
An alternative perspective is shifting the focus from single genes to networks of genes that converge in functionally relevant biological processes, for example, by using gene set analysis. Gene sets are groups of genes that share a common biological function and are pre-defined based on prior biological knowledge (Subramanian et al. 2005). The gene set approach permits examination of the combined effect of multiple DNA variants that cumulatively impact biological pathways potentially relevant to autism (An and Claudianos 2016).
In this study, we tested whether a multi-step unbiased analysis could identify gene sets relevant to autism subtyping and functionally characterize them in terms of biological processes and their spatio-temporal co-expression in the brain. Since we intended to evaluate the gene sets' cumulative effects, we considered all the protein-altering variants (PAVs), regardless of both their frequency in the population and their effect on the proteins.
The present work had several objectives. First, after dividing our clinical sample of autistic children into two subgroups based on their standardized measures of intelligence quotient (IQ), we identified gene sets that presented a different load of PAVs between participants with higher IQ and lower IQ, using a data-driven enrichment analysis. In this regard, previous research has reported significant associations between genetic variations in autism and cognitive heterogeneity (Wang et al. 2013; Iossifov et al. 2014). The obtained gene sets were hierarchically clustered into modules based on their PAVs enrichment. We then assigned each module a label representative of the biological processes that best characterized it.
Next, to explore the functional implications of the modules, we assessed their expression profiles in the brain according to the BrainSpan Atlas of the Developing Human Brain (n.d.). More specifically, we determined whether genes in the modules were expressed above expectations in brain structures through different developmental stages, to gain insights about their potential involvement in brain development.
Afterwards, to explore the functional interplay between the modules more extensively, we extended each module by selecting those genes that both physically interact at the protein level and show a highly correlated spatio-temporal brain expression profile with those in the modules. Spatial and temporal co-expression in the brain was assessed according to the BrainSpan Atlas of the Developing Human Brain, and physical interaction at the protein level according to the bioGRID database (Stark et al. 2006). Understanding how genes relate to each other during neurodevelopment is essential to identify potential shared biological processes (Mahfouz et al. 2015) and how they could interact along their functional pathways to affect the final outcome. Genes that are co-expressed in the brain are indeed more likely to act together in converging processes. In this sense, evidence that the corresponding proteins interact supports this functional association (Oldham et al. 2008).
Lastly, we investigated whether the genes in the modules and those in the extended set of their co-expressed interactors were enriched in candidate genes highly associated with autism susceptibility according to the Simons Foundation Autism Research Initiative (SFARI) database (n.d.), which constantly integrates genetic information from multiple research studies.
2 Methods
We summarized our methods in a workflow diagram (Figure 1).

2.1 Participants
A total of 71 autistic children aged 3–12 years were involved in this cross-sectional study. Participants were consecutively recruited at the Child Psychopathology Unit of Scientific Institute, IRCCS Eugenio Medea (Bosisio Parini, Italy), over a 36-month period between July 2016 and July 2019, as a part of a larger study (Crippa et al. 2021). Autistic children were admitted to inpatient/outpatient units of our institute either for assessment or for rehabilitation programs. All participants had been previously diagnosed at our institute on the basis of a consensus “best estimate” DSM-5 clinical diagnostic process informed by, but not dependent on, scores on the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2) (Lord et al. 2012). Participants were excluded in case a well-defined genetic disorder was detected. Further exclusion criteria were the use of medication affecting the central nervous system, the presence of significant sensory impairment (e.g., blindness, deafness), abnormalities detected by MRI, and suffering from chronic or acute medical illness. All participants were drug-naïve.
This research was conducted in accordance with the ethical standards of the 1964 Declaration of Helsinki and later amendments and was approved by the Ethics Committee of our Institute “Comitato Etico IRCCS E. Medea—Sezione Scientifica Associazione La Nostra Famiglia” (Prot. N.33/18—CE). Informed written consent was collected by all of the participants' parents or legal guardians before participation.
2.2 Measures
All children underwent an assessment of the IQ level. The specific assessment tool was chosen according to the participants' developmental level among the Griffiths Mental Development Scales (Griffiths 1970), Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III) (Wechsler 2002), and Wechsler Intelligence Scale for Children-IV (WISC-IV) (Wechsler 2012). Autism core symptoms severity was rated using the calibrated severity scores for the total, the social affect, and the restricted and repetitive behavior domains of the ADOS-2 (Lord et al. 2012), and the Social Responsiveness Scales (SRS) (Constantino and Gruber 2005). In the context of a broader research project (Crippa et al. 2021), the participants' motor abilities were also evaluated using the Movement Assessment Battery for Children-2 (MABC2) (Henderson et al. 2007), the sensorimotor subtests of the Developmental Neuropsychological Assessment Second Edition (NEPSY-II) (Korkman et al. 2007), the Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI) (Beery and Beery 2004), and the Developmental Coordination Disorder Questionnaire (DCDQ) (Wilson et al. 2014), a motor screener filled by caregivers. Lastly, information about the familial socioeconomic status (SES) was collected using the Hollingshead scale for parental employment (Hollingshead 1975). Table S1 provides a description of the sample size for each of the investigated features.
2.3 Biological Samples
Blood or saliva samples (7 and 64, respectively) were obtained from participants for DNA extraction, which was performed in-house at the Molecular Biology Laboratory of our Institute. Blood was collected in tubes with EDTA. Genomic DNA was extracted from blood samples through the GenElute Blood Genomic DNA kit (Sigma). DNA from each sample was eluted in 50 μL of 10 mM Tris (pH 9.0)/0.5 mM EDTA and stored at −20°C. The Oragene OG-500 kit (DNA Genotek) was used to collect saliva samples. Genomic DNA was extracted from saliva samples by precipitation in ethanol following the manufacturer's protocol, resuspended in 50 μL of 10 mM Tris (pH 9.0)/1 mM EDTA and stored at −20°C.
2.4 Variants Identification
Sample DNA was screened using a targeted next-generation sequencing approach whose design restricts the analysis to disease-associated targets: the SureSelect Focused Exome, which covers approximately 5200 disease-associated genes (see Table S2 for a complete list of genes). The sequencing libraries were prepared from genomic DNA by using a SureSelect enrichment system (Agilent Technologies). Targeted libraries were run on the NextSeq platform according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The sequenced reads were then aligned to reference target regions, and variants were called with the BWA enrichment application (which also includes GATK for variant calling) available on BaseSpace Onsite (Illumina, San Diego, CA, USA). Erroneous variant calls were discarded. Furthermore, we did not consider the Y chromosome because some variants were also called on it in female participants.
Variant annotation was performed with ANNOVAR (Wang et al. 2010) using a refGene database generated by converting the human reference genome (GRCh37/hg19) GFF3 and FASTA files to ANNOVAR file format with the gff3toGenePred package (Grüning et al. 2018). PAVs were then selected as those occurring in the panel exons and annotated as indels, stop codon gain, stop codon loss, or non-synonymous. Lastly, we quantified variants for each participant, controlling for the presence of rare variants (i.e., not reported or with a frequency below 0.01 in gnomAD; Karczewski et al. 2020) or pathogenic variants (annotated as Pathogenic or Likely Pathogenic according to ACMG-AMP guidelines, obtained from the InterVar database; Li and Wang 2017).
2.5 Gene Sets Definition
We obtained 29,227 gene sets corresponding to ontology terms or pathways and their associated genes. To this purpose, we interrogated multiple sources, namely: Gene Ontology (Ashburner et al. 2000) (version 2019-03), HPO Phenotype (Kanehisa et al. 2010) (version 2019-03), Kegg Pathway Database (Kanehisa et al. 2010) (2019-2-25) and Reactome Pathway Database (Fabregat et al. 2018) (Physical Entity Identifier mapping files, updated to 2019-03).
2.6 Evaluation of Gene Sets Variants Abundance in a Group of Subjects
2.7 Gene Set Variants Enrichment Analysis
The goal of the gene set variants enrichment analysis (GSVEA) is to determine whether the abundance of variants observed in a given gene set differs in two subject groups. To assess whether the variants abundance in a gene set S is significantly different between two groups of participants G1 and G2, a Fisher's exact test is applied to the contingency table defined as:
 | Number of variants in the gene set | Number of variants not in the gene set |
---|---|---|
G1 | VG1 | VA1 − VG1 |
G2 | VG2 | VA2 − VG2 |
VG1 and VG2 are calculated according to Equation (1) for the gene set S while VA1 and VA2 are calculated according to Equation (1) but for the gene set comprising all the genes in the exome panel. When multiple gene sets are tested, false discovery rate (FDR) is applied to the Fisher test p values and q values are reported.
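The test described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the pipeline used in the study: the variant counts and gene set names are hypothetical, the two-sided p value follows the common convention of summing all tables no more probable than the observed one, and the Benjamini-Hochberg procedure is assumed as the FDR correction, since the text does not name a specific method.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical counts (VG1, VA1, VG2, VA2) for three made-up gene sets
gene_sets = {"set_A": (30, 1000, 10, 1200),
             "set_B": (15, 1000, 18, 1200),
             "set_C": (5, 1000, 6, 1200)}
pvals = [fisher_exact_two_sided(vg1, va1 - vg1, vg2, va2 - vg2)
         for vg1, va1, vg2, va2 in gene_sets.values()]
qvals = dict(zip(gene_sets, benjamini_hochberg(pvals)))
```

Each row of the contingency table is built exactly as in the table above: the first cell holds the variants falling in the gene set and the second holds the remaining variants of the panel for that group.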
2.8 Gene Sets and Subjects Clustering
Hierarchical clustering was performed throughout this study using the Ward algorithm (Murtagh and Legendre 2014) applied to the Euclidean distance. The optimal number of clusters was determined by silhouette analysis (Rousseeuw 1987).
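As an illustration of this step, the following sketch implements a naive Ward agglomeration and a mean silhouette score using only the Python standard library. It assumes that the number of clusters is chosen as the one maximizing the mean silhouette, which is one common reading of silhouette analysis; the toy points, the function names, and the range of candidate cluster numbers are hypothetical.

```python
from math import dist


def ward_clusters(points, k):
    """Greedy Ward agglomeration: repeatedly merge the two clusters whose
    union gives the smallest increase in within-cluster sum of squares."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(len(points[0]))]

    def merge_cost(a, b):
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points (requires at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(same) / len(same)
        b = min(sum(dist(p, q) for j, q in enumerate(points)
                    if labels[j] == other) / labels.count(other)
                for other in set(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: three well-separated groups of hypothetical enrichment profiles
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best_k = max(range(2, 6),
             key=lambda k: silhouette_score(points, ward_clusters(points, k)))
labels = ward_clusters(points, best_k)
```

The pairwise search makes this sketch cubic in the number of items, which is acceptable for a few dozen gene sets or subjects but not for large matrices.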
2.9 Gene Expression in the Brain During Neurodevelopmental Stages
We used the ABAEnrichment R package (version 1.20.0) (Grote et al. 2016) to characterize the brain expression of the genes included in the modules according to the BrainSpan Atlas of the Developing Human Brain (Chawner et al. 2021). The BrainSpan Atlas includes five developmental stages, namely prenatal, infant (0–2 years), child (3–11 years), adolescent (12–19 years) and adult (> 19 years). According to the study's aims, we limited our analysis to the first four developmental stages.
2.10 Genes Spatio-Temporal Co-Expression in the Developing Brain
For each gene in the exome panel, we built an expression vector by extracting from the BrainSpan Atlas of the Developing Human Brain the expression levels measured in each available structure for the first four developmental stages. For any pair of genes in a set, we estimated their co-expression in time and space by calculating the Pearson correlation coefficient (ρ) between their expression vectors. Correlation p values were also computed and corrected for multiple testing (FDR). Gene pairs with ρ > 0.7 and q values < 0.05 were considered to be co-expressed.
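The co-expression criterion can be illustrated with a minimal standard-library sketch. It applies only the correlation threshold of 0.7; the p value computation and FDR filter described above are deliberately omitted for brevity. Gene names and expression values are hypothetical, and each vector stands for the concatenated structure-by-stage expression levels of one gene.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpressed_pairs(expression, threshold=0.7):
    """Return the unordered gene pairs whose correlation exceeds the threshold."""
    return {frozenset((g1, g2))
            for g1, g2 in combinations(sorted(expression), 2)
            if pearson(expression[g1], expression[g2]) > threshold}


# Hypothetical expression vectors (one value per structure and stage)
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0, 13.0],
    "GENE_C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(expression)
```

Storing pairs as `frozenset` objects makes the relation symmetric, so a lookup does not depend on the order in which two genes are given.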
2.11 Genes Protein–Protein Interactions
The bioGRID database (Stark et al. 2006) was used to identify whether couples of genes had physical protein–protein interactions. Genes having at least one physical interaction with the genes in the modules were considered as interactors in the present study.
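The module extension used in this study keeps an interactor only when it is also co-expressed with the module gene it binds. A minimal sketch of that set logic follows; the gene identifiers, the interaction list, and the function name are hypothetical, and real bioGRID records would first need to be filtered to physical interactions.

```python
def extend_module(module, interactions, coexpressed):
    """Return the module plus every physical interactor that is also
    co-expressed with the module gene it interacts with."""
    extended = set(module)
    for a, b in interactions:
        # An interaction is undirected, so check both orientations
        for gene, partner in ((a, b), (b, a)):
            if gene in module and frozenset((gene, partner)) in coexpressed:
                extended.add(partner)
    return extended


# Hypothetical inputs
module = {"G1", "G2"}
interactions = [("G1", "X1"), ("G2", "X2"), ("X3", "X4")]
coexpressed = {frozenset(("G1", "X1"))}  # X2 interacts but is not co-expressed
extended = extend_module(module, interactions, coexpressed)
```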
2.12 Incidence of Autism-Related Genes
Lastly, we analyzed the incidence of genes with a gene score of 1 (High Confidence category) in the SFARI database (https://sfari.org/ accessed July 9, 2024) in our gene modules and their extended networks. Fisher exact tests were performed considering all the genes listed in the exome panel and, in case of multiple testing, p values were corrected according to Bonferroni.
3 Results
3.1 Clinical and Motor Differences Between Subgroups of Participants
Table 1 summarizes the sociodemographic, clinical, and motor characteristics of the total sample and for the two subgroups of participants with lower (≤ 80, n = 28) and higher IQ (> 80, n = 43), respectively. The two subgroups were balanced in autism severity as assessed with both the ADOS-2 calibrated severity scores and the SRS, and in the male to female ratio. Participants with higher IQ had significantly higher scores on SES (p = 0.035) and were slightly older than subjects with lower IQ (p = 0.057). Demographically, the entire sample was Caucasian.
 | Autism, TOT (n = 71) | Autism, IQ ≤ 80 (n = 28) | Autism, IQ > 80 (n = 43) | Subgroup comparisons |
---|---|---|---|---|
Age | 7.6 (2.4) [3.2–11.9] | 6.9 (2.6) [3.2–11.9] | 8.1 (2.1) [3.8–11.6] | 0.057^c |
Sex (M:F) | 61:10 | 25:3 | 36:7 | 0.510^c |
IQ | 86.1 (21.8) [47–134] | 64.7 (9.6) [47–80] | 100.0 (15.0) [81–134] | **< 0.001**^c |
SES | 58.5 (18.1) [20–90] | 52.7 (16.9) [20–90] | 62.5 (18.1) [30–90] | **0.035**^c |
Autism core characteristics | ||||
ADOS-2 total CSS | 6.2 (1.7) [3–10] | 6.1 (1.7) [4–10] | 6.2 (1.7) [3–9] | 0.505^c |
ADOS-2 SA CSS | 6.4 (1.9) [3–10] | 6.5 (1.9) [3–10] | 6.4 (1.9) [3–10] | 0.696^c |
ADOS-2 RRB CSS | 6.3 (2.1) [1–10] | 5.9 (2.7) [1–10] | 6.5 (1.6) [1–9] | 0.756^c |
SRS^a | 73.8 (29.8) [26–148] | 68.11 (31.1) [26–148] | 77.6 (28.7) [26–142] | 0.200^c |
Motor skills | ||||
DCDQ: Control^a | 17.5 (5.2) [7–30] | 18.1 (5.2) [7–28] | 17.1 (5.2) [8–30] | 0.566^d |
DCDQ: Fine motor/handwriting^a | 12.2 (4.6) [4–20] | 11.1 (5.3) [4–19] | 12.9 (3.9) [4–20] | 0.261^d |
DCDQ: General coordination^a | 14.5 (4.6) [7–24] | 15.4 (5.1) [7–24] | 13.9 (4.2) [9–23] | 0.378^d |
MABC 2: Manual dexterity^b | 6.0 (2.8) [1–13] | 4.5 (2.2) [2–10] | 6.8 (2.7) [1–13] | **0.003**^d |
MABC 2: Aiming and catching^b | 7.1 (2.6) [2–13] | 7.3 (2.6) [3–13] | 6.9 (2.7) [2–13] | 0.691^d |
MABC 2: Balance^b | 6.3 (2.3) [1–14] | 6.0 (1.9) [1–10] | 6.5 (2.5) [1–14] | 0.566^d |
NEPSY II: Visuomotor precision^b | 6.8 (2.3) [1–14] | 6.1 (1.8) [1–10] | 7.2 (2.4) [2–14] | 0.122^d |
NEPSY II: Fingertip tapping^b | 11.4 (2.8) [1–15] | 10.3 (3.5) [2–15] | 12 (2.3) [1–15] | 0.159^d |
NEPSY II: Manual motor sequences^b | 9.2 (3.4) [1–15] | 8.0 (3.3) [3–14] | 10 (3.2) [2–15] | 0.086^d |
NEPSY II: Imitating hand positions^b | 6.8 (4.7) [1–15] | 3.9 (3.9) [1–12] | 8.3 (4.4) [1–15] | **0.002**^d |
VMI total^b | 93.8 (17.8) [57–148] | 85.4 (13.5) [66–114] | 98.6 (18.3) [57–148] | **0.012**^d |
- Note: Data are expressed as Mean (SD) [range]. Mann–Whitney U test or t-test was used for quantitative variables, chi-squared test for the categorical variable sex. ^a raw score; ^b standard score; ^c p value; ^d q value, false discovery rate corrected. Bold font: p values < 0.05.
- Abbreviations: ADOS: autism diagnostic observation schedule; CSS: calibrated severity scores; DCDQ: Developmental Coordination Disorder Questionnaire; MABC2: Movement Assessment Battery for Children 2; NEPSY-II: Developmental Neuropsychological Assessment Second Edition; RRB: restricted repetitive behaviors; SA: social affect; SES: socio-economic status; SRS: Social Responsiveness Scales; VMI: Beery-Buktenica Developmental Test of Visual-Motor Integration.
With respect to the motor skills, participants with higher IQ presented higher scores on MABC2 Manual Dexterity, NEPSY-II Imitating Hand Positions, and VMI (all FDR q < 0.05).
3.2 Gene Sets Showing Differential PAVs Enrichment Between Autism Subgroups Are Involved in Ion Cell Communication, Neurocognition, Gastrointestinal Function, and Immune System
In order to check for the presence of rare or pathogenic variants in the two subgroups of participants, we quantified variants for each subject, applying progressively stricter selection criteria. When considering all variants (Figure S1, Panel A), participants with higher IQ showed a slightly greater number of variants. This non-significant difference progressively decreased when we focused on rare variants (Figure S1, Panel B) or pathogenic variants (Figure S1, Panel C). When we filtered for both frequency and pathogenicity (Figure S1, Panel D), the trend reversed, with slightly, although not significantly, more variants observed in the low IQ group. The application of this last criterion selected 144 variants reported in Table S3.
After this preliminary check, we performed a gene set variants enrichment analysis (GSVEA). In brief, the goal of this analysis was to determine whether the abundance of PAVs observed in a given gene set differed between the two subgroups of autistic participants. We performed the analysis for nearly 30,000 functionally characterized gene sets obtained from public databases of ontologies and pathways (see Section 2). Among the considered gene sets, 38 showed a significantly different incidence of PAVs between the two subgroups of autistic participants (FDR, q < 0.05). For each subject, we calculated the proportion of PAVs (the enrichment score) in each significant gene set. We then applied an optimal hierarchical clustering procedure to the resulting matrix at both the gene set and the subject level (Figure 2A). The significant gene sets were grouped into five clusters (terms' clusters, TC). Two of them, TC2 and TC3, were merged into a single cluster given their high level of intersection (Figure 2B), resulting in a total of four modules, which were operationally defined as the union of the corresponding significantly enriched gene sets. These four modules grouped together genes involved in a variety of processes and functions related to ion cell communication, neurocognition, gastrointestinal function, and immune system, respectively, and were thus labeled accordingly. The data-driven identification of these four modules was based on the initial top-down splitting of our sample into two subgroups on the basis of their standardized measures of IQ. With a bottom-up approach, we also explored whether the modules' enrichment effectively identified clusters of participants with phenotypic differences. At the subject level, participants were clustered into three groups (subjects' clusters, SC). These SCs had a non-random distribution of subjects with IQ > 80 (Fisher exact test; p < 0.001), who were particularly represented in SC2 (Figure 2C).
A logistic regression analysis based on the five TCs enrichment scores was also performed. With a leave-one-out cross validation approach, the regression model showed an accuracy of 0.704 (area under the receiver operating characteristic curve, ROC AUC = 0.74) in predicting IQ group membership (Figure 2D). Participants in the three SCs did not differ significantly in ADOS total or subscales scores (Kruskal–Wallis test; p > 0.05).
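The leave-one-out scheme can be sketched as follows. This is a toy illustration and not the model fitted in the study: it uses a plain, unregularized logistic regression trained by batch gradient descent with hypothetical learning rate and epoch settings, and a single made-up feature in place of the five TC enrichment scores.

```python
from math import exp


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.
    The last element of the returned weight list is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1 / (1 + exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Predict class 1 when the linear score is positive."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0


def loo_accuracy(X, y):
    """Leave-one-out cross validation: refit without each subject in turn
    and score the prediction on the held-out subject."""
    hits = 0
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(w, X[i]) == y[i]
    return hits / len(X)


# Hypothetical one-feature data set (e.g., a centered enrichment score)
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
acc = loo_accuracy(X, y)
```

Because every subject is predicted by a model that never saw it, the resulting accuracy is less optimistic than one computed on the training data, which matters for a sample of this size.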

3.3 Genes in the Modules Are Expressed in the Brain in the Developmental Period
We then aimed to understand whether the genes included in the four modules could be relevantly expressed in brain regions throughout development. To this end, we leveraged the spatial gene expression data provided by the BrainSpan Atlas of the Developing Human Brain to link our gene modules to brain structures across development. Given the childhood-onset nature of the condition, we specifically considered the first four developmental stages listed in the BrainSpan Atlas: prenatal, infancy (0–2 years), childhood (3–11 years), adolescence (12–19 years).
Results of the ABAEnrichment analysis revealed that genes in most of the modules were expressed significantly above expectation in specific brain structures and developmental stages, with the exception of the immune system module. An overview of the significantly enriched brain structures in each developmental stage can be found in Table 2. Genes in the ion cell communication module appear to be particularly expressed in the prenatal period and in adolescence across different brain structures. Genes in the neurocognition module are mainly expressed in the hippocampus and in various primary and associative cortical areas across infancy and childhood. Genes of the gastrointestinal module are extensively expressed from infancy to adolescence in the cerebellum, subcortical nuclei, and cortical structures. Last, genes included in the immune system module are not expressed above expectation in any brain structure or developmental stage.
Table columns: Ion cell communication, Neurocognition, Immune system, Gastro-intestinal; within each module: Pre-natal, Infant 0–2 years, Child 3–11 years, Adol 12–19 years.
Brain structure | Significant enrichments, listed in left-to-right column order as p value (fold) |
---|---|
CBC_cerebellar cortex | 0.032 (1.138); 0.028 (1.156); 0.043 (1.177) |
MD_mediodorsal nucleus of thalamus | 0.01 (3.25); 0.031 (2.884); 0.037 (2.78) |
STR_striatum | 0.01 (1.476); 0.021 (3.06); 0.037 (1.302) |
AMY_amygdaloid complex | 0.042 (1.23); 0.035 (1.216) |
HIP_hippocampus (hippocampal formation) | 0.016 (2.36); 0.037 (2.552); 0.018 (1.246); 0.005 (1.268); 0.023 (1.454) |
M1C_primary motor cortex (area M1, area 4) | 0.03 (1.389); 0.016 (2.98); 0.025 (2.001); 0.04 (1.249); 0.013 (1.465) |
S1C_primary somatosensory cortex (area S1, areas 3, 1, 2) | 0.013 (2.997); 0.022 (2.265); 0.041 (1.237) |
V1C_primary visual cortex (striate cortex, area V1/17) | 0.029 (2.945); 0.013 (3.015); 0.042 (1.185) |
A1C_primary auditory cortex (core) | 0.004 (1.56); 0.049 (1.2); 0.024 (1.215) |
OFC_orbital frontal cortex | 0.037 (2.78); 0.017 (2.331); 0.049 (2.524) |
VFC_ventrolateral prefrontal cortex | 0.034 (2.863); 0.019 (2.01); 0.012 (1.706); 0 (1.468) |
DFC_dorsolateral prefrontal cortex | 0.04 (2.775); 0.038 (1.614); 0.008 (3.292) |
MFC_anterior (rostral) cingulate (medial prefrontal) cortex | 0.008 (1.75); 0.004 (1.307); 0.008 (1.531) |
ITC_inferolateral temporal cortex (area TEv, area 20) | 0.047 (1.437); 0.035 (2.821); 0.01 (2.398) |
STC_posterior (caudal) superior temporal cortex (area 22c) | 0.018 (2.321); 0.023 (1.666) |
IPC_posteroventral (inferior) parietal cortex | 0.004 (1.56); 0.008 (2.408); 0.025 (1.201) |
- Note: Sampled brain structures are listed on the left, and investigated age ranges are reported in columns. p values and fold values (in parentheses) are reported for brain structures and age ranges where a significant enrichment for that gene module was found.
3.4 Identification of Extended Modules Through Physical Protein–Protein Interactions and Brain Spatio-Temporal Co-Expression
Next, we aimed at identifying the genes which are both interacting with those in the four modules and showing a high level of co-expression in the developing brain. To this purpose, we used the protein–protein interaction database bioGRID, and the expression data from the BrainSpan Atlas of the Developing Human Brain. For each gene in a module, we selected its direct physical interactors which also presented a significant level of spatio-temporal co-expression in the brain with that gene (see Section 2). Each module was then extended with its co-expressed interactors (Table 3, see Table S4 for a complete list of genes).
 | Number of genes in each module | Number of genes in each extended module |
---|---|---|
Ion cell communication | 36 | 70 |
Neurocognition | 20 | 149 |
Gastrointestinal function | 94 | 331 |
Immune system | 80 | 207 |
Total | 219 | 590 |
We then collected all the genes in the four extended modules (n = 590) and built a co-expression matrix by calculating the spatio-temporal co-expression for each pair (Figure 3A). Hierarchical clustering identified three genes' clusters (GC) among this extended set of genes. Genes in clusters GC1 and GC2 showed high within-cluster and low between-cluster co-expression levels, while those in cluster GC3 showed a general intermediate level of co-expression both internally and with the other two clusters. Figure 3B depicts the annotated protein–protein interaction network for the three clusters, displaying high connection levels for clusters GC1 and GC2.

3.5 SFARI Genes Are Overrepresented in the Extended Networks of Genes in the Modules
Looking at the incidence of autism-related genes, nine out of the 219 genes in the modules (4.11%) were also annotated in the SFARI database in the high confidence category. This proportion was higher than expected in a random sample of genes, but did not reach the level of significance (OR = 1.534; p = 0.2063). When considering the four extended modules, high-confidence SFARI genes were significantly overrepresented (32 out of 590, 5.42%; OR = 2.296, p = 0.0001; see Table S5 for a complete list of genes).
Looking at the distribution of these SFARI genes in the three co-expression clusters, they were significantly overrepresented in cluster GC1 (28 out of 362, 7.73%; OR = 3.40; corrected p < 0.001) while not in GC2 (1 out of 153, 0.65%) and GC3 (3 out of 63, 4.76%). Moreover, within GC1, the enrichment of SFARI genes was significant in the extended networks of the neurocognition module (10 out of 128, 7.81%; OR = 3.114; Bonferroni corrected p = 0.033) and gastrointestinal module (16 out of 246, 6.50%; OR = 2.615; Bonferroni corrected p = 0.014) but not in the immune system and ion cell communication modules.
4 Discussion
In this work, we used an unsupervised approach to identify four modules of genes with different PAV loads in two autism subgroups with higher and lower IQ levels. Importantly, the identified modules grouped together genes involved in a variety of biological processes (ion cell communication, neurocognition, gastrointestinal function, and immune system) that have previously been reported to be atypical in autism. Here, for the first time to our knowledge, we provide preliminary evidence that these biological processes may be implicated in the differential phenotypic manifestations of two discrete autism subgroups.
The first gene module, ion cell communication, includes genes involved in ion homeostasis, transport, and signaling (e.g., ATP1A2 and ATP1A3, subunits of a sodium-potassium pump; CACNA1C, a subunit of a voltage-gated calcium channel). Those genes have been extensively associated with autism (Schmunk and Gargus 2013; Daghsni et al. 2018; Evangelho et al. 2023), mostly for their crucial role in synaptic functioning (De Rubeis et al. 2014; Bourgeron 2015; Satterstrom et al. 2020). In the very early developmental stages, genes within this module encode proteins with major contributions to neural proliferation, migration, and differentiation (Smith and Walsh 2020), including voltage-gated ion channels that regulate the propagation of action potentials and pacemaking, also in relation to cardiac membrane excitability. In this respect, an association between autism and congenital heart disease has been reported (Gu et al. 2023; Sigmon et al. 2019; Splawski et al. 2004) and functionally supported by evidence of convergent genetic pathways (De Rubeis et al. 2014; Rosenthal et al. 2021). According to our brain expression analysis, genes in this module were significantly enriched in many brain regions during the prenatal stage, consistent with their early involvement in development. Brain expression results also identified a significant enrichment of those genes during adolescence. In this respect, an association between the expression of ion-related genes in adolescence and neuropsychiatric conditions has been observed (Clifton et al. 2021). Interestingly, our results also align with previous research indicating an association between variants in ion channel-related genes and both higher IQ and lower levels of repetitive behaviors in autism (Lee et al. 2021), supporting the relevance of this gene module in distinguishing different phenotypic manifestations of the condition.
The neurocognition module encompasses gene sets related to various alterations in cognitive functions, including lack of insight, anomia, agnosia, and alexia, as well as some core autistic features such as circumscribed behaviors and collectionism. Consistent with their involvement in cognitive functions, we found that the brain expression of these genes is mainly localized in cortical regions. Interestingly, many of the included genes are involved in the causation of different forms of neurodegenerative dementia (e.g., TREM2, GRN, PSEN1, C9ORF72, MAPT), including frontotemporal dementia (Firdaus and Li 2024; Hinz and Geschwind 2017; DeJesus-Hernandez et al. 2011). Common etiopathological mechanisms have been hypothesized between neurodegenerative dementia and neurodevelopmental conditions, including autism (Li et al. 2022; Nadeem et al. 2021; Fumagalli and Crippa 2022), based on observations of partially overlapping symptoms (Nadeem et al. 2021; Rhodus et al. 2020) and an increased occurrence of dementia in autistic adults (Vivanti et al. 2021). Moreover, these genes are also involved in biological processes specific to neurodevelopment, such as the production of neurotrophic factors (Wang, Chen, et al. 2022), synaptic development and refinement (Tian et al. 2024; Filipello et al. 2018), and broad synaptic functioning (Saura et al. 2011), which is consistent with their potential involvement in autism.
Third, the finding of a gene module related to gastrointestinal function is in line with the relatively recent hypothesis of gut involvement in the etiological mechanisms of autism (Dargenio et al. 2023; Bjørklund et al. 2020). Gastrointestinal symptoms affect a large percentage of autistic individuals (Chaidez et al. 2014) and are associated with the severity of the condition (Adams et al. 2011). Alterations in the gut microbiome are also commonly reported in autism (Xu et al. 2019; Liu et al. 2019). The impact of the gut microbiota on the brain, mainly via the bidirectional microbiota-gut-brain axis, has been broadly investigated in neurodevelopmental conditions (Morais et al. 2021). Our expression analyses revealed that genes in the gastrointestinal module were extensively expressed in the brain across all developmental stages. The present result, therefore, extends the previous finding of a surprisingly high proportion of autism-associated genes expressed both in the brain and in the gastrointestinal tract (Niesler and Rappold 2021), further indicating a common genetic pathway for alterations in these two systems.
Lastly, the immune system has been extensively implicated in autism, both as a candidate etiological mechanism, for example, through maternal immune activation, and as a potential endophenotype, with converging evidence of immune dysregulation in autistic individuals (Meltzer and Van de Water 2017; Onore et al. 2012; Ormstad et al. 2018; Lombardo et al. 2018; Careaga et al. 2017; Masi et al. 2017). Moreover, genes controlling innate and adaptive immunity have previously been associated not only with autism (Horiuchi et al. 2021; Arenella et al. 2023) but also with autistic traits in the general population (Arenella et al. 2022). Interestingly, some of the immune genes possibly involved in autism are also implicated in neurodevelopmental processes such as neuronal plasticity, with their strongest expression in the brain during gestational and early postnatal age (Arenella et al. 2023; Estes and McAllister 2015). In contrast with those findings, we did not find genes within the immune module to be particularly expressed in the brain across neurodevelopmental stages. In this respect, it is important to keep in mind that the gene expression profiles considered in this work are those reported in the BrainSpan Atlas of the Developing Human Brain for typically developing individuals. It is therefore possible that genetic variants in autistic individuals alter the typical gene expression patterns, leading to the over-expression of immune genes in the brain that is reported elsewhere in the literature (Garbett et al. 2008; Voineagu et al. 2011). Furthermore, this result could also suggest a systemic rather than brain-localized role of the immune system in autism. Indeed, immune dysregulation can impact brain function through different pathways, including the gut-brain axis, where the immune system is a crucial mediator (Azhari et al. 2019; Powell et al. 2017). Moreover, some circulating cytokines can reach the brain and inhibit neurogenesis or promote neuron death, whereas endogenous anti-brain antibodies may be produced, altering the development or function of neurons (Meltzer and Van de Water 2017; Estes and McAllister 2015).
After having identified the four modules with a data-driven analysis based on the initial top-down splitting of our sample by IQ, we applied a bottom-up approach to explore whether variant enrichment could effectively identify different clusters of participants. Consistent with our initial stratification, we found that participants with IQ > 80 were particularly represented in one of the three SC, as also confirmed by a predictive logistic regression model. While this result aligns with our criterion for subgrouping participants, the moderate accuracy of the regression model suggests that other factors are implicated in the between-group differences associated with the modules' enrichment. Nevertheless, the SC were not highly differentiated in terms of autism symptom severity. Future extensions of this work should combine different units of analysis (e.g., cells, circuits/networks, neurobiology) to further clarify linkages between genotype and clinical phenotypes.
In addition to pinpointing the gene modules and discussing their relationship to phenotypic characteristics of autism, we thoroughly explored the functional interplay between the modules by considering an extended set of genes that includes the co-expressed interactors of the genes in each module. We discovered that most of the genes within this extended set presented high levels of spatio-temporal brain co-expression across development. Consistently, genes with higher co-expression levels were also highly connected at the level of protein–protein interactions. Likewise, genes from different modules were found to be mutually interacting. Complementing earlier reports (De Rubeis et al. 2014; Lombardo et al. 2017; Gupta et al. 2014), this work therefore provides further evidence that the biology of autism likely involves altered interactions between different modules rather than independent disrupted pathways that cumulatively contribute to the clinical outcome.
Lastly, we found that SFARI high-confidence autism-related genes were overrepresented in the total set of genes in the extended modules, but not in the restricted group of genes in the modules. This suggests a significant connection, but no significant overlap, between the four pinpointed modules and genes previously associated with autism susceptibility. Nonetheless, this result also indicates that our unbiased method could effectively identify sets of genes in close relationship with autism-related genes, uncovering their potential role in shaping the heterogeneous autistic phenotype. Importantly, the distribution of SFARI high-confidence genes within the total set of genes in the four extended modules was not random. Indeed, based on their spatio-temporal co-expression levels, we found three different gene clusters: one with high interaction levels but very few SFARI genes (GC2), a second one with fewer interactions and few SFARI genes (GC3), and a third cluster displaying high levels of interaction and overrepresented SFARI genes (GC1). In this last cluster, the autism-related genes were specifically overrepresented within the extended networks of the neurocognition and gastrointestinal genes. Interestingly, the interaction between SFARI and neurocognition-related genes is in line with a previous large-scale exome sequencing study in autism (De Rubeis et al. 2014), which found a significant protein–protein interaction between autism genes and MAPT, a gene implicated in neurodegenerative disorders that is also part of our neurocognition module. Previous work has also highlighted that more than 90% of the 62 highest-ranking autism risk genes in the SFARI database are expressed in both brain and gastrointestinal tissues (Niesler and Rappold 2021), substantiating their interactions.
5 Limitations
Some limitations of the present study should be considered. The main limitation is the small size of the participant groups; the present findings must therefore be considered preliminary. Future extensions of this work should include publicly available sequencing cohorts (e.g., SPARK) as replication datasets. Nonetheless, this work serves as a “proof-of-concept” demonstrating the potential value of our analytical method, even within a restricted yet phenotypically well-characterized group of autistic participants. With respect to the genetic data, our analysis was limited to a subsection of the whole exome (~5200 genes) screened with a panel that exclusively enables analysis of disease-associated target genes. Notwithstanding this limitation, the present preliminary findings align with previous evidence from whole-exome sequencing studies. However, it should be noted that whole-exome/genome analysis could have yielded an even more comprehensive pattern of results. Moreover, by only considering the load of PAVs in a gene set, we neglect possible differences in the effect of each variant on the function associated with the gene set. The availability of this information could have facilitated functional connections between genetics and phenotypic manifestations. Lastly, at the present stage, the analysis of differences in the load of PAVs between subgroups of participants relies solely on IQ scores. Future extensions of this work should use the present multi-step approach to also investigate genetic differences between autism subtypes defined by their core and non-core features in motor, language, intellectual, and adaptive functioning.
6 Conclusions
In the present study, we provide evidence that an unbiased, multi-step analysis can identify sets of genes involved in neurocognition, ion cell communication, gastrointestinal function, and immune system that are potentially related to the phenotypic differences among autistic children with different IQ levels. Besides being expressed early in the brain, such biological pathways are spatio-temporally co-expressed and highly interconnected with each other and with many autism-related genes. These findings therefore support the hypothesis that the diversity in autism likely originates from multiple interacting pathways that could be altered at various levels rather than from a series of independent functional cascades with cumulative impact on the final outcome. Although these observations should be considered with caution, future research could leverage the present approach to identify genetic pathways relevant to autism subtyping, investigating a link between genetic profiles and distinct biotypes of autistic individuals.
Author Contributions
Conceptualization: L.F., U.P., and A.C. Methodology: G.S., L.F., M.Ma., R.G., M.V., U.P., and A.C. Software: L.F., M.Ma., and U.P. Formal analysis: G.S., L.F., M.Ma., and U.P. Investigation: R.G., M.V., S.B.C., L.V., E.M., M.N., M.Mo., and A.C. Data curation: G.S., L.F., S.B.C., U.P., and A.C. Writing – original draft preparation: G.S., U.P., and A.C. Writing – review and editing: G.S., L.F., U.P., and A.C. Visualization: G.S., L.F., U.P., and A.C. Supervision: U.P. and A.C. Project administration: U.P. and A.C. Funding acquisition: A.C.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.