A common intronic single nucleotide variant modifies PKD1 expression level
Funding information: National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR002384
Abstract
Autosomal dominant polycystic kidney disease (ADPKD), caused by mutations in PKD1 and PKD2 (PKD1/2), shows unexplained phenotypic variability that is likely affected by environmental and other genetic factors. Approximately 10% of individuals with an ADPKD phenotype have no detectable causal mutation, possibly because of unrecognized risk variants of PKD1/2. This study was designed to identify risk variants of PKD genes through population genetic analyses. We used Wright's F-statistics (Fst) to evaluate common single nucleotide variants (SNVs) in PKD1 from the 1000 Genomes Project (1KG) that are potentially favored by positive natural selection, and we genotyped 388 subjects from the Rogosin Institute ADPKD Data Repository. Variants with Fst scores above the 90th percentile were further investigated by in silico and molecular genetic analyses. We identified a deep intronic SNV, rs3874648G > A, located in a conserved binding site of the splicing regulator Tra2-β in PKD1 intron 30. Reverse-transcription PCR (RT-PCR) of peripheral blood leukocytes (PBL) from an ADPKD patient homozygous for rs3874648-A identified an atypical PKD1 splice form. Functional analyses demonstrated that the rs3874648-A allele increased Tra2-β binding affinity and activated a cryptic acceptor splice site, causing a frameshift that introduced a premature stop codon in the mRNA and thereby decreased the level of full-length PKD1 transcript. In a small cohort of normal individuals and of patients with PKD2 inactivating mutations, PKD1 transcript levels in PBL were lower in rs3874648-G/A carriers than in rs3874648-G/G homozygotes. Our findings indicate that rs3874648G > A is a PKD1 expression modifier that attenuates PKD1 expression through Tra2-β, whereas the derived G allele advantageously maintains PKD1 expression and is predominant in most subpopulations.
1 INTRODUCTION
Autosomal dominant polycystic kidney disease (ADPKD [MIM: 173900 and 613095]) is the most common inherited kidney disease, with a prevalence of approximately 0.1%–0.25% in the population, and causes 5%–10% of end-stage kidney disease (ESKD) worldwide.1, 2 Pathogenic mutations of PKD1 (polycystin 1, PC1 [MIM: 601313]) and PKD2 (polycystin 2, PC2 [MIM: 173910]) (PKD1/2) account for about 78% and 13% of pedigrees, respectively.3, 4 In the remaining ~10% of cases, no pathogenic germline mutation of PKD1 or PKD2 has been identified.3-7 The rate of disease progression varies considerably, even among affected members of the same family who share the same pathogenic PKD gene mutation. This phenotypic heterogeneity has been attributed to various factors, including unrecognized risk variants and modifiers of PKD gene expression.6-9 However, these factors remain incompletely defined.
PKD1/2 are highly polymorphic genes with numerous sequence variants.5 To date, 7577 variants of PKD1 (ENSG00000008710.20) and 1064 variants of PKD2 (ENSG00000118762.8) have been catalogued in human genomes (gnomAD v3.1.2, accessed April 4, 2022).10 Mutations occur throughout PKD1/2, and no significant mutational hotspot has been identified.5, 11, 12 The biological significance of many of these PKD1/2 variants remains undefined.5, 11-14
The predominant genetic mechanism of cyst formation (cystogenesis) is a “two-hit” process involving germline and somatic inactivating mutations of PKD1/2.5, 12, 15 The likelihood of cystogenesis increases when the functional levels of the PKD1/2 products (polycystin 1 and polycystin 2, respectively) fall below a critical threshold.9 Reduced gene dosage has been reported in orthologous ADPKD models and in patients with incompletely penetrant hypomorphic alleles that lower PKD1/2 expression and cause mild-to-severe ADPKD, with a phenotype that can be indistinguishable from that produced by a “two-hit” mechanism.13, 16, 17
Genes that are crucial to human fitness are thought to be under positive natural selection.18 Mutations arise randomly, and most observed variants are considered neutral;19 such variants persist in genomes for generations while their allele frequencies vary by chance.19 However, if an allele confers better fitness or adaptation to environmental change, the frequency of this “advantageous allele” will increase in human populations.18, 20 The strength of natural selection can be evaluated by comparing allele-frequency variation within and among populations.21, 22 Wright's F-statistics (Fst) are commonly used to infer the action of natural selection on human genomes22 and to screen for phenotype-related variants at distinct loci of human genomes.23, 24
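As a minimal sketch of the quantity involved, the Python function below computes Wright's Fst for one biallelic SNV as (H_T − H_S)/H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean expected heterozygosity within subpopulations. It weights subpopulations equally, which is a simplifying assumption; the PLINK analysis described in the Methods uses its own estimator on individual genotypes, so values will not match Table 1 exactly. The function name `wright_fst` is illustrative.

```python
from statistics import fmean


def wright_fst(subpop_freqs):
    """Wright's Fst = (H_T - H_S) / H_T for a single biallelic SNV.

    subpop_freqs: frequency of one allele in each subpopulation. Subpopulations
    are weighted equally here (a simplifying assumption; estimators such as
    Weir & Cockerham also account for sample sizes).
    """
    freqs = list(subpop_freqs)
    if not freqs:
        raise ValueError("need at least one subpopulation")
    p_bar = fmean(freqs)
    # Expected heterozygosity if all subpopulations were pooled into one.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each subpopulation.
    h_s = fmean(2 * p * (1 - p) for p in freqs)
    if h_t == 0:
        return 0.0  # monomorphic site: nothing to differentiate
    return (h_t - h_s) / h_t


# rs3874648-A frequencies in AFR, AMR, EAS, EUR, SAS (Table 1).
print(round(wright_fst([0.571, 0.104, 0.004, 0.094, 0.006]), 3))  # 0.341
```

Applied to the five superpopulation frequencies of rs3874648-A, this simplified estimator gives about 0.34, close to but not identical with the reported score of 0.327, which was obtained with PLINK from individual genotypes.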
PKD1/2 genes are essential to human fitness.25 We hypothesized that variants at the PKD1/2 loci underwent positive selection during human migration and that advantageous variants would therefore have high Fst values. Because pathogenic variants of PKD1 account for approximately 80% of ADPKD pedigrees, in the current study we focused on the Fst values of common single nucleotide variants (SNVs) in PKD1 across subpopulations (African, Eurasian, and American) from the 1000 Genomes Project (1KG).26 Among the 10% of variants with the highest Fst values, we identified a novel deep intronic variant that functions as a PKD1 expression modifier.
2 MATERIALS AND METHODS
2.1 Study subjects
Study subjects were participants in the Rogosin Institute ADPKD Data Repository, a single-center, longitudinal study of genotype and phenotype characteristics of individuals with ADPKD (http://clinicaltrials.gov identifier: NCT00792155). All subjects provided written informed consent. The normal individuals were healthy blood donor volunteers as previously described.27 The studies were approved by the Institutional Review Board Committees at Weill Cornell Medicine (WCM) and Rockefeller University, respectively (New York, NY).
2.2 PKD1/2 gene testing for pathogenic mutations
Genomic DNA from 388 ADPKD subjects was analyzed for PKD1 and PKD2 mutations at Athena Diagnostics, Inc. (Worcester, MA), or at WCM (New York, NY), using long-range PCR (LR-PCR) followed by Sanger sequencing and/or next-generation sequencing (NGS).28 Mutation-negative patients were further tested for copy number variation by multiplex ligation-dependent probe amplification (MLPA) at Prevention Genetics (Marshfield, WI).
2.3 Population genetic and in silico analyses
Population genetic analyses were performed on 2403 PKD1 variants (ENSG00000008710) from 2504 individuals in 26 subpopulations from Africa, East Asia, Europe, South Asia, and the Americas in 1KG (phase 3, GRCh37/hg19).26 Common variants (minor allele frequency >0.01) were subjected to Fst analysis using PLINK.29 Variants with high Fst values were further evaluated with Combined Annotation-Dependent Depletion (CADD) scores relative to the gene-level mutation significance cutoff (MSC) at the 99% confidence interval (CI).21, 22, 25, 30 The effects of intronic variants on the binding affinity of RNA-splicing factors/regulators were predicted using RBPmap with a high-stringency setting.31 The binding-protein profiles were cross-checked using Human Splicing Finder v3.1 (HSF),32 and predicted splice donor and acceptor sites were further evaluated with MaxEntScan.33
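The variant-selection step described above (common variants with minor allele frequency >0.01, then variants above the 90th percentile of Fst) can be sketched as follows. This is an illustration only: the tuple layout `(rsid, maf, fst)` and the function name are hypothetical, Fst values are assumed to have been computed beforehand (for example with PLINK), and the percentile uses the default (exclusive) method of Python's `statistics.quantiles`, which may differ from the cutoff rule the authors applied.

```python
from statistics import quantiles


def top_decile_common(variants, maf_min=0.01):
    """Return common variants whose Fst exceeds the 90th percentile, highest first.

    variants: iterable of (rsid, maf, fst) tuples (hypothetical input layout).
    """
    common = [v for v in variants if v[1] > maf_min]  # keep MAF > 0.01 only
    if len(common) < 2:
        raise ValueError("need at least two common variants to define a percentile")
    # quantiles(..., n=10) returns the 9 decile cut points; the last is the 90th percentile.
    cutoff = quantiles([fst for _, _, fst in common], n=10)[-1]
    hits = [v for v in common if v[2] > cutoff]
    return sorted(hits, key=lambda v: v[2], reverse=True)


# Toy data: 100 common variants with Fst 0.00-0.99 plus one rare, high-Fst variant.
toy = [(f"var{i}", 0.05, i / 100) for i in range(100)] + [("rare1", 0.005, 0.999)]
selected = top_decile_common(toy)
print(len(selected), selected[0][0])  # 10 var99
```

The rare variant is dropped before the percentile is computed, so a high Fst at a rare site cannot enter the candidate list; this mirrors the ordering of the two filters described in the text.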
2.4 Minigene splicing assay
PKD1 exons 30–34 (2.6 kb) were amplified from peripheral blood leukocyte (PBL) DNA of ADPKD patients and normal controls (Table S1).28 PCR products were subcloned into the pcDNA3.1-Flag mammalian expression vector. For the rescue assay, rs3874648G > A changes were introduced by site-directed mutagenesis. PKD1 minigene constructs were transiently transfected into human embryonic kidney 293T (HEK293T) cells using Lipofectamine 2000™ Reagent as described in the Supplemental Methods. RNA splicing was detected by RT-PCR using minigene-specific primers (Table S1). Testing was performed twice, in triplicate. All constructs and splicing products were verified by Sanger sequencing. Recombinant DNA work followed the WCM Biological Safety Program regulations for recombinant DNA.
2.5 RT-PCR assay
Total RNA was extracted from PBL pellets or transfected HEK293T cells using TRIzol Reagent and reverse-transcribed into first-strand cDNA using the Applied Biosystems™ High-Capacity cDNA Reverse Transcription Kit according to the manufacturer's instructions (ThermoFisher). PKD1 SNV-containing regions and minigene splicing products were amplified with the PrimeSTAR GXL system (Takara Bio) containing 10% dimethyl sulfoxide and 500 mM betaine on a Biometra TRIO thermocycler. Amplicons were analyzed by agarose gel electrophoresis with ethidium bromide staining.
2.6 Quantitative RT-PCR
Complementary DNA was synthesized from 1 μg PBL RNA using random primers and the High Capacity cDNA Archive Kit (ThermoFisher).34 PKD1 expression level relative to the housekeeping gene Ribonuclease P Protein Subunit p30 (RPP30) was quantified using PKD1-specific primers (Table S1) and the ABI 7500 Real-Time thermocycler. Data analysis was performed by the comparative threshold cycle (Ct) method using the SDS v2.2.2 software (ThermoFisher) as previously described.34 Experiments were performed in duplicate, and each sample was analyzed in triplicate.
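The comparative Ct method mentioned above is commonly implemented as the 2^(−ΔΔCt) calculation. The sketch below assumes roughly 100% amplification efficiency for both the PKD1 and RPP30 assays and a single calibrator sample chosen by the analyst; these are illustrative assumptions, not details taken from the SDS software, and the function and argument names are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency for both target and reference assays.
    ct_target, ct_ref: mean Ct of target (e.g., PKD1) and reference (e.g., RPP30) in the sample.
    ct_target_cal, ct_ref_cal: the same two values in the calibrator sample.
    """
    delta_ct_sample = ct_target - ct_ref        # normalize sample to the reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal   # normalize calibrator the same way
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2 ** (-delta_delta_ct)


# A sample whose PKD1 signal lags RPP30 by one more cycle than the calibrator
# has half the calibrator's relative PKD1 level.
print(relative_expression(25.0, 20.0, 24.0, 20.0))  # 0.5
```

Read in reverse, the ~10%, ~30%, and ~70% reductions reported in the Results correspond to ΔΔCt shifts of roughly 0.15, 0.51, and 1.74 cycles (−log2 of 0.9, 0.7, and 0.3), which illustrates why technical replicates matter for detecting the smallest of these differences.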
2.7 TaqMan allelic discrimination assay
PKD1:c.10051-239G > A (rs3874648:G > A) SNP genotyping was performed using a ThermoFisher Custom TaqMan® Assay as described in the Supplemental Material.
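For readers unfamiliar with allelic discrimination, the sketch below shows one simple way a genotype can be called from the two endpoint reporter signals of a TaqMan assay. It is hypothetical throughout: the assignment of the VIC probe to the G allele and the FAM probe to the A allele, the 0.3/0.7 allele-fraction boundaries, and the minimum-signal threshold are illustrative assumptions, and instrument software typically calls genotypes by clustering rather than by fixed cutoffs.

```python
def call_genotype(vic, fam, min_signal=0.5, low=0.3, high=0.7):
    """Call an rs3874648 genotype from normalized endpoint reporter signals.

    Hypothetical assumptions: VIC reports the G allele, FAM reports the A allele,
    and the thresholds are illustrative, not assay-validated.
    """
    total = vic + fam
    if total < min_signal:
        return "no call"  # too little signal to classify (e.g., failed well)
    fam_fraction = fam / total  # share of signal coming from the A-allele probe
    if fam_fraction < low:
        return "G/G"
    if fam_fraction > high:
        return "A/A"
    return "G/A"


for vic, fam in [(2.1, 0.2), (1.1, 1.0), (0.2, 1.9), (0.1, 0.1)]:
    print(vic, fam, call_genotype(vic, fam))
```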
2.8 RNA pull-down assay
Bait RNAs for pull-down were transcribed in vitro with an RNA synthesis system (E2070) from DNA fragments containing the variants of interest. The transcribed bait RNAs were purified using RNeasy kits (QIAGEN), biotinylated with T4 RNA ligase, and coupled to Streptavidin Magnetic Beads (Pierce™). Bait RNA-coupled beads were incubated with HEK293T cell nuclear extracts in the presence of protease inhibitors and washed with incubation buffer. They were then resuspended in 1X Laemmli Sample Buffer, heated at 100°C for 10 min, and stored at −80°C (Figure S1). Proteins bound to the bait RNA-coupled beads were detected by Western blot with rabbit anti-Tra2-β (GeneTex, GTX114752) or rabbit anti-SNRPA (Small Nuclear Ribonucleoprotein Polypeptide A; GeneTex, GTX101664) primary antibodies and fluorophore-conjugated secondary antibodies.
2.9 Statistical analysis
An unpaired t-test was used to compare relative PKD1 mRNA expression levels between patients and normal individuals; a two-sided p-value <0.05 was considered significant.34 One-way analysis of variance and contingency analysis were used to compare variables among rs3874648 genotype groups (JMP Pro 16.1.0, SAS Institute, Cary, NC).
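The unpaired comparison described above can be illustrated with Student's two-sample t statistic using a pooled variance. This stdlib-only sketch assumes equal group variances (the text does not state whether a pooled or Welch test was used) and returns the statistic and degrees of freedom; a two-sided p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind`, which pools variances by default and applies Welch's correction with `equal_var=False`. The function name is hypothetical, and the example values are invented, not study data.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's unpaired t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled sample variance, weighting each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Hypothetical relative PKD1 levels for two genotype groups (not study data).
t, df = pooled_t([1.02, 0.95, 1.08, 0.99], [0.88, 0.91, 0.85, 0.93])
print(round(t, 2), df)
```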
3 RESULTS
3.1 Identification of putative functional PKD1 variants
The 211 common variants among the 2403 PKD1 SNVs from 2504 individuals were subjected to Fst scoring to quantify selection pressure on the PKD1 locus (Figure 1A).21, 22, 29 Of the 22 variants ranking in the top 10% of Fst scores, ten have been classified as “Benign” in ClinVar35 and/or “Likely Neutral” in the Mayo ADPKD Mutation Database,36 while the remaining variants were unclassified (Table 1). We then performed MSC-based CADD evaluation of these unclassified variants to prioritize them for further analysis (Table S2).25, 30 Of these, four were excluded because they were located either in the 3′-UTR or in low-complexity genomic regions10 (Figure 1B and Table S2). The remaining variants, all intronic, were evaluated using RBPmap for RNA splicing factor/regulator profiles.31 The results indicated that variants rs7206195C > T, rs7185040A > C, and rs11861948A > G did not alter, and rs116189075A > G, rs114796022T > C, rs12926737T > C, and rs58999880G > T only slightly altered, RNA-binding protein affinity at these sites (Table S3). By contrast, variant rs3874648G > A (PKD1 c.10051-239G > A) was predicted to markedly enhance SRSF10 (human/mouse) and Tra2 (Drosophila) binding to intron 30 (Table S4). Consistent with this, HSF analysis predicted that the G-to-A substitution either strengthens the binding affinity of Transformer 2 Beta (Tra2-β) by 13%–42%, converts a 9G8 binding site into a Tra2-β-preferred binding site, or creates a new binding site for Tra2-β (Table 2). HSF also predicted that the binding affinities of other splicing enhancers or silencers in this flanking region are unaffected (Tables S5 and S6).

SNP | Position (GRCh37) | Annotations | Exon/Intron | Alternative allele | Fst score | AFR | AMR | EAS | EUR | SAS | ClinVar annotation | Mayo annotation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
rs11866494 | 2 141 396 | PKD1 c.11712 + 28C > G | Intron42 | G | 0.408 | 72.70% | 20.20% | 0.00% | 18.60% | 6.30% | Not reported in ClinVar | Likely neutral |
rs11862600 | 2 141 343 | PKD1 c.11712 + 81A > G | Intron42 | G | 0.398 | 71.40% | 19.60% | 0.00% | 18.60% | 6.00% | Not reported in ClinVar | NA |
rs7206195 | 2 145 280 | PKD1 c.10499 + 1362C > T | Intron34 | T | 0.397 | 72.10% | 21.20% | 0.00% | 19.90% | 6.10% | Not reported in ClinVar | NA |
rs7185040 | 2 145 787 | PKD1 c.10500-1069A > C | Intron34 | C | 0.395 | 72.10% | 21.20% | 0.00% | 19.50% | 6.60% | Not reported in ClinVar | NA |
rs58999880 | 2 146 528 | PKD1 c.10499 + 621G > T | Intron34 | T | 0.389 | 71.90% | 21.20% | 0.00% | 19.90% | 7.00% | Not reported in ClinVar | NA |
rs7203729 | 2 140 010 | PKD1 c.12630A > G,(p.Pro4210=) | Exon46 | G | 0.385 | 72.50% | 21.00% | 0.00% | 20.30% | 8.60% | Benign | Likely neutral |
rs12926737 | 2 146 282 | PKD1 c.10499 + 867 T > C | Intron34 | C | 0.381 | 71.10% | 21.00% | 0.00% | 19.90% | 7.00% | Not reported in ClinVar | NA |
rs3087632 | 2 140 454 | PKD1 c.12276 T > C,(p.Ala4092=) | Exon45 | C | 0.378 | 71.90% | 21.00% | 0.00% | 19.90% | 9.00% | Benign | Likely neutral |
rs61374883 | 2 154 158 | PKD1 c.8162-263_8162-262insAG | Intron22 | AG | 0.371 | 89.60% | 33.60% | 11.70% | 26.70% | 24.70% | Not reported in ClinVar | NA |
rs3874648 | 2 148 224 | PKD1 c.10051-239G > A | Intron30 | A | 0.327 | 57.10% | 10.40% | 0.40% | 9.40% | 0.60% | Not reported in ClinVar | NA |
rs2549677 | 2 162 361 | PKD1 c.3275A > G,(p.Met1092Thr) | Exon14 | G | 0.318 | 53.80% | 14.30% | 0.00% | 9.40% | 1.40% | Benign | Likely neutral |
rs3087631 | 2 139 127 | PKD1 c.*601A > T,3'UTR | 3'UTR | T | 0.318 | 65.20% | 20.50% | 0.00% | 20.00% | 9.20% | Not reported in ClinVar | NA |
rs2369068 | 2 162 887 | PKD1 c.3063A > G,(p.Gly1021=) | Exon13 | G | 0.315 | 53.60% | 14.30% | 0.00% | 9.40% | 1.40% | Benign | Likely neutral |
rs11861948 | 2 163 562 | PKD1 c.2854-269A > G | Intron11 | G | 0.315 | 53.60% | 14.30% | 0.00% | 9.40% | 1.40% | Not reported in ClinVar | NA |
rs114796022 | 2 165 740 | PKD1 c.1850-114 T > C | Intron9 | C | 0.311 | 54.40% | 15.10% | 0.00% | 10.50% | 1.80% | Not reported in ClinVar | NA |
rs116189075 | 2 165 737 | PKD1 c.1850-111A > G | Intron9 | G | 0.308 | 54.10% | 15.10% | 0.00% | 10.50% | 1.80% | Not reported in ClinVar | NA |
rs10960 | 2 140 680 | PKD1 c.12133 T > C,(p.Ile4045Val) | Exon44 | C | 0.306 | 63.50% | 20.50% | 0.00% | 19.70% | 8.50% | Benign | Likely neutral |
rs28575767 | 2 156 021 | PKD1 c.7708A > G,(p.Leu2570=) | Exon20 | G | 0.306 | 52.10% | 13.80% | 0.00% | 9.00% | 1.40% | Benign | Likely neutral |
rs9928278 | 2 152 651 | PKD1 c.8949-17 T > C | Intron24 | C | 0.276 | 56.90% | 17.60% | 0.00% | 16.40% | 6.70% | Benign | Likely neutral |
rs13332377 | 2 138 869 | PKD1 c.*859C > T,3'UTR | 3'UTR | T | 0.271 | 50.20% | 14.60% | 0.00% | 11.80% | 2.00% | NA | NA |
rs77028972 | 2 152 387 | PKD1 c.9196A > G,(p.Phe3066Leu) | Exon25 | G | 0.265 | 55.50% | 17.60% | 0.00% | 16.40% | 6.60% | Benign | Likely neutral |
rs9935834 | 2 152 388 | PKD1 c.9195C > G,(p.Val3065=) | Exon25 | G | 0.265 | 55.50% | 17.60% | 0.00% | 16.40% | 6.60% | Benign | NA |
- Note: Of the 22 variants ranking in the top 10% of Fst scores, three were nonsynonymous variants, five were synonymous variants, two were in the 3'-UTR, and the remainder were in PKD1 intronic regions. Thus, eight were in coding regions and fourteen were in noncoding regions. Ten of the 22 variants were classified as “Benign” or “Likely neutral” in the public databases, while the remaining twelve were unclassified and of unknown significance. Population columns give the frequency of the alternative allele in each 1KG superpopulation. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian; NA, not available.
Sequence position (GRCh38) | Reference ESE protein | Reference motif (score 0–100) | Mutant ESE protein | Mutant motif (score 0–100) | Variation |
---|---|---|---|---|---|
chr16:2098227 | Tra2-β | aaaag (83.21) | Tra2-β | aaaaa (94.14) | 13.14% |
chr16:2098226 | 9G8 | aaagaa (66.18) | Tra2-β | aaaaa (94.14) | 42.25% |
chr16:2098224 | None | None | Tra2-β | aaaaa (94.14) | New site |
chr16:2098223 | 9G8 | gaaaat (66.38) | Tra2-β | aaaaa (94.14) | 41.82% |
- Note: Tra2-β binding site computational analysis. Human Splicing Finder analysis of splicing regulator binding motifs in PKD1 intron 30 predicted that the rs3874648G > A substitution either increases the affinity of an existing Tra2-β binding site or creates a new one. Tra2-β, Transformer 2 beta; 9G8, serine/arginine-rich splicing factor 7 (SRSF7).
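The Variation column in the table above is consistent with the relative change in HSF motif score from the reference to the mutant motif, (mutant − reference)/reference × 100. This reading is inferred from the listed values rather than stated in the text; the short check below uses a hypothetical function name.

```python
def percent_variation(ref_score, mut_score):
    """Relative change (%) in motif score when the reference motif becomes the mutant motif."""
    return (mut_score - ref_score) / ref_score * 100


# (reference score, mutant score) pairs from the table above.
for ref, mut in [(83.21, 94.14), (66.18, 94.14), (66.38, 94.14)]:
    print(f"{percent_variation(ref, mut):.2f}%")  # 13.14%, 42.25%, 41.82%
```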
3.2 RNA expression studies and qRT-PCR analysis
To identify the RNA expression profile associated with these variants, we performed RT-PCR analysis of PBL RNA. RT-PCR with the primer set PKD1-Ex30-F1/PKD1-Ex31-R1 revealed an additional, larger band (~270 bp) in PBLs of the ADPKD patient homozygous for the rs3874648-A allele that was not detectable in the normal control homozygous for the rs3874648-G allele (Figure 2A). Sequencing of this band revealed an insertion of 41 bp of intronic sequence, resulting in premature termination of translation at codon K3350 (PKD1 p.V3351X) (Figure 2B,C). No abnormal splicing events were detected for the remaining intronic variants (rs116189075A > G, rs114796022T > C, rs11861948A > G, and rs11862600A > G) in the same RNA sample (Figure S2).

In normal individuals, qRT-PCR analysis showed that PKD1 mRNA levels were ~10% lower (p < 0.05) in PBLs heterozygous for rs3874648G > A than in PBLs with the rs3874648-G/G genotype (Figure 2D). This observation is consistent with the Genotype-Tissue Expression survey (see Section 4).37 To determine whether rs3874648G > A affects PKD1 expression in ADPKD patients, we measured PKD1 mRNA levels by qRT-PCR in PKD2-mutation-positive patients. PKD1 expression in PKD2-mutation-positive patients homozygous for rs3874648-G was significantly lower (by ~70%) than in normal subjects with the G/G genotype (Figure 2E). Furthermore, in these ADPKD patients, PKD1 mRNA levels were ~30% lower (p < 0.05) in PBLs with the rs3874648-G/A genotype than in rs3874648-G homozygotes. No PBL was available from rs3874648-A/A normal controls or from rs3874648-A/A ADPKD patients with a PKD2 mutation for this analysis.
3.3 Functional characterization of PKD1 rs3874648G > A
Computational analysis of PKD1 intron 30 revealed a putative cryptic acceptor splice site (gagcagGT) 41 bp upstream of the canonical acceptor site of exon 31 (ctgcagGT), with a slightly lower consensus value than the canonical site (87.07 vs. 93.71 by HSF; 7.51 vs. 11.85 by MaxEntScan) (Table 3 and Table S7, respectively).32
Sequence position (GRCh38) | Splice site type | Motif | Potential splice site | Consensus value (0–100) | HSF result interpretation |
---|---|---|---|---|---|
chr16:2098025 | Acceptor | tgtcctgagcaggt | tgtcctgagcagGT | 87.07 | Alternative 3′ splicing site |
chr16:2097984 | Acceptor | ctcgtcctgcagGT | ctcgtcctgcagGT | 93.71 | Canonical 3′ splicing site |
- Note: Splice site analysis using Human Splicing Finder identified a cryptic 3′ acceptor splicing site 41 bp from the canonical 3′ splicing site. This cryptic site yielded a high consensus value similar to that of the canonical 3′ splice site (87.07 vs. 93.71).
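The reading-frame consequence of using the cryptic acceptor can be checked with simple arithmetic: the number of intronic bases added to the mRNA equals the distance between the two acceptor positions, and a frameshift results whenever that number is not a multiple of 3. The sketch assumes that the two GRCh38 positions in the table mark equivalent points within each acceptor motif; the function name is hypothetical.

```python
def cryptic_acceptor_effect(cryptic_pos, canonical_pos):
    """Return (number of bases added to the mRNA, whether the reading frame shifts).

    Using an acceptor upstream of the canonical one (in transcript orientation)
    adds the intervening intronic bases to the start of the next exon; a
    downstream cryptic acceptor would instead remove the same number of exonic
    bases, and the frame logic is identical. abs() makes the calculation
    independent of which genomic strand the gene is on.
    """
    inserted = abs(cryptic_pos - canonical_pos)
    return inserted, inserted % 3 != 0


# GRCh38 positions of the cryptic and canonical acceptor motifs (table above).
print(cryptic_acceptor_effect(2098025, 2097984))  # (41, True)
```

A 41-bp insertion leaves a remainder of 2 when divided by 3, consistent with the frameshift and premature stop codon detected by RT-PCR and sequencing.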
To determine the functional effect of rs3874648G > A on cryptic splicing of PKD1 RNA, we conducted a minigene splicing assay spanning exon 30 to intron 33 (Figure 3A). RT-PCR analysis of minigene transcripts in HEK293T cells revealed that the pcDNA3.1-rs3874648-A construct generated a distinct, larger band of ~750 bp, compared with the ~700 bp product obtained with the pcDNA3.1-rs3874648-G construct (Figure 3B). A reciprocal rescue assay, introducing G > A into pcDNA3.1-rs3874648-G (pcDNA3.1-rs3874648-G > A) and A > G into pcDNA3.1-rs3874648-A (pcDNA3.1-rs3874648-A > G), confirmed that rs3874648G > A was the only SNV responsible for the intron 30 cryptic splicing event (Figure 3B, Figure S3).

Because the rs3874648G > A variant lies in a conserved adenine-rich binding site of Tra2-β, a critical RNA splicing regulator,38, 39 we performed RNA-protein pull-down assays with HEK293T cell nuclear extracts to evaluate the interaction between Tra2-β and the rs3874648 alleles using RNA baits (Figure 3C and Figure S1). Tra2-β binding to the rs3874648-A-containing motif was 4-fold stronger than to the motif containing the rs3874648-G allele (Figure 3D,E, Figure 4). By contrast, binding of SNRPA to its motifs in the proximal region of the RNA baits was comparable for both alleles, indicating that the effect is specific to Tra2-β (Figure 3D).

3.4 Assessing variant prevalence in ADPKD cohort
Our results indicate that the rs3874648-A allele increases the binding affinity of Tra2-β for PKD1 intron 30, leading to lower levels of full-length PKD1 mRNA through partial intron retention. We assessed the frequency of rs3874648G > A in our ADPKD cohort by TaqMan allelic discrimination assay. Overall, the rs3874648-A allele frequency was 0.1095 in the ADPKD cohort (Table S8). In total, we identified eight rs3874648-A homozygotes. Of these, three carried pathogenic mutations in PKD1 and two carried pathogenic mutations in PKD2; in the remaining three, no pathogenic PKD1 or PKD2 mutation was identified by LR-PCR-NGS and MLPA. Although rs3874648 genotypes in the overall ADPKD cohort were consistent with Hardy–Weinberg equilibrium, the prevalence of rs3874648-A homozygotes among patients without a detectable PKD1/2 mutation was slightly but significantly higher than predicted (p = 0.009) (Table S9). In a multivariate analysis, we examined variables with a strong impact on disease severity (ht-TKV, GFR, and hypertension). We found no association between rs3874648 genotype and patient characteristics known to be associated with ADPKD severity, such as total kidney volume, Mayo Clinic Classification, age at onset of ESKD, or hypertension (Figure S4).
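The Hardy–Weinberg comparison above can be sketched with a Pearson chi-square goodness-of-fit test on genotype counts (1 degree of freedom). This illustrates the general approach and is not necessarily the test used for Table S9 (the Methods name contingency analysis in JMP); the function names are hypothetical, and the chi-square survival function for 1 df is computed as erfc(√(χ²/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions for one biallelic SNV.

    n_aa, n_ab, n_bb: counts of A/A, A/B and B/B genotypes. Returns (chi2, p) with 1 df.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    q = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    p = 1 - q
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def expected_homozygote_fraction(allele_freq):
    """Fraction of homozygotes expected under Hardy-Weinberg equilibrium (q^2)."""
    return allele_freq ** 2


# With an A-allele frequency of 0.1095, about 1.2% of subjects are expected to be A/A.
print(round(expected_homozygote_fraction(0.1095), 4))  # 0.012
```

For the 388 genotyped subjects, this expected fraction corresponds to roughly 4.7 A/A individuals overall; the excess reported above refers specifically to the mutation-negative subgroup, which this sketch does not reanalyze.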
4 DISCUSSION
Many SNVs in PKD1/2 genes have been reported, but their pathogenicity, penetrance, and molecular mechanisms are incompletely defined.5 In this study, we identified a common PKD1 variant, rs3874648G > A, that can alter expression of full-length PKD1 by increasing the binding affinity of the splicing regulator Tra2-β for PKD1 pre-mRNA and activating a cryptic splice site that leads to premature termination. To our knowledge, this is the first report to show, by combining population genomics and molecular genetic analyses, a significant effect of a common SNV on PKD1 expression in ADPKD patients. There was no statistically significant association between rs3874648 genotype and clinical characteristics or Mayo Classification. However, the number of rs3874648-A homozygotes in our cohort was too small to allow meaningful conclusions.
rs3874648G > A is located in a conserved Tra2-β binding site in PKD1 intron 30. Using a minigene functional assay, we showed that rs3874648-A disrupted normal splicing by engaging an adjacent cryptic acceptor splice site predicted to truncate the PKD1 protein, thereby decreasing levels of full-length PKD1 transcript. This adverse effect of the A allele was “rescued” by substituting it with the G allele, which restored the normal splicing pattern. Accordingly, in a small cohort of normal individuals, PKD1 transcript levels were ~10% lower in PBL from rs3874648-G/A carriers than from rs3874648-G/G carriers. Moreover, in ADPKD patients with a PKD2 mutation, PKD1 expression was ~30% lower in rs3874648-G/A carriers than in rs3874648-G/G carriers. Notably, PKD1 expression was ~70% lower in ADPKD patients with PKD2 mutations and the rs3874648-G/G genotype than in non-ADPKD controls with the same genotype. This may reflect interdependent regulation of PKD1/2, previously reported in human and mouse cell lines and attributed to anomalies in PKD1 (post-)transcriptional regulation induced by PKD2 depletion.40, 41
Comparative genomic analysis revealed that the rs3874648-A allele is shared with other primates (Figure S5) and with Neanderthals and Denisovans (Figure S6). Compared with the rs3874648-A allele, the derived rs3874648-G allele has lower binding affinity for Tra2-β, allowing G-allele carriers to maintain higher PKD1 transcript levels. These results are consistent with the Genotype-Tissue Expression (GTEx) dataset from a large cohort of normal participants, which reported an association of rs3874648-G with higher PKD1 transcript levels in numerous tissue types.37 Compared with rs3874648-A carriers, the mean PKD1 mRNA level in individuals with the rs3874648-G/G genotype was 33% higher in kidneys and 18% higher in non-kidney tissues (GTEx, accessed October 1, 2021) (Figure S7).37
The association of higher PKD1 expression with rs3874648-G was significant in tissues with relatively high Tra2-β expression.37 Tra2-β, also known as splicing factor arginine/serine (RS)-rich 10 (SFRS10), is a human homolog of Drosophila transformer 2 (Tra2) and one of the strongest splicing regulators among RS-rich protein family members;38 it regulates alternative splice-site selection in a dosage-dependent manner.38, 39 A comprehensive transcriptome-wide analysis of Tra2-β binding sites in human cells indicated that Tra2-β preferentially binds adenine-rich motifs.42 Increased Tra2-β expression has been reported during oxidative stress,43 which can be magnified by hyperglycemia in diabetes, angiotensin II in hypertension, and hypoxia in ischemic injury.44-46 In these settings, elevated Tra2-β protein levels might further decrease PKD1 expression in carriers of the rs3874648-A allele. Similarly, reduced functioning renal mass due to acute or chronic kidney disease, or a hypomorphic PKD1/2 allele, might further reduce PC1 production through the previously proposed “third-hit” mechanism47 and accelerate ADPKD progression in carriers of the rs3874648-A allele. Altogether, these data support the hypothesis that the lower PKD1 transcript levels associated with the rs3874648-A allele could increase the risk for ADPKD by keeping PC1 dosage at or below a critical threshold in susceptible individuals.9, 13, 14, 17
The common variant rs3874648-A is present in all populations, with an overall allele frequency of 0.189 in 1KG and 0.194 in gnomAD (Tables S10 and S11).10 The rs3874648-A allele frequency in our cohort was 0.1095, in agreement with gnomAD for Europeans (0.1094) but lower than that reported for African/African Americans (0.5024). This likely reflects the predominance of Caucasian subjects in our cohort (74%). Given the much higher frequency of the rs3874648-A allele in individuals of African ancestry, a higher rate of ADPKD might be anticipated in African populations. Although ADPKD is likely underdiagnosed in African populations,48 in California the crude prevalence of ADPKD was 73.0 per 100 000 for Blacks, 63.2 for non-Hispanic Whites, 39.9 for Hispanics, and 48.9 for Asian/Pacific Islanders (p < 0.001).49 However, our study was not designed to assess the association of this variant with the prevalence of ADPKD in different populations.
Genetic variants that affect RNA splicing have been reported in PKD genes.50 We previously characterized a novel PKD2 mutation (p.L441CfsX4) that abolished a conserved acceptor splice site of intron 5, resulting in premature translation termination; this aberrant splicing was also detected, to a lesser extent, in PBLs from other ADPKD patients and from normal individuals.27 Even under physiological conditions, two long polypyrimidine tracts in human PKD1 introns 21 and 22 have been associated with aberrant splicing across these introns and early transcript termination, potentially reducing PKD1 expression below the “cystogenic threshold.”51 Xie et al. recently identified, from the Mayo Clinic ADPKD and ClinVar databases, eight rare PKD1 intronic variants that altered RNA splicing in minigene assays.52
Strengths of this study include the use of well-established LR-PCR-NGS techniques with high sensitivity and call rates, and the recruitment of ADPKD patients with standard selection criteria in a single health system. Results from population and molecular genetics strongly support the functional significance of the rs3874648G > A variant in regulating PKD1 expression.
A limitation of this study was the small number of subjects (n = 8) homozygous for the rs3874648-A allele available for clinical trait association and functional studies (Table S12). PBLs were available from only two ADPKD patients with the rs3874648-G/A genotype and from one patient (MC9-002572) homozygous for rs3874648-A. No PKD1/2 mutation was identified in the latter patient, whose PBL showed a ~40% lower PKD1 transcript level, likely reflecting either an undetected mutation in PKD1/2 or another gene causing an ADPKD phenotype (Figure S8). Nonetheless, we identified a higher-than-expected prevalence of rs3874648-A homozygotes in the PKD1/2 mutation-negative cohort and demonstrated the functional significance of the rs3874648-A allele and its mechanism of action. Additional studies are required to clarify the impact of this variant on the ADPKD phenotype, including in patients with no identified exonic PKD1/2 mutation.
In summary, through population genomics and functional analyses, we identified a common deep intronic PKD1 variant, rs3874648G > A, that acts as a gene expression modifier: it increases the binding affinity of the splicing regulator Tra2-β, activates a cryptic splice site, and reduces levels of full-length PKD1 mRNA by disrupting normal splicing. Molecular genetic analysis supports the derived rs3874648-G allele as the advantageous allele, which predominates in most subpopulations.
ACKNOWLEDGMENTS
We are grateful to all patients and their families for their invaluable participation. We thank Dr. Yanlin Wang in the Department of Medicine, University of Connecticut Health Center, for sharing the pcDNA3.1-Flag mammalian expression vector. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR002384.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.