Common origin of pure and interrupted repeat expansions in spinocerebellar ataxia type 2 (SCA2)†
How to Cite this Article: Ramos EM, Martins S, Alonso I, Emmel VE, Saraiva-Pereira ML, Jardim LB, Coutinho P, Sequeiros J, Silveira I. 2009. Common Origin of Pure and Interrupted Repeat Expansions in Spinocerebellar Ataxia Type 2 (SCA2). Am J Med Genet Part B 153B:524–531.
Abstract
Spinocerebellar ataxia type 2 (SCA2) is an autosomal dominant neurodegenerative disease characterized by gait and limb ataxia. It is caused by the expansion, to more than 34 repeats, of a (CAG)n tract in the ATXN2 gene that encodes a polyglutamine tract. More recently, alleles with 32–33 CAGs have been associated with late-onset cases. Repeat interruptions by CAA triplets are common in normal alleles, whereas expanded alleles usually contain a pure repeat tract. To investigate the mutational origin and the instability of the ATXN2 repeat, we performed an extensive haplotype study and sequenced the CAG/CAA repeat in a cohort of families of different geographic origins and phenotypes. Our results showed (1) CAA interruptions also in expanded ATXN2 alleles; (2) that pathological CAA-interrupted alleles shared an ancestral haplotype with pure expanded alleles; and (3) higher genetic diversity in European SCA2 families, suggesting an older European ancestry of SCA2. In conclusion, we found instability towards expansion in interrupted ATXN2 alleles and a shared ancestral ATXN2 haplotype for pure and interrupted expanded alleles; this finding has strong implications for mutation diagnosis and counseling. Our results indicate that interrupted alleles below the pathological threshold may be a reservoir of mutable alleles, prone to expansion in subsequent generations and leading to the full disease mutation. © 2009 Wiley-Liss, Inc.
INTRODUCTION
Autosomal dominant spinocerebellar ataxias (SCAs) are a clinically heterogeneous group of neurodegenerative diseases. The main clinical features are gait and limb ataxia, often accompanied by pyramidal signs, peripheral neuropathy, or ophthalmological or cognitive signs. At least 30 loci are known to be implicated in SCAs. The cause is often the expansion of an oligonucleotide (tri- or pentanucleotide) repeat; most SCAs are due to the expansion of a (CAG)n, which encodes a toxic polyglutamine tract in the corresponding protein. Although prevalence varies widely among populations, several epidemiological studies estimate the prevalence of inherited ataxias to be below six cases per 100,000 persons worldwide [Sequeiros et al., in press].
Wadia and Swami [1971] described several families from India with ataxia and very slow ocular movements. Several years later, a form of dominant olivopontocerebellar atrophy was reported in northeastern Cuba, with a prevalence of up to 500 per 100,000 in the province of Holguin [Orozco et al., 1989; Velázquez-Pérez et al., 2001]. This form was later associated with the expansion of a (CAG)n in the ATXN2 gene and named SCA2 [Imbert et al., 1996; Pulst et al., 1996; Sanpei et al., 1996]. Holguin's ataxia and the disorder described by Wadia and Swami proved to be the same clinical entity [Wadia et al., 1998].
The trinucleotide repeat is located in exon 1 of the ATXN2 gene, on chromosome 12q24.1. In the first reports, normal alleles ranged from 14 to 31 repeats and were stably transmitted, whereas expanded mutant alleles had over 34 repeat units. More recently, alleles with 32–33 triplets have been associated with late-onset SCA2 [Costanzi-Porrini et al., 2000; Fernandez et al., 2000; Kim et al., 2007]. In normal alleles the repeat contains one to four CAA interruptions, whereas expanded alleles usually contain a pure tract [Imbert et al., 1996; Pulst et al., 1996; Sanpei et al., 1996; Choudhry et al., 2001; Sobczak and Krzyzosiak, 2005]. Although these interruptions may not prevent the pathogenic effect of expanded alleles (since CAA also encodes glutamine), some authors have proposed that they may enhance the intergenerational stability of the repeat by inhibiting strand slippage [Costanzi-Porrini et al., 2000; Choudhry et al., 2001].
ATXN2 expansions have also recently been reported in familial and even sporadic parkinsonism [Gwinn-Hardy et al., 2000; Shan et al., 2004]. These cases show late onset, slow progression of symptoms, bradykinesia, rigidity, and a sustained response to L-dopa, and are caused by borderline expansions of 32–43 repeats interrupted by CAA triplets [Costanzi-Porrini et al., 2000; Gwinn-Hardy et al., 2000; Shan et al., 2001; Furtado et al., 2002; Lu et al., 2002].
To investigate the mutational origin and the instability of ATXN2 alleles in a cohort of families of different geographic origins, we characterized the repeat tract configuration and determined the flanking haplotypes of normal and expanded chromosomes using intragenic SNPs and microsatellites.
SUBJECTS AND METHODS
SCA2 Families and Control Population
This study was carried out in 17 unrelated SCA2 families (51 subjects, including patients and relatives) of diverse ethnic origins: Portuguese (7), Brazilian (7), Indian (2), and Italian-American (1) [Suite et al., 1986; Silveira et al., 2002]. In addition, six isolated cases of ataxia with alleles of undetermined significance (31–32 repeats) were included. A total of 77 normal chromosomes of different geographic origins, including Portuguese, Brazilian, Indian, and Italian, were also analyzed. Peripheral blood samples were collected from patients and healthy individuals after written informed consent was obtained. Genomic DNA was extracted as previously reported [Silveira et al., 2002].
Genotyping and Sequencing
Repeat sizes at the ATXN2 locus were determined by PCR amplification with previously described primers [Pulst et al., 1996]. PCR amplification of markers D12S1333 and D12S1672 was carried out with 0.6 µM of each primer, 200 µM dNTPs, 1.5 mM MgCl2, 0.75 U Taq DNA polymerase (Fermentas, Burlington, Ontario), and 2% formamide, in a final volume of 25 µl. The sizes of the fluorescently labeled PCR fragments were determined on an ABI PRISM 310 automated DNA sequencer (Applied Biosystems, Foster City, CA) with GeneScan version 3.1.2 software. Samples whose allele sizes had been confirmed by sequencing were included as size controls in each PCR.
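The role of the sequence-verified size controls can be illustrated with a minimal sketch. It assumes that fragment mobility is linear in length, so that each additional repeat unit adds exactly its unit length in base pairs; real capillary sizing is only approximately linear, which is why sequenced controls are run alongside. The function name `allele_from_size` and the CAG control values (22 units at 300 bp) are hypothetical; the 2-bp microsatellite calibrations use the allele-to-size key given in the footnote to Table II.

```python
def allele_from_size(size_bp: float, control_size_bp: float,
                     control_allele: int, unit_bp: int) -> int:
    """Convert a fragment size into an allele (repeat) number by counting
    whole repeat units relative to a control of sequence-verified size."""
    units = (size_bp - control_size_bp) / unit_bp
    return control_allele + round(units)  # nearest whole repeat unit


# CAG repeat (3-bp unit): hypothetical control of 22 units sized at 300 bp.
print(allele_from_size(348.0, 300.0, 22, unit_bp=3))  # 38

# D12S1333 (2-bp steps): allele 3 = 225 bp, per the Table II footnote.
print(allele_from_size(239.0, 225.0, 3, unit_bp=2))   # 10

# D12S1672 (2-bp steps): allele 5 = 279 bp, per the Table II footnote.
print(allele_from_size(287.0, 279.0, 5, unit_bp=2))   # 9
```

Rounding to the nearest unit tolerates sizing offsets smaller than half a repeat unit; larger mobility shifts would need a closer control.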
SNPs rs695871 and rs695872 were genotyped in phase with the CAG repeat by PCR with primers SCA2-FP3 and SCA2-RP3 (Fig. 1), as described [Choudhry et al., 2001], followed by sequencing. PCR was carried out with 1 µM of each primer, 260 µM dNTPs, 1.25 mM MgCl2, 2.5 U high-fidelity DNA polymerase (Fermentas), and 10% DMSO, in a final volume of 30 µl. PCR products were purified from agarose gel with the GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare, Buckinghamshire, UK), and sequencing was performed with the BigDye Terminator v1.1 Cycle Sequencing Ready Reaction kit (Applied Biosystems, Warrington, UK) on an ABI PRISM 310 automated DNA sequencer. When required, PCR products were purified and cloned into the pCR4-TOPO vector of the TOPO TA Cloning kit (Invitrogen, Carlsbad, CA), according to the manufacturer's instructions.

Fig. 1. Scheme of the relevant portion of exon 1 of ATXN2, showing the CAG repeat, the adjacent poly(P)-encoding sequence, and the flanking region, with the locations of the intragenic SNPs (rs695871 and rs695872) and microsatellite markers (D12S1333 and D12S1672) used for haplotype reconstruction.
Haplotype Analysis
Haplotypes for markers D12S1333 and D12S1672, which span a region of ∼300 kb around the repeat (cen-D12S1672-CAG/CAA repeat-D12S1333-tel; Fig. 1), were reconstructed by segregation analysis. When the allelic phase could not be determined by segregation, PHASE 2.1 software (www.stat.washington.edu/stephens/software.html) was used: taking into account homozygosity and the allelic phases already known, supplied as input, it calculated the probability of every possible haplotype combination for each individual.
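To make "every possible haplotype combination" concrete, the sketch below enumerates all phase-compatible haplotype pairs for one individual's unphased multilocus genotype and then discards pairs that contradict a phase already known from segregation. This is only the enumeration step; it does not reproduce PHASE's statistical estimation of the probability of each pair. The genotype shown and the function names are hypothetical.

```python
from itertools import product


def candidate_haplotype_pairs(genotype):
    """Enumerate haplotype pairs compatible with an unphased genotype.

    `genotype` is a list of (allele_a, allele_b) tuples, one per locus, in
    map order. With k heterozygous loci there are 2**(k - 1) distinct pairs.
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = []
    # Keep the first heterozygous locus fixed so that (h1, h2) and (h2, h1)
    # are not both generated.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        swap = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if swap.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def respecting_known_phase(pairs, locus_i, allele_i, locus_j, allele_j):
    """Keep pairs in which allele_i (at locus_i) and allele_j (at locus_j)
    lie on the same haplotype, e.g. a phase established by segregation."""
    return [
        pair for pair in pairs
        if any(h[locus_i] == allele_i and h[locus_j] == allele_j for h in pair)
    ]


# Hypothetical individual; loci in map order: D12S1672, CAG repeat, D12S1333.
genotype = [(5, 7), (38, 22), (10, 4)]
pairs = candidate_haplotype_pairs(genotype)
print(len(pairs))  # 4
# Suppose segregation shows the 38-repeat allele travels with D12S1333 allele 10.
print(len(respecting_known_phase(pairs, 1, 38, 2, 10)))  # 2
```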
The average gene diversity over the D12S1333 and D12S1672 loci, that is, the probability that two randomly chosen haplotypes in the sample differ, was estimated for our Portuguese and Brazilian SCA2 families, as well as for other SCA2 populations from the literature. Analyses were performed with Arlequin version 3.11 [Excoffier et al., 2005].
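A minimal sketch of the diversity statistic, assuming that Arlequin's average gene diversity over loci is Nei's unbiased per-locus estimator, H = n/(n − 1) · (1 − Σ p_i²), averaged over loci. The ± standard deviations reported in the Discussion are not reproduced here. Applied to the SCA2 chromosomes listed in Table II (one expanded chromosome per family), it returns values that round to 0.69 for the Portuguese and 0.48 for the Brazilian families, the values given in the Discussion. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def gene_diversity(alleles):
    """Nei's unbiased gene diversity at one locus: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(alleles)
    sum_p2 = sum(Fraction(c, n) ** 2 for c in Counter(alleles).values())
    return Fraction(n, n - 1) * (1 - sum_p2)


def average_gene_diversity(loci):
    """Mean of the per-locus gene diversities."""
    return sum(gene_diversity(alleles) for alleles in loci) / len(loci)


# One expanded chromosome per family, alleles as listed in Table II.
portuguese = [
    [7, 10, 3, 10, 9, 3, 4],    # D12S1333, families PT1 to PT7
    [7, 5, 7, 5, 7, 7, 7],      # D12S1672
]
brazilian = [
    [3, 10, 8, 10, 10, 8, 10],  # D12S1333, families BR1 to BR7
    [9, 5, 5, 5, 5, 5, 5],      # D12S1672
]

for name, loci in [("Portuguese", portuguese), ("Brazilian", brazilian)]:
    print(name, round(float(average_gene_diversity(loci)), 2))
# Portuguese 0.69
# Brazilian 0.48
```

Exact fractions keep the arithmetic checkable by hand: the Portuguese value is (19/21 + 10/21)/2 = 29/42.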
RESULTS
Interruption Pattern of Indeterminate and Expanded Alleles
Previous studies have reported CAA interruptions in normal alleles [Choudhry et al., 2001]. In this study, repeat analysis also revealed CAA interruptions in alleles of indeterminate penetrance and in fully expanded ATXN2 alleles (Table I). All six indeterminate alleles (31–32 repeats) had an interrupted configuration. Five had a single CAA interruption near the 3′ end and shared the C–C SNP-based haplotype with expanded alleles, whereas one 32-repeat CAG/CAA allele had three interruptions, (CAG)8CAA(CAG)9CAA(CAG)4CAA(CAG)8, and was associated with the G–T haplotype. Indeterminate alleles on the C–C haplotype and interrupted expanded alleles had similar configurations: (CAG)22–34CAA(CAG)8,9. Interrupted alleles with 33–44 repeats were previously reported in one family with parkinsonism [Socal et al., 2009]. In this study, these alleles (33–44 repeats) showed one CAA interruption near the 3′ end (Table I). Each patient in this family carried an expanded allele and also an interrupted 33-repeat allele. Intragenic SNPs and flanking microsatellites (Table II, family BR2) showed that all these 33–44-repeat alleles had the same extended haplotype (10-C–C-5), indicating a common origin, with the different sizes resulting from repeat instability. Two patients from a different family had a pure tract of 38 CAGs followed by (CCG)3CCC, instead of the more commonly observed (CCG)2CCC sequence.
Repeat length (CAG/CAA units) | Number of chromosomes | Sequence configuration |
---|---|---|
Interrupted G–T alleles | | |
22 | 50 | (CAG)8CAA(CAG)4CAA(CAG)8 |
27 | 2 | (CAG)8CAA(CAG)4CAA(CAG)4CAA(CAG)8 |
32 | 1 | (CAG)8CAA(CAG)9CAA(CAG)4CAA(CAG)8 |
Interrupted C–C alleles | | |
19 | 1 | (CAG)9CAA(CAG)9 |
20 | 1 | (CAG)11CAA(CAG)8 |
22 | 19 | (CAG)13CAA(CAG)8 |
22 | 1 | (CAG)8CAA(CAG)4CAA(CAG)8 |
23 | 2 | (CAG)14CAA(CAG)8 |
23 | 1 | (CAG)13CAA(CAG)9 |
31 | 4 | (CAG)22CAA(CAG)8 |
32 | 1 | (CAG)23CAA(CAG)8 |
33 | 1 | (CAG)23CAA(CAG)9 |
34 | 1 | (CAG)24CAA(CAG)9 |
44 | 1 | (CAG)34CAA(CAG)9 |
Pure C–C alleles | | |
34 | 1 | (CAG)34 |
36 | 2 | (CAG)36 |
37 | 3 | (CAG)37 |
38 | 9 | (CAG)38 |
39 | 5 | (CAG)39 |
40 | 2 | (CAG)40 |
41 | 1 | (CAG)41 |
43 | 1 | (CAG)43 |
44 | 2 | (CAG)44 |
45 | 1 | (CAG)45 |
47 | 1 | (CAG)47 |
48 | 1 | (CAG)48 |
50 | 1 | (CAG)50 |
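The configuration notation in Table I can be checked mechanically. The sketch below assumes that every CAA interruption counts as one repeat unit (as the repeat lengths in the table imply), parses a configuration string into its pure CAG runs, and recomputes the total length and the number of interruptions for several rows of the table. The helper names are hypothetical.

```python
import re

# A configuration is one (CAG)n run followed by zero or more CAA(CAG)n cassettes.
CONFIG = re.compile(r"\(CAG\)\d+(?:CAA\(CAG\)\d+)*")


def parse_configuration(config: str) -> list[int]:
    """'(CAG)13CAA(CAG)8' -> [13, 8]: lengths of the pure CAG runs, 5' to 3'."""
    if not CONFIG.fullmatch(config):
        raise ValueError(f"unrecognized configuration: {config!r}")
    return [int(n) for n in re.findall(r"\(CAG\)(\d+)", config)]


def repeat_length(runs: list[int]) -> int:
    """Total CAG/CAA units: all CAG triplets plus one unit per CAA."""
    return sum(runs) + len(runs) - 1


def interruptions(runs: list[int]) -> int:
    """Number of CAA triplets separating the CAG runs."""
    return len(runs) - 1


# (repeat length, configuration) pairs copied from Table I
rows = [
    (22, "(CAG)8CAA(CAG)4CAA(CAG)8"),
    (27, "(CAG)8CAA(CAG)4CAA(CAG)4CAA(CAG)8"),
    (32, "(CAG)8CAA(CAG)9CAA(CAG)4CAA(CAG)8"),
    (22, "(CAG)13CAA(CAG)8"),
    (31, "(CAG)22CAA(CAG)8"),
    (44, "(CAG)34CAA(CAG)9"),
    (38, "(CAG)38"),
]
for expected, config in rows:
    runs = parse_configuration(config)
    assert repeat_length(runs) == expected, config
    print(f"{config}: {repeat_length(runs)} units, {interruptions(runs)} CAA")
```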
Origin | Family | D12S1333 allele | SNP haplotype | D12S1672 allele | PHASE probability |
---|---|---|---|---|---|
Portuguese | PT1 | 7 | C–C | 7 | (0.547)a |
Portuguese | PT2 | 10 | C–C | 5 | (0.846)a |
Portuguese | PT3 | 3 | C–C | 7 | |
Portuguese | PT4 | 10 | C–C | 5 | (0.977)a |
Portuguese | PT5 | 9 | C–C | 7 | |
Portuguese | PT6 | 3 | C–C | 7 | (0.678)a |
Portuguese | PT7 | 4 | C–C | 7 | (0.698)a |
Brazilian | BR1 | 3 | C–C | 9 | |
Brazilian | BR2 | 10 | C–C | 5 | |
Brazilian | BR3 | 8 | C–C | 5 | (0.955)a |
Brazilian | BR4 | 10 | C–C | 5 | (0.970)a |
Brazilian | BR5 | 10 | C–C | 5 | |
Brazilian | BR6 | 8 | C–C | 5 | (0.644)a |
Brazilian | BR7 | 10 | C–C | 5 | |
Indian | IND1 | 4 | C–C | 8 | |
Indian | IND2 | 3 | C–C | 8 | (0.999)a |
Italian | IT1 | 4 | C–C | 7 | |
- D12S1333: alleles 3 = 225 bp, 4 = 227 bp, 7 = 233 bp, 8 = 235 bp, 9 = 237 bp, 10 = 239 bp, 13 = 245 bp; D12S1672: alleles 5 = 279 bp, 7 = 283 bp, 8 = 285 bp, 9 = 287 bp.
- a PHASE haplotype probabilities are given in parentheses. Brazilian family BR2, which had parkinsonism, showed an interrupted repeat configuration; all remaining families had pure repeats.
Extended Haplotypes in SCA2 Families
Analysis of the SNPs together with the STR markers (D12S1333 and D12S1672) in SCA2 families of Portuguese, Brazilian, Indian, and Italian origin showed a conserved C–C haplotype but great diversity, within and among populations, at the more distant and more unstable flanking polymorphisms (Table II). In the Portuguese families, two STR haplotypes were most frequently associated with the expansion, 10-Exp-5 and 3-Exp-7, but three other SCA2 haplotypes were also observed: 7-Exp-7, 9-Exp-7, and 4-Exp-7. Among Brazilians, there was one major haplotype, 10-Exp-5, in four families, followed by 8-Exp-5 in two families and 3-Exp-9 in only one. In the two Indian families, we observed two SCA2 haplotypes: 4-Exp-8 and 3-Exp-8. The Italian family carried the 4-Exp-7 haplotype, also found in Portuguese family PT7.
The extended haplotype of the interrupted alleles with 33, 34, and 44 repeats was 10-C–C-Exp-5 (Table II, family BR2), which was shared by five other families with pure repeat tracts (Table II). In total, this haplotype was associated with the disease expansion in six families (four Brazilian and two Portuguese).
DISCUSSION
To gain insight into the mutational origin and instability of the ATXN2 repeat, we assessed the repeat tract configuration and flanking haplotypes of normal and expanded alleles. This is the first haplotype study of intragenic SNPs in SCA2 families of different ethnic origins. We found a common haplotype for expansions with pure repeats and for those with CAA interruptions: interrupted alleles with 33, 34, and 44 repeats shared the same flanking STR haplotype with five other families segregating pure repeat tracts. This finding indicates that pure and interrupted expanded alleles share a common ancestral origin and that interrupted alleles below the pathological threshold may be a population reservoir of mutable alleles prone to expansion.
These findings have strong implications for mutation analysis and genetic counseling. We found several indeterminate alleles (31–32 repeats) interrupted by CAA triplets; based on our findings, these alleles might expand into the pathological range in subsequent generations. Furthermore, the polymorphic CCG/CCC tract immediately adjacent to the (CAG)n should be assessed by sequencing, at least in intermediate-size alleles, since variation in its length may lead to misestimation of the number of CAG repeats and therefore affect mutation diagnosis. It is often unclear whether these triplets are included in repeat-size estimates.
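Why the adjacent CCG/CCC tract matters for sizing can be shown with simple arithmetic. In a length-based assay every extra triplet in the amplicon adds 3 bp, whether it is CAG or CCG, so an allele with (CCG)3CCC sized against a control carrying the usual (CCG)2CCC appears one repeat unit longer than it is. The sketch assumes a hypothetical fixed flank length (it cancels out) and uses hypothetical function names; it does not model the assay used in this study.

```python
def amplicon_length(n_repeat_units: int, n_ccg: int, flank_bp: int = 200) -> int:
    """Hypothetical amplicon: fixed flanks + CAG/CAA units + (CCG)n + one CCC."""
    return flank_bp + 3 * (n_repeat_units + n_ccg + 1)


def apparent_repeat_units(length_bp: int, control_length_bp: int,
                          control_units: int) -> int:
    """Repeat size inferred from length alone, which implicitly assumes the
    sample has the same CCG/CCC tract as the control."""
    return control_units + (length_bp - control_length_bp) // 3


# Sequence-verified control: 22 repeat units followed by (CCG)2CCC.
control = amplicon_length(22, n_ccg=2)
# Expanded allele from the Results: pure (CAG)38 followed by (CCG)3CCC.
sample = amplicon_length(38, n_ccg=3)
print(apparent_repeat_units(sample, control, 22))  # 39, although the true tract is 38
```

For an allele near the 32–34 boundary, an error of one unit of this kind could change how the allele is classified.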
Poly(P) Role in SCA2 Alleles
In ataxin-2, the polyglutamine (poly(Q)) tract has been described as being followed by three proline residues, encoded by the (CCG)2CCC sequence. We found a family with a (CCG)3CCC sequence immediately adjacent to the poly(Q) in an expanded allele. Four families have been reported with a polymorphic (CCG)1–2CAG at the 3′ end of the repeat [Mizushima et al., 1999]. Huntingtin also has a proline repeat adjacent to its poly(Q), which is polymorphic in size and seems to modulate poly(Q) toxicity in yeast and mammalian cells [Khoshnan et al., 2002; Dehay and Bertolotti, 2006]. By analogy with huntingtin, two possible roles can be proposed for this proline-rich region in ataxin-2: it may influence (1) the protein interactions of ataxin-2 and/or (2) the conformation of the poly(Q) region and, consequently, its structure and toxicity [Bhattacharyya et al., 2006].
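A short sketch makes two points of this section concrete: CAA interruptions do not break the glutamine run (CAA and CAG both encode glutamine), and an extra CCG lengthens the adjacent proline stretch. It uses a deliberately partial, illustrative codon table containing only the four codons that occur in this region, and builds the DNA strings from configurations reported above.

```python
from itertools import groupby

# Partial codon table: only the codons found in this stretch of ATXN2 exon 1.
CODONS = {"CAG": "Q", "CAA": "Q", "CCG": "P", "CCC": "P"}


def translate(dna: str) -> str:
    """Translate an in-frame sequence built only from the codons above."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def summarize(protein: str) -> str:
    """Collapse runs of identical residues, e.g. 'QQQPP' -> 'Q3P2'."""
    return "".join(f"{aa}{len(list(run))}" for aa, run in groupby(protein))


# Interrupted 22-unit allele (CAG)13CAA(CAG)8 with the usual (CCG)2CCC
common = "CAG" * 13 + "CAA" + "CAG" * 8 + "CCG" * 2 + "CCC"
# Pure (CAG)38 allele followed by (CCG)3CCC, as in one family in this study
variant = "CAG" * 38 + "CCG" * 3 + "CCC"

print(summarize(translate(common)))   # Q22P3
print(summarize(translate(variant)))  # Q38P4
```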
Evolution From Normal to Expanded Alleles
The study of the evolution of normal ATXN2 alleles provides further insight into the mutational process (Fig. 2). The C–C haplotype has been identified as the ancestral lineage in several non-human primates [Choudhry et al., 2001]. Taking into account the interrupted tract configuration found in chimpanzees, humans' closest living relatives, our results on ethnically diverse normal human chromosomes of this lineage suggest the following: (1) the loss of the most proximal (5′) interruption in (CAG)8CAA(CAG)4CAA(CAG)8 alleles; if this is the case, the first CAA would have been converted to CAG, giving rise to (CAG)13CAA(CAG)8 alleles, a hypothesis supported by the similar STR background of both alleles; (2) that interrupted expanded alleles of the C–C lineage originated through a multistep process, as the 33- and 44-repeat alleles share their STR haplotype with 19- and 23-repeat alleles (Fig. 3); and (3) that, in contrast, the indeterminate 31- and 32-repeat alleles might have a more distant origin, with the ancient STR haplotype (4-Ind31-7) shared with most 22-repeat alleles. On this background, a recombination upstream of the CAG repeat tract would explain the new 8-Ind31-7 haplotype, which, followed by the addition of one CAG at the 5′ end and a stepwise mutation downstream (between the repeat and D12S1672), could have given rise to the 8-Ind32-8 haplotype. A multistep process has also been suggested to underlie the evolution of normal alleles at another SCA locus [Martins et al., 2006]. Alternatively, a stepwise evolution from the most common 4-Nor22-7 haplotype up to the 10-Exp-5 haplotype observed in the interrupted alleles with 33, 34, and 44 repeats would also have been possible over a longer time scale, although this is more difficult to explain based on our results.
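Steps (1) and (3) of this model can be traced with a small sketch that represents an allele as the list of its pure CAG runs separated by single CAA triplets, e.g. (CAG)8CAA(CAG)4CAA(CAG)8 as [8, 4, 8]. Converting the 5′-most CAA to CAG merges the first two runs, and adding a CAG at the 5′ end lengthens the first run. The helper names are hypothetical, and the sketch only checks that the proposed mutational steps yield the observed configurations, not that they actually occurred.

```python
def to_config(runs: list[int]) -> str:
    """[13, 8] -> '(CAG)13CAA(CAG)8'."""
    return "CAA".join(f"(CAG){n}" for n in runs)


def lose_interruption(runs: list[int], k: int) -> list[int]:
    """CAA number k (0 = most 5') mutates to CAG, merging runs k and k + 1."""
    return runs[:k] + [runs[k] + 1 + runs[k + 1]] + runs[k + 2:]


def add_5prime_cag(runs: list[int], n: int = 1) -> list[int]:
    """Add n CAG triplets to the 5' end of the tract."""
    return [runs[0] + n] + runs[1:]


# Step (1): loss of the 5'-most interruption (repeat length unchanged, 22 units).
print(to_config(lose_interruption([8, 4, 8], 0)))  # (CAG)13CAA(CAG)8

# Step (3): one extra 5' CAG turns the 31-unit allele into the 32-unit allele.
print(to_config(add_5prime_cag([22, 8])))           # (CAG)23CAA(CAG)8
```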

Fig. 2. Postulated expansion mechanism for ATXN2 alleles. Ancestral normal C–C alleles, through loss of interruptions and expansion of the 5′ tract, gave rise both to expanded alleles with pure CAG repeats and to expanded alleles with interrupted repeat tracts.

Fig. 3. Extended haplotypes in normal and interrupted alleles. A: G–T lineage: 8-G–T-5 was the most frequent haplotype (27.3%), followed by 10-G–T-5 (9.1%); alleles with 27 repeats carried the G–T haplotype with different flanking STRs. B: C–C lineage: 4-C–C-7 was the most common haplotype (7.8%).
In the G–T lineage, only 22-, 27-, and 32-repeat alleles were observed, with the configurations (CAG)8CAA(CAG)4CAA(CAG)8, (CAG)8CAA(CAG)4CAA(CAG)4CAA(CAG)8, and (CAG)8CAA(CAG)9CAA(CAG)4CAA(CAG)8, respectively, which suggests a replication process in their evolution. Assuming (CAG)8CAA(CAG)4CAA(CAG)8 alleles to be the ancestral form (since they are also present in the ancestral C–C lineage), 27-repeat alleles would have arisen from 22-repeat alleles through the formation of a hairpin with the sequence CAA(CAG)4 on the nascent strand. Alternatively, through unequal crossing-over, the cassette-like structure could have gained an additional CAA(CAG)4 or CAA(CAG)8 (followed by slippage of one CAG); in this case, however, the less interrupted homologous chromosomes arising from the process would have been lost.
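Representing an allele by the lengths of its pure CAG runs, each separated by one CAA, the two G–T-lineage configurations can be derived from the ancestral 22-unit allele [8, 4, 8]: adding one CAA(CAG)4 cassette gives the 27-unit allele, and adding a CAA(CAG)8 cassette followed by slippage of one CAG gives the 32-unit allele. This reads the parenthetical "followed by slippage of one CAG" as applying to the CAA(CAG)8 case, which is the reading the arithmetic supports. Function names are hypothetical, and the sketch does not model the molecular mechanism itself.

```python
def to_config(runs: list[int]) -> str:
    """[8, 4, 8] -> '(CAG)8CAA(CAG)4CAA(CAG)8'."""
    return "CAA".join(f"(CAG){n}" for n in runs)


def length(runs: list[int]) -> int:
    """Total repeat units; each CAA counts as one unit."""
    return sum(runs) + len(runs) - 1


def add_cassette(runs: list[int], after: int, k: int) -> list[int]:
    """Insert a CAA(CAG)k cassette after CAG run number `after` (0 = 5'-most)."""
    return runs[:after + 1] + [k] + runs[after + 1:]


def slip(runs: list[int], index: int, n: int = 1) -> list[int]:
    """Gain of n CAG triplets in run number `index`."""
    out = list(runs)
    out[index] += n
    return out


ancestral = [8, 4, 8]                              # (CAG)8CAA(CAG)4CAA(CAG)8
allele27 = add_cassette(ancestral, after=1, k=4)
allele32 = slip(add_cassette(ancestral, after=0, k=8), index=1)

for runs in (ancestral, allele27, allele32):
    print(length(runs), to_config(runs))
# 22 (CAG)8CAA(CAG)4CAA(CAG)8
# 27 (CAG)8CAA(CAG)4CAA(CAG)4CAA(CAG)8
# 32 (CAG)8CAA(CAG)9CAA(CAG)4CAA(CAG)8
```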
European Ancestry of the SCA2 Expansions
To identify the mutational origin of SCA2, we studied haplotypes in families of Portuguese, Brazilian, Indian, and Italian origin. The C–C haplotype, previously described only in families of Indian origin [Choudhry et al., 2001], was shared by all these families, regardless of origin. This suggests (1) a common ancestry for SCA2 patients, since C–C had a frequency of only 32% in the normal population (vs. 68% for G–T); alternatively, it could reflect (2) multiple founders on a predisposing SNP background. In contrast, the STR analysis did not reveal a shared extended haplotype in these ethnically diverse pedigrees, showing instead different degrees of variability.
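A back-of-the-envelope check shows why chance alone is an unlikely explanation for every family carrying C–C. It assumes that the 77 normal chromosomes are the G–T and C–C alleles of 19–27 repeats listed in Table I (52 G–T and 25 C–C, which matches the 68%/32% split after rounding), and naively treats the 17 unrelated families as independent draws from that distribution. Under these assumptions, the probability that all 17 carry C–C is below 1 in 100 million. The model ignores ascertainment and population structure, and it cannot distinguish hypothesis (1) from hypothesis (2); both are alternatives to chance.

```python
from fractions import Fraction

# Normal chromosomes per repeat length, taken from Table I
normal_g_t = {22: 50, 27: 2}
normal_c_c = {19: 1, 20: 1, 22: 19 + 1, 23: 2 + 1}  # two 22- and two 23-unit rows merged

n_gt = sum(normal_g_t.values())
n_cc = sum(normal_c_c.values())
p_cc = Fraction(n_cc, n_gt + n_cc)

n_families = 17
p_all_cc = float(p_cc) ** n_families  # naive independence model

print(n_gt, n_cc, f"{float(p_cc):.0%}")  # 52 25 32%
print(f"{p_all_cc:.1e}")
```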
Higher diversity was observed in the Portuguese SCA2 families (average gene diversity over loci: 0.69 ± 0.55) than in those from Brazil (0.48 ± 0.42), India (0.29 ± 0.29 [Pang et al., 1999] or 0.32 ± 0.29 [Choudhry et al., 2001]), or Japan (0.40 ± 0.37 [Mizushima et al., 1999]). An STR haplotype study including SCA2 families of several ethnic backgrounds showed different founders in Europe, whereas a conserved D12S1333–D12S1672 haplotype was found in three French West Indian families; in addition, the diversity among five North African families led the authors to hypothesize independent ancestral mutations [Didierjean et al., 1999]. The sharing of a SNP-based haplotype by ethnically different SCA2 families does not support this hypothesis; however, it would be interesting to analyze the SNP background in these pedigrees and to extend the STR analysis to other North African families.
Overall, the higher diversity in European families carrying the same SNP core haplotype suggests a European origin for the ancestral expansion. Consistent with a European origin of SCA2, the Indian families previously described shared similar flanking haplotypes, and the authors proposed a founder effect in this population [Pang et al., 1999; Choudhry et al., 2001]. The lower genetic variation in India, together with the presence of the same haplotype among the (more diverse) European families, suggests that the SCA2 mutation was introduced into India by Europeans. Although lower than in the Portuguese families, the still considerable variability observed in Brazilian families (from Rio Grande do Sul) may be related to their European origins and could reflect multiple introductions of SCA2 into Brazil. One possibility would be its introduction from Italy, where the relative frequency of SCA2 is high (24–47%) [Filla et al., 2000; Brusco et al., 2004], since Italian settlement had a strong influence in this region of southern Brazil.
In conclusion, we found a shared ancestral ATXN2 SNP-based haplotype in pure and CAA-interrupted SCA2 alleles. This has strong implications for diagnosis and genetic counseling, since alleles with interrupted repeat tracts of fewer than 32 triplets may be a reservoir of alleles prone to expansion in subsequent generations, leading to full disease penetrance.
Acknowledgements
We would like to thank Victor Mendes for technical assistance. This work was supported by research grants POCI/SAU-MMO/56387/2004 and PIC/IC/82897/2007, FCT (Fundação para a Ciência e Tecnologia) and co-funded by FEDER. S.M. and I.A. are recipients of scholarships SFRH/BPD/29225/2006 and SFRH/BPD/27781/2006 (from FCT). The experiments performed comply with the current laws of Portugal.